LLMs are a general-purpose data solvent.

· Bits and Bobs 3/3/25
  • LLMs are a general-purpose data solvent.
    • Extracting structured data from unstructured input is extraordinarily expensive to do mechanistically.
    • Each scraper is specialized and very finely tuned to the input.
    • If the input changes shape even a little bit, the scraper breaks.
    • Data extraction is thus finicky, fragile, frustrating, expensive.
    • Before, it was only worth doing if your audience was so big that even paying an army of operators to create and maintain scrapers paid off.
    • But now LLMs allow general extraction in a flexible, fluid way.
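
To make the fragility concrete, here is a minimal sketch of a hand-tuned scraper that breaks when the input changes shape a little, next to the rough shape an LLM-based extractor might take. Everything here is illustrative and not from the original notes: the sample strings, the `scrape_price` and `extract_with_llm` names, and especially the `complete` parameter, which stands in for whatever prompt-to-reply function your model provider exposes.

```python
import json
import re
from typing import Callable, Optional

# A hand-tuned scraper: it assumes the exact layout "Price: $<dollars>.<cents>".
PRICE_PATTERN = re.compile(r"Price: \$(\d+\.\d{2})")


def scrape_price(text: str) -> Optional[float]:
    """Return the price if the text matches the expected layout, else None."""
    match = PRICE_PATTERN.search(text)
    return float(match.group(1)) if match else None


def extract_with_llm(
    text: str, fields: list[str], complete: Callable[[str], str]
) -> dict:
    """Ask a model for the named fields and parse its reply as JSON.

    `complete` is a hypothetical prompt -> reply function (an assumption,
    not a real library call). Real replies may not be valid JSON, so a
    production version would need validation and retries.
    """
    prompt = (
        "Extract the following fields from the text and reply with JSON only.\n"
        f"Fields: {', '.join(fields)}\n"
        f"Text: {text}"
    )
    return json.loads(complete(prompt))


original = "Widget. Price: $19.99"
redesigned = "Widget. Now only 19.99 USD"  # same fact, slightly different shape

print(scrape_price(original))    # matches the layout the pattern was tuned for
print(scrape_price(redesigned))  # no match: the scraper silently returns None
```

The regex needs a human to notice the redesign and retune the pattern. The LLM version pushes that adaptation into the model, at the cost of having to check that the reply is well-formed and correct.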
