LLMs are a general-purpose data solvent.
- Extracting structured data from unstructured input is extraordinarily expensive to do mechanistically.
- Each scraper is specialized and very finely tuned to the input.
- If the input changes shape even a little bit, the scraper breaks.
- Data extraction is thus finicky, fragile, frustrating, expensive.
- Before LLMs, the only way to justify it was to have an audience so large that paying an army of operators to create and maintain scrapers was still worth it.
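
To make the fragility concrete, here is a minimal sketch of a mechanistic scraper. The pattern, the function name `scrape_price`, and the sample strings are all hypothetical, invented only for illustration. The scraper is tuned to one exact layout and silently fails when the wording shifts, even though the price itself is unchanged.

```python
import re
from typing import Optional

# Hypothetical scraper tuned to one exact layout: "Price: $<amount>".
PRICE_PATTERN = re.compile(r"Price: \$(\d+\.\d{2})")


def scrape_price(text: str) -> Optional[float]:
    """Return the price if the text matches the expected layout, else None."""
    match = PRICE_PATTERN.search(text)
    return float(match.group(1)) if match else None


# Illustrative inputs (assumptions, not real data).
original = "Widget. Price: $19.99. In stock."
reworded = "Widget. Now only $19.99! In stock."

print(scrape_price(original))  # matches the layout the scraper was tuned to
print(scrape_price(reworded))  # same price, slightly different shape: no match
```

Every such layout change needs a human to notice the breakage and retune the pattern, which is where the maintenance cost comes from.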