Data schemas are extremely high leverage in a world of LLMs.

2024-10-14 · Bits and Bobs 10/14/24

LLMs given a rough schema for what data to keep track of in the application can do a great job generating code with only a small bit of English-language prompting.

You can give the schema in any number of formats.

I find a simple TypeScript type definition is the easiest.
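As a rough illustration of what such a type definition might look like, here is a minimal sketch for a small reading-tracker app. The app and every name in it (`Book`, `ReadingList`, `countByStatus`) are hypothetical and chosen only for this example; the function at the end stands in for the kind of simple code that could follow from the schema.

```typescript
// Hypothetical schema for a small reading-tracker app.
// Every name here is illustrative and not taken from any real project.
type ReadingStatus = "to-read" | "reading" | "finished";

interface Book {
  title: string;
  author: string;
  status: ReadingStatus;
  // Optional fields leave room for data the user may not have yet.
  rating?: number; // intended range: 1 to 5
  notes?: string[];
}

interface ReadingList {
  owner: string;
  books: Book[];
}

// A sample value that conforms to the schema, handy to paste into a prompt.
const example: ReadingList = {
  owner: "alex",
  books: [
    { title: "Dune", author: "Frank Herbert", status: "finished", rating: 5 },
    { title: "Middlemarch", author: "George Eliot", status: "to-read" },
  ],
};

// Once the schema exists, code that uses it tends to be short.
function countByStatus(list: ReadingList): Record<ReadingStatus, number> {
  const counts: Record<ReadingStatus, number> = {
    "to-read": 0,
    reading: 0,
    finished: 0,
  };
  for (const book of list.books) {
    counts[book.status] += 1;
  }
  return counts;
}
```

The schema, not the function, is where the decisions live: it fixes what the app can represent, and the code mostly falls out of it.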

This was one of the insights that emerged for me while working on Code Sprouts last year: with just a little attention to schema, amazing functionality emerged almost automatically.

The schema defines the domain of what kinds of things the software will be able to model in the data and thus accomplish.

Once you have the schema, the code is often quite simple.

Thinking in schemas is very natural for people with engineering experience.

It doesn't feel like the main task in engineering today because there's often a lot of code you have to write, at great expense.

But when writing code becomes easy and cheap and evaporates away, what is left is the centrality of the schema.

Thinking in schemas is extremely unnatural for people without engineering experience.

It's an abstract task of generalizing.

A tool that lets users express a schema and then sprouts amazing things from it will make it easier for people to become LLM wizards.

And yet requiring the first step to be a schema will set a low ceiling on the number of users who can use it.

Luckily, LLMs are pretty good at extracting a schema, too, if directed to do it.

LLMs are great at extracting a schema from a series of example bits of data.

Imagine a UX where users collect bits of data they want to operate on, and then software sprouts out.

The first step in the LLM generation is to extract a schema automatically.
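The extraction step could be sketched roughly as follows. This assumes nothing about a particular LLM provider: `Complete` is a hypothetical function type standing in for whatever client you use, and the prompt wording is only one plausible phrasing, not a tested recipe.

```typescript
// Hypothetical signature for whatever LLM client is in use; not a real library API.
type Complete = (prompt: string) => Promise<string>;

// Build a prompt asking the model to infer a TypeScript type from raw examples.
function buildSchemaPrompt(examples: string[]): string {
  const numbered = examples
    .map((ex, i) => `Example ${i + 1}:\n${ex}`)
    .join("\n\n");
  return [
    "Here are some pieces of data a user wants to work with.",
    "Infer a single TypeScript type definition that can represent all of them.",
    "Mark a field optional if it is missing from any example.",
    "Reply with only the TypeScript code.",
    "",
    numbered,
  ].join("\n");
}

// Step one of generation: extract the schema before any app code is written.
async function extractSchema(
  examples: string[],
  complete: Complete,
): Promise<string> {
  if (examples.length === 0) {
    throw new Error("need at least one example to infer a schema");
  }
  const reply = await complete(buildSchemaPrompt(examples));
  return reply.trim();
}
```

The returned type definition can then be fed into a second prompt that generates the application code, so the user never has to write the schema by hand.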

Another pattern: you can simply ask a user to define a few things they want to do, and the LLM can rough in a schema.

If the schema isn't right, it can be easily modified and extended by the LLM for additional use cases.

One of the reasons defining a schema is hard is that you have to think ahead to the kinds of use cases you'll want to add in the future.

But when software is cheap, you can simply modify the schema when you want to add functionality that requires it.
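A small sketch of what modifying the schema on demand can look like, assuming a hypothetical task tracker whose first version had no due dates. The names `TaskV1`, `TaskV2`, `migrateTask`, and `overdue` are invented for this example.

```typescript
// Version 1 of a hypothetical schema: just enough for the first use case.
interface TaskV1 {
  title: string;
  done: boolean;
}

// Version 2 adds a due date once an "is anything overdue?" feature is wanted.
interface TaskV2 {
  title: string;
  done: boolean;
  dueDate: string | null; // ISO date (YYYY-MM-DD), or null when unknown
}

// Migrating existing data is a small, mechanical function.
function migrateTask(old: TaskV1): TaskV2 {
  return { ...old, dueDate: null };
}

// The new feature that motivated the schema change.
function overdue(tasks: TaskV2[], today: string): TaskV2[] {
  // ISO dates in the same format compare correctly as plain strings.
  return tasks.filter(
    (t) => !t.done && t.dueDate !== null && t.dueDate < today,
  );
}
```

Nothing in version 1 had to anticipate due dates. The field is added only when a feature needs it, and old records are carried forward with a placeholder value.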

Changing schemas used to be hard because you had to update all of the software that relied on it.

But if software is smaller and bespoke, the overhead is much lower: the complexity of schema migration goes up with the square of the number of use cases.

And if you need to update the simple bit of software, simply pass it to an LLM and say "patch yourself" and it does.

Just-in-Time software with a Just-in-Time schema at its core.
