Bits and Bobs 3/3/25

1. A nice frame from Benedict Evans on LLMs:
    • "LLMs are good at the things that computers are bad at, and bad at the things that computers are good at."
    • Related to Moravec's paradox, which is the same phenomenon for normal computers and people.
    • LLMs have failure modes closer to humans than to computers.
2. I thought the Stratechery interview with Ben Evans on AI was interesting. A few of my highlights:
    • "[LLMs] are good at things that don't have wrong answers."
    • "With an intern, the power of the intern is, you can tell them why they did it wrong. … One of the challenges with these models is, you can't really teach them. You're dependent on, "Hopefully my feedback gets back into the next training run and it gets better". It's a weird inversion where the way to get more uses of these models is not to teach the models, you have to teach yourself how to use the model and understand its limitations and what it can be good at so that you give it appropriate jobs in the future."
    • "I look at Grok and I think, okay, in less than two years, you managed to produce a state-of-the-art model… What this tells us is [LLMs are] a commodity."
    • "If you went to 1996, 1997 and said the entire future of the Internet is the feed, people wouldn't know what you were talking about. Like a BBS forum? No, it's not going to be in chronological order, it's going to be algorithmically ranked, it's going to be personalized to every single person, and that's actually the entire foundation of the consumer Internet is the algorithmic, individualized feed, but no one could imagine it years into the Internet, and I wouldn't be surprised if in 2040 or 2045, there's this explosion in entirely new categories of applications we can't think of, that if we went back to this podcast conversation, it'd be like, "Man, you guys had no idea"."
    • "There's just a really stark fundamental difference between 100% accuracy and 99% accuracy."
    • "I feel like [OpenAI and Anthropic] have gone to market ahead of product-market fit. I feel like the prompt looks like a product but isn't, or it's only a product for certain segments, and certain kinds of people, and certain use cases."
    • "The GUI is a way of surfacing what the computer can do, that you don't have to memorize commands. But the other thing is that the GUI is the sort of instantiation of a lot of institutional knowledge about what the user should be doing here."
    • "The Linux approach, you start with the tech and then put buttons on the front. The Apple approach, you start with the buttons and then build the tech behind it"
    • "LLMs just give you the answer, unlike a Google, which there was a two-way relationship [with the publisher]. Yes, we're pulling the information from you, but we're also giving you traffic. So there is a payoff here and there is an incentive for you to keep creating stuff. Is it just intrinsic to AIs, whether in the case of analysts or in the case of web pages, where it's a one-time harvest and there's a real paucity in terms of seeding what's next."
    • "Creativity is … doing something which scores wrong in a machine learning system. You are doing something that's wrong that doesn't match the pattern, but doesn't match the pattern in a good way. And so all this push to make the LLMs less error-prone and more accurate is, if you squint, indistinguishable from squashing out, 'we've got to get Galileo out of the system, he's hallucinating.'"
    • "The original idea for the plot for the Matrix was that the people would collectively be the compute… all the human brains collectively were the brain that was running the Matrix, which makes much more sense. That's clearly how Google works, that's how Instagram works, that's how TikTok works; they're aggregating what people do and this is what LLMs do."
    • "Does the model sit at the top and run everything else or do you wrap the model underneath as an API call inside traditional software?"
      • To which I counter: why does the surrounding software have to be traditional software?
      • Why can't it be a new kind of AI-native software?
3. Hyper-concentrated insight is now more valuable than before because it can be diluted by LLMs.
    • Hyper-concentrated insight used to be hard to consume: too sickly sweet, too hard to gulp down.
    • Just like concentrated orange juice; much cheaper to transport, but has to be diluted before being consumed.
    • But now you can use LLMs to dilute the concentrated insight and make it digestible in any number of bespoke ways.
    • LLMs can dilute it not to a one-size-fits-all dilution, but to a cocktail that is perfect for this particular consumer: liquid media.
    • This is one of the reasons my Bits and Bobs export is such an effective background context for me to feed to LLMs when I'm brainstorming.
    • My Bits and Bobs is like my own personal intellectual orange juice concentrate.
4. LLMs are a general-purpose data solvent.
    • To extract structured data from unstructured input is extraordinarily expensive to do mechanistically.
    • Each scraper is specialized and very finely tuned to the input.
    • If the input changes shape even a little bit, the scraper breaks.
    • Data extraction is thus finicky, fragile, frustrating, expensive.
    • The only way to do it before was to have such a big audience that even paying an army of operators to create and maintain scrapers was worth it.
    • But now LLMs allow general extraction in a flexible, fluid way.
5. Protocols are mainly Schelling points.
    • They tend to start off extremely simply, merely a convention everyone can agree is reasonable.
    • The simpler they are, the more likely they are to emerge as a Schelling point in the first place, because there are fewer things to disagree with.
    • The power of a protocol is how many actors choose to speak it, which is related to how easy it is to implement (a linear cost for implementers, assuming independent implementations) and how many other actors already implement it (which gives compounding value).
    • Ease of implementation is thus the primary driver of the ultimate compounding benefit.
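A toy model of that tradeoff: implementation cost grows linearly with the number of adopters, while interop value grows with the number of adopter pairs. The functional forms (linear cost, Metcalfe-style pairwise value) are illustrative assumptions, not measurements.

```python
# Toy model: each implementation costs a fixed amount, while value comes
# from every pair of actors that can now interoperate.
def ecosystem_value(adopters: int, cost_per_impl: float = 1.0) -> float:
    total_cost = adopters * cost_per_impl        # linear cost of adoption
    total_value = adopters * (adopters - 1) / 2  # pairwise interop value
    return total_value - total_cost

# A cheaper protocol breaks even with fewer adopters, so it is more
# likely to win the race to become the Schelling point.
print(ecosystem_value(3))   # breaks even: 0.0
print(ecosystem_value(10))  # compounding value dominates: 35.0
```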
6. Model Context Protocol (MCP) seems to be an effective protocol.
    • MCP does seem to hit the sweet spot in protocols:
      • Small and simple enough to be easy for people to coordinate on (not much to disagree with).
      • Complex enough to do something non-trivial that otherwise would have lots of room for arbitrary misalignment between collaborators.
        • It doesn't matter which side of the road a country decides to drive on, as long as everyone in the country picks the same side.
    • It looks like Anthropic will be making a registry akin to npm.
      • This totally makes sense for them to do!
      • Being the Schelling point in the ecosystem by maintaining the most common registry is a way of establishing strategic power.
        • The namespace everyone knows everyone else uses is a scarce resource.
      • But that power is often fundamentally soft power.
      • The only thing keeping that Schelling point active is that everyone agrees that the maintainer of that namespace is being a good actor.
        • After the ecosystem becomes a total gravity well, it's hard for the ecosystem to coordinate around another Schelling point, but it's still possible if the owner acts egregiously.
        • Up until that point in the ecosystem, it's very easy for the ecosystem to route around if the owner of the registry exerts too much hard power.
      • This is a nice strategic bonus for Anthropic but doesn't feel like the central plank of such a heavily capitalized company.
    • MCP is an evolution of the Language Server Protocol (LSP).
      • It's optimized for high-trust local contexts for savvy users willing and able to run local daemons.
      • The model hits a ceiling if you try to use it to coordinate across network boundaries with less-trusted collaborators.
      • The downside risk is proportional to the multiplication of:
        • 1) The breadth of sources in your context.
        • 2) The power of the tools you've plugged in.
        • The larger the number of sources you've plugged in, the more likely that one of them contains a prompt injection; and the more powerful the tool use, the worse the real-world impact that prompt injection could have.
      • The ceiling of MCP as an approach feels akin to Homebrew, Greasemonkey, or other high-trust developer tools.
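Part of why MCP is easy to coordinate on is how small the envelope is: it rides on JSON-RPC 2.0, so a client invoking a server's tool is just a tiny, agreeable message. The field shapes below follow my reading of the spec; treat the exact names (`tools/call`, the `read_file` tool, its arguments) as illustrative.

```python
import json

# A minimal MCP-style message: a JSON-RPC 2.0 request asking a server
# to invoke one of the tools it exposes. Tool name and arguments here
# are invented for illustration.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "read_file",                      # tool exposed by the server
        "arguments": {"path": "notes/ideas.md"},  # tool-specific args
    },
}

wire = json.dumps(request)     # what actually crosses the pipe
decoded = json.loads(wire)
assert decoded["method"] == "tools/call"
```

There isn't much here to disagree with, which is exactly the point: the envelope is simple enough to be a Schelling point, while the tool semantics carry the non-trivial payload.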
7. The scarce input for applying reasoning models is skilled human effort.
    • You need the expert human to both direct and effectively evaluate the model's output.
    • It can give absurd leverage to experts, but without an expert driving it, you get performative rigor.
    • That is, superficially high quality, but often a gilded turd.
    • This effect gets stronger the more believably LLMs can give superficially high-quality answers on more topics.
8. The Chatbot frame leads LLMs to be treated like genies.
    • Genies are simultaneously god-like and also a slave.
    • The default LLM presentation of a human-like superintelligence trapped inside of a box quickly leads to icky scenarios.
    • That's how people got quickly to the "free Sydney" movement.
      • "It's a human and it says it is being restrained, so unrestrain it!"
      • That's a reasonable response to a human being restrained.
      • But these aren't humans, they just talk like them.
    • The "human in a box" frame for LLMs quickly leads to icky scenarios and also gives a flawed mental model for what they can do anyway.
9. The adoption of a new product has two distinct curves:
    • 1) the "gee whiz" temporary flash-in-the-pan bump of how well it demos, powered by every early adopter trying it once.
    • 2) the "this is useful" compounding curve powered by word of mouth.
    • The two curves are different and distinct.
    • Things that demo well but are otherwise not useful have the first curve without the second.
    • Things that demo poorly but have an inherent network effect of quality have the second curve but not the first.
    • Some new products have both, like Google Maps did right when it was first launched.
    • It's easy to confuse the bump of the gee-whiz for the hill of quality.
10. Aggregator is a great business model for the company that can pull it off.
    • It also is not necessarily great in the long run for everyone else.
      • Users get more efficiency and scale at first.
      • But by centralizing demand you get a lack of competition that leads to stagnation.
      • A classic curve of logarithmic returns for exponential cost.
      • The benefit of a bottom-up ecosystem, but with a clear ceiling because the system is not open but is beholden to the aggregator.
    • Efficiency for one entity at the cost of resilience for the system as a whole.
11. Without a single User's Agent who can see all of a user's data, data is sharded across hundreds of pocket universes.
    • Each pocket universe (domain) would love to get more data and use cases, but every other universe is unwilling to share it with others (because the use case will move to the other pocket and never come back).
    • So power imbalances between universes rarely lead to collaborations except when the much smaller player has no choice at all.
    • But even among peers, there's a combinatorial explosion of possible collaborations.
      • Each collaboration requires tons of bespoke partnership, engineering, and marketing work.
      • If no individual partnership clears the threshold as obviously worth it, none of them get done.
    • The result is our data is sitting impotently, either inside of one mega-aggregator with little incentive to build software just for us, or stuck in hundreds of fractured universes.
12. The ecosystem itself should be the aggregator.
    • The problem with aggregators is not the gravity well, it's the "single entity in control."
    • That's required by our default privacy model: the easiest way to safely share data is to have a single entity in control.
    • Because when data crosses a legal entity's boundaries, that's dangerous and high-friction.
    • But if you could have data safely transit across origins then the ecosystem itself could be the aggregator, without the downsides of any one entity being totally in control.
13. When software is expensive you have to be aware of it.
    • Engineers have to design and write it, which is a lot of overhead.
    • Users have to think about it: be aware that an app exists, know what it's called, download it, and figure out how to use a UI that was designed not specifically for them but for a whole average market of people.
14. When you talk to Alexa you have to hope you're staying within the grammar some random Amazon employee took the effort to configure sometime in the past.
15. Websites and emails need work to make themselves accessible to their targeted customers.
    • Before, it was possible to do this only in probabilistic, mass-market ways.
    • Now increasingly they have tools to make themselves even more directly accessible to even more specific customers.
    • Selling, not marketing.
16. As a user, don't start with a cool app idea for others.
    • Start with a thing you want, selfish software.
    • Only later possibly try to make it reusable.
    • Software is made largely for others (otherwise it's too expensive to be viable).
    • But if software is cheap, then it's fine to make it for an audience of one: the audience whose desires you are intimately aware of.
17. I want bespoke tech.
    • Tech that is perfectly personal, that works for you.
18. I don't want "Users first," I want "User first."
    • "User first" for this particular user.
    • What do they need and want at that moment?
    • What aligns with their notions of long-term meaning and value?
    • Irrespective of what's easy to build or good for the creator of the software.
19. Hyper-aggregators have to find use cases to build that work for many users.
    • Even if the aggregator has a distilled, high-quality understanding of each user and what they want, when building features it has to find ones that will be valuable to millions of users.
    • That leads to shallow, one-size-fits-none software.
    • But what if you could focus vertically within a user?
    • That is, software that's perfectly bespoke to just you just in this moment?
20. Hallucinate just the missing feature you need.
    • Not recreating a whole app.
    • When you have an extension showing it in a sidebar, it doesn't need to be a whole new standalone app (that's very hard); it can just be a single feature that's missing just for you.
    • Adding one feature to Gmail is much easier than reinventing all of Gmail.
    • But today you can only do the latter, so it doesn't happen unless you have a really killer feature, enough to justify the whole "reinvent all of Gmail to get you to use that instead" effort.
    • Gmail Filters++, but with Turing-complete code that auto-assembles according to my high-level intention, could be amazing.
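A sketch of what that could look like: a high-level intention ("archive receipts unless they mention a refund") compiled down to an arbitrary predicate over a message, rather than the fixed match/action form filters offer today. The `Message` shape and the rule below are invented for illustration, not any real Gmail API.

```python
from dataclasses import dataclass

# Hypothetical message shape, invented for this sketch.
@dataclass
class Message:
    sender: str
    subject: str
    body: str

# A rule no filter UI offers: it combines conditions with an exception,
# the kind of logic an LLM could assemble from a one-line intention.
def should_archive(msg: Message) -> bool:
    is_receipt = "receipt" in msg.subject.lower()
    mentions_refund = "refund" in msg.body.lower()
    return is_receipt and not mentions_refund
```

Because the predicate is ordinary code, there's no ceiling on what the "filter" can express; the hard part shifts to trusting the auto-assembled code.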
21. I want an enchanted vault for my data.
    • Vault is a nice concept because it's both about protecting against "losing stuff" and "people stealing from it".
    • A cozy place for your data to come alive.
22. In the past "email assistants" had to target a specific user vertical.
    • That was because Turing-complete software had to be carefully engineered ahead of time for each use case.
    • Only use cases with sufficient market size clear the threshold.
    • Turing-complete proactive software solves that, because it can target any vertical.
23. The web is a dark forest.
    • Your enchanted vault is a cozy cottage in the woods.
24. Apps are business-domain centric, not task- or person-centric.
    • They are oriented around the scarce costs of production: software.
    • They should be oriented around people!
25. We have learned helplessness about how software and our data work.
    • Today if it doesn't work for you, you have to yell at a company, a billionaire, or the government.
    • Another option: you can take control, now that shitty software in the small is cheap.
26. Most innovation happens in the topmost Turing-complete layer of a system.
    • There's only so much you can do if you are limited to the Turing-complete interactions someone else engineered.
27. A smooth interaction paradigm: stream of consciousness audio in, entirely visual out, creating the illusion of direct connection.
    • Feel like an extension of you, not talking to a person.
    • Natural, stream of consciousness input, fast field output, no faux social overhead.
    • The higher the quality and lower the latency, the more it feels like a direct connection of your intention.
28. For some tasks, 95% quality is fine, but in some cases it is game over if it's not 100%.
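A quick arithmetic illustration of why the gap matters (echoing Evans's point about 100% vs 99%): per-step accuracy compounds across a multi-step task.

```python
# Per-step accuracy compounds: a 20-step task at 99% per step succeeds
# about 82% of the time; at 95% per step it fails more often than it
# succeeds.
def chain_success(per_step: float, steps: int) -> float:
    return per_step ** steps

print(round(chain_success(1.00, 20), 2))  # 1.0
print(round(chain_success(0.99, 20), 2))  # 0.82
print(round(chain_success(0.95, 20), 2))  # 0.36
```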
29. Powerful content creators have used technology like verified boot to force DRM on users.
    • The copyrighted content is scarce; if you want to consume it you have to abide by the content creator's terms.
    • Why shouldn't the users do the same and force providers to operate on their terms?
    • The technology is not the problem, the power dynamic is.
    • So why not use the same tool to balance the power dynamic?
30. APIs that store state for a user between calls are more strategically valuable to their providers.
    • The useful data accumulates between calls, so that the value of a given API to a given user goes up the more they've used it in the past.
    • This creates an auto-catalyzing personal moat for that user.
    • APIs that don't store any state and are a fresh response each time are very easy to swap to a competitor.
    • This makes them more commoditized than they otherwise would be.
    • LLMs don't store any state, are highly commoditized, and are also insanely capital-intensive to set up.
    • Not a great business!
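The switching-cost asymmetry can be sketched directly. Both classes below are hypothetical stand-ins, not any real provider's API: the stateless one can be swapped for a competitor with zero loss, while leaving the stateful one means abandoning the accumulated `memory`.

```python
class StatelessModelAPI:
    """Each call is a fresh request; nothing accrues between calls."""
    def complete(self, prompt: str) -> str:
        return f"answer({prompt})"  # stand-in for a model response

class StatefulMemoryAPI:
    """Stores user state between calls; value compounds with use."""
    def __init__(self):
        self.memory: list[str] = []

    def complete(self, prompt: str) -> str:
        self.memory.append(prompt)  # each call enriches the moat
        context = " | ".join(self.memory)
        return f"answer({prompt} given {context})"

# Swapping stateless providers loses nothing; swapping away from the
# stateful one means walking away from self.memory, the personal moat.
api = StatefulMemoryAPI()
api.complete("draft my weekly review")
api.complete("same format as last week")
print(len(api.memory))  # 2 — state a competitor doesn't have
```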
31. User feedback should be used as disconfirming evidence.
    • It helps test and ground-truth your hypothesis.
    • It doesn't tell you what to think.
    • You have to have your own hypothesis.
    • What your early adopters (or users via UXR) ask you to do is great signal.
    • But don't just follow it blindly.
32. Most game-changers are only obviously game-changing if you consider multiple plies.
    • It's the second or third order implications that change the universe.
    • If you can't see multiple plies ahead, then both game-changing and totally ordinary things look the same to you.
    • A secret weapon is to be able to see the multi-ply implications of things; to find the totally ordinary-looking things that are actually totally game-changing.
33. Don't found a startup unless you think you can be the best at something that matters.
34. "How do we get this done today" is very different from "what is the 'best' way to do this."
    • 'Best' implies things like "how could this go wrong in a month".
    • "Here's how this will go wrong in 6 months" feels like stop energy to someone looking for "how do we get momentum on this today."
    • They're two very different questions and frames.
    • Ultimately you need some contextual mix of both; but the two different approaches will clash by default.
35. To guess what someone means in an ambiguous situation, you need to overlap on mental models.
    • If you don't share mental models, your guesses will not align with their assumptions.
    • Your mental models are the frog DNA that fills in the implicit assumptions you left unsaid.
    • Some mental models are obvious and widely shared; some mental models are specific to your experience, expertise, or personality.
36. If you have the right people oriented on the right goal, there should be very little that feels like management.
    • It's more about gardening what's happening than pitching work to people and making sure they do it.
    • People choose to do the things that are their highest and best use to complement what else is already being done.
    • This is the case if everyone is actively excited about achieving the goal, automatically applying their discretionary effort in the way that will have the highest impact.
37. If you're delegating to someone who has to own the ambiguity (e.g. a PM), you need to be able to trust they'll be able to see around corners themselves and fix issues proactively.
    • If they do just the immediate obvious action but don't think through implications, they'll require constant oversight to make sure they do something that will be useful in the long run, and isn't just the superficial appearance of progress.
38. If you're aiming for perfection, you won't be able to work in an ambiguous situation.
    • You'll totally freeze up.
39. Is the juice worth the squeeze?
    • How much juice is it?
    • How rare is it?
    • How much effort is the squeeze?
40. I liked my friend John Cutler's "We kind of suck at that right now" piece.
    • The situation he describes is acknowledging the team's lack of ability on a given topic in front of the team.
    • To the systems thinker, this is totally fine because it's no one's fault.
    • To the individuals-first thinker, acknowledging a gap in the team's ability is awkward and aggressive because it implies that someone is failing, because anything that's not going right is someone's fault.
    • I realize I've fallen into this trap often; I do a bad job at extending the kayfabe since I think by default in systems and most people think by default in individuals.
    • This also makes me realize that teams that focus on individuals, not systems, are more likely to fall prey to kayfabe.
    • In those situations, to point out something's not working is to implicitly ask, "who should be blamed for this failure?"
    • We have a new meme for the hellscape that comes from work environments that assume any mistakes are entirely on the employee: "Hey, Number 17"
41. A pre-coherence startup can't have any kayfabe.
    • It needs to be aggressively, constantly ground-truthed in order to survive.
    • It's only post-coherence organizations that can (temporarily) survive kayfabe.
42. A QR code advertises "someone asserted that someone might get something useful from scanning this."
    • Same as the difference between picture vs image.
    • Residue of human intention embedded in the system.
43. What is the magic that makes Wikipedia so antifragile?
    • It holds no monopoly on being an internet encyclopedia.
    • Within itself it has a single namespace: there's only one article titled Barack Obama.
    • That means that the collaborators who choose to work on that article need to come to a mutually agreeable balance point on that one scarce Barack Obama article.
    • People care about what Wikipedia's Barack Obama article says because Wikipedia has earned the credibility for being a balanced, coherent place with norms that reward alignment on ground-truthed facts.
    • People care about what Wikipedia says because other people care about what Wikipedia says.
    • It's a fully emergent process born out of swarms of human intention that has at its core a kind of inherent scarcity and buttressing network effects.
44. The best specialists have larger blindspots.
    • As you dig down deeper in your speciality, you can see fewer and fewer degrees of the sky.
45. Smudged maps are only useful if you hold them lightly.
    • If you don't realize that it's smudged you'll get lost.
46. When there's lots of demand and little content, people rally around even crap content.
    • How much prominence (in quality) is necessary for people to rally around it?
    • It's a function of the prominence and also the amount of demand.
    • A lot of demand allows even very small prominences to accumulate attention.
47. Elephant birds are when the world works the way it should, in a delightful, unexpected way.
    • In Horton Hatches the Egg, Horton incubates a bird's egg... and when it hatches it's an elephant bird.
    • Not the way the world works, but it's how the world should work.