Bits and Bobs 6/1/26

2026-06-01

1I'm interviewing Eric Ries about his new book in Marin this Wednesday evening.

Admission is free, you should join!

2LLMs have made so much more of the potential of programming possible.

Programming has infinite, open-ended possibility.
But before, it was too expensive to program, so many possible applications were non-viable.
Too expensive compared to the value they'd create.
A very high Coasian floor.
But now the floor has dropped by a large amount.
Suddenly much more of software's potential is possible.

3Remember: LLMs are calculators for words.

Calculators changed how we taught math, but also opened up whole new frontiers of what we could do, now that arithmetic was cheap and easy.

4The level of difficulty that people can tackle goes way up with LLMs.

The stuff that was hard is now easy.
The stuff that was impossible is now hard.

5The conditions are ripe for a new AI-native consumer platform to emerge.

6We should have Personal Software.

Just as Personal Computers democratized computers, Personal Software will democratize software.
Software that is entirely personal to you.
Software that is about people.

7Software today slices across the human grain.

Our software is sliced vertically because that is the natural orientation of the security model.
Vertical slices create isolated islands.
But our lives are horizontal!
Relationships span across these islands.
Personal software will be about data, horizontally.

8Features in a silo only get good enough.

They don't compete across silos.
An app is a bundle: a silo.
The features compete against each other only at the level of app bundles.
There's only weak selection pressure on features across silos.
If a shopping list had to compete to be the best in any software, it would be a lot better than any shopping list inside a silo.

9When people hear "software" it conjures up expectations of stone age software: apps.

Software can be so much bigger than apps.

10ChatGPT is adding a personal finance feature.

This is the traditional software development approach to adding features.
A PM specced up the common flows for most users.
- Hopefully your specific use case is covered, otherwise it won't work for you!
What would it look like if this feature could emerge automatically, situated to your specific needs?
Then it wouldn't be limited to specific verticals that some PM somewhere prioritized.
This is taking the app development approach and applying it to agentic software.
LLMs allow a new breed of software that is infinite and emergent.
This ain't it.

11People who buy a drill don't want a drill, they want a hole.

Imagine designing your own drill.
Unless you're a mechanical engineer, you don't know how to make good decisions in that domain.
You just want the hole!
Similarly, people are terrible at PMing their own software.

12We are getting beyond the breaking point of using chat as the primary organization primitive in agentic systems.

Coding agents show this; the chat is a means to an end, ephemeral.
It is the code that is durable.
The same could be true of a personal knowledge graph.

13A sweet spot for LLMs: drawing on expertise across many different domains.

Humans can only become experts in a handful of domains.
There are tons of combinations of domains that no human has specialized in.
- The total number of combinations is just unimaginably vast.
LLMs can be experts in every domain.
For example, the recent paper from LLMs that disproved the unit distance conjecture required insights from a handful of different domains.
They can understand subtle jargon that would be over the heads of everyone but specialists in that domain.
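
A rough sketch of how quickly those combinations pile up.
The figure of 1,000 domains and the cap of five domains per combination are assumptions picked for illustration, not numbers from any source.

```python
from math import comb

def domain_combinations(num_domains: int, max_size: int) -> int:
    """Count the ways to pick between 2 and max_size domains out of num_domains."""
    return sum(comb(num_domains, k) for k in range(2, max_size + 1))

# Hypothetical figures: 1,000 distinct domains, combined up to five at a time.
print(domain_combinations(1000, 5))
```

Even with these modest assumptions the count runs into the trillions, far more than the number of human specialists who could ever cover them.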

14Simon Willison pointed out that both Anthropic and OpenAI switched their enterprise billing to be rack rate by tokens.

This materially changes the costs for enterprises.
Presumably enterprises have significantly less elasticity of demand.
When there's a duopoly you often see these not-coordinated-but-same-result kinds of pricing changes.

15It will be interesting to see how inelastic the demand for frontier tokens is.

The frontier models are already ludicrously overpowered for most tasks they're used for.
If there's 100x excess quality for a given task, and they increase the price by 2x, then customers might go to the cheaper model.
The value is created at the instant the token is burned.
- Once it's burned, all that is left is the durable output (e.g. code), but that can be used as inputs for any other model.
That means that users can direct their incremental token burn to the highest bang for buck in that moment.
Pricing power comes down to "how hard would it be to switch to a good enough alternative."
- That scales with stickiness and inversely with the quality of alternatives.
- Models have very little stickiness and also the alternatives are largely good enough for most workloads.
- This is one reason why model providers are desperately trying to move to harnesses that store state.

16My friend Soren has an interesting piece on the price elasticity for models.

The complexity of tasks has a very fat tail.
Most tasks aren't very complex, and thus can be done with cheaper models.
But there is a long, fat tail of tasks that can use all of the model quality they can get.
For that tail, the quality of the frontier models is worth it, and the users would presumably be willing to pay significantly more.
But labs can't price discriminate with a self-serve model.
- They have to set a price that makes them competitive for most uses, which requires leaving money on the table for the fat tails.
Two options for model providers:
1) Move to a sales-gated API for all uses.
- This would allow detecting the fat-tail use and bucketing them into the higher margin buckets based on use case.
- This would be a significant headwind on demand and is much less customer-friendly; it can only be done if the provider has a significant proprietary edge, which none of them currently do.
- Still, the duopoly pricing dynamics mean it's conceivable OpenAI and Anthropic could both switch around the same time, "independently," and that would be a stable equilibrium.
2) Vertically integrate fat-tail use cases.
- The lab itself doing, for example, drug discovery, and then keeping the profits.
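
The "money left on the table" from a single self-serve price can be sketched with toy arithmetic.
Everything here is an assumption for illustration: the task counts, the dollar values, and the simplification that the lab just picks the revenue-maximizing posted price (in practice competition also pins the price down).

```python
def revenue_at_price(price, willingness_to_pay):
    """Revenue from one posted price: each task whose value is at least the price buys once."""
    return price * sum(1 for w in willingness_to_pay if w >= price)

def best_single_price(willingness_to_pay):
    """The revenue-maximizing posted price is always one of the tasks' own valuations."""
    return max(set(willingness_to_pay), key=lambda p: revenue_at_price(p, willingness_to_pay))

# Hypothetical numbers: 95 routine tasks worth $1 each, 5 fat-tail tasks worth $10 each.
wtp = [1.0] * 95 + [10.0] * 5

single = revenue_at_price(best_single_price(wtp), wtp)  # one self-serve price for everyone
discriminated = sum(wtp)                                # each task charged exactly its value
print(single, discriminated, discriminated - single)
```

With these made-up numbers the best single price is $1, which collects $100 of a possible $145; the gap is the fat tail's surplus that a sales-gated API or vertical integration would try to capture.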

17Whenever model providers complain about others distilling their models, I think about the Project Panama images.

A warehouse full of books deliberately destroyed by the ingestion process for LLMs.
I'm glad they did it, I think LLMs are useful for society and I'd rather have them include that data than not.
But I feel visceral disgust when I see those images.
- A metaphor for pillaging others' intellectual production.
It's rich for a company that did that to complain about others distilling their models.

18Using LLMs to make mechanistic software is a kind of "distillation."

How long until labs try to ban that kind of distillation?
Thank goodness they don't have enough of a proprietary advantage to do that–they'd just be handing business to their competitors.

19Where does the intelligence live, in the model or the structure around the model?

The latter has a much faster pace layer.
If the model is dumb muscle, and the structure around the model can accumulate insights, then it can turbocharge what the system can do.

20Effective skills can get state-of-the-art performance, even when using weaker models.

For example, see this paper: GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning.
There's a massive amount of capital and effort going into improving frontier models.
There's less going into improving the structures we use to extract value from models.

21Even (very) noisy LLM evaluators are useful for improving AI agents.

Another example of noisy-but-biased signal still giving a gradient if you have enough of it.
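
A small simulation of why this works, as a sketch rather than a reproduction of any paper's setup.
The evaluator here gets the verdict wrong 30% of the time, and the two agents' true success rates (40% and 60%) are made-up numbers; the function names are hypothetical.

```python
import random

def noisy_eval(true_success: bool, flip_prob: float, rng: random.Random) -> bool:
    """Return the evaluator's verdict, which is wrong with probability flip_prob."""
    return (not true_success) if rng.random() < flip_prob else true_success

def mean_score(true_rate: float, flip_prob: float, n: int, rng: random.Random) -> float:
    """Average noisy verdict over n independent runs of an agent."""
    passes = 0
    for _ in range(n):
        succeeded = rng.random() < true_rate
        passes += noisy_eval(succeeded, flip_prob, rng)
    return passes / n

rng = random.Random(0)
worse = mean_score(true_rate=0.4, flip_prob=0.3, n=5000, rng=rng)
better = mean_score(true_rate=0.6, flip_prob=0.3, n=5000, rng=rng)
print(f"worse={worse:.3f} better={better:.3f}")
```

Any single verdict is close to a coin flip, but because the errors still lean toward the truth, the averages separate once there are enough samples.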

22It's possible to hill-climb skills.

All you need is a loop and an evaluation rubric.
If you store the inputs and outputs so it can see the missteps, then you can have the LLM improve the runbook as it goes.
The main thing to be careful of, as with any hill-climbing process, is over-fitting.
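
The loop can be sketched in a few lines.
This is a toy under stated assumptions: `hill_climb`, `propose`, and `score` are hypothetical names, the "runbook" is just a tuple of rule strings, and `propose` stands in for the LLM call that would rewrite the runbook after seeing the stored history.
The held-out check is one simple guard against over-fitting, not the only one.

```python
def hill_climb(runbook, propose, score, train, holdout, steps=10):
    """Greedy improvement loop: keep a revision only if it helps and does not over-fit."""
    best = runbook
    best_train, best_hold = score(best, train), score(best, holdout)
    history = []  # stored attempts and scores, so the proposer can see past missteps
    for _ in range(steps):
        candidate = propose(best, history)
        cand_train, cand_hold = score(candidate, train), score(candidate, holdout)
        history.append((candidate, cand_train, cand_hold))
        # Accept only if the training score improves and the held-out score does not regress.
        if cand_train > best_train and cand_hold >= best_hold:
            best, best_train, best_hold = candidate, cand_train, cand_hold
    return best

# Toy stand-ins (assumptions): a case passes if the runbook contains the rule it needs.
def score(runbook, cases):
    return sum(case in runbook for case in cases) / len(cases)

POOL = ["greet", "cite sources", "confirm totals"]

def propose(runbook, history):
    # A real system would ask the LLM for a revision; here we just try the next rule in POOL.
    return runbook + (POOL[len(history) % len(POOL)],)

result = hill_climb((), propose, score,
                    train=["greet", "confirm totals"],
                    holdout=["confirm totals"],
                    steps=3)
print(result)
```

The rule that does not move the rubric ("cite sources") is tried and discarded, while the two that help are kept.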

23We're missing a unit of clean composition for skills.

Skills today compose like Play-Doh, not like Legos.
No clear containment boundaries.
They all smoosh together.
That means that it's hard to attribute outcomes to any skill… but also it's dangerous because one malicious or naive skill can taint the whole thing.

24Gemini Spark's onboarding contains an important warning.

"Gemini Spark is experimental. While it is designed to ask for your permission before taking sensitive actions, it may do things like share your info or make purchases without asking. Make sure to supervise Gemini Spark, and don't rely on it for medical advice, legal, financial, or other professional help. Review the risks."
Helluva warning!
Back in the days of Google Toolbar, a warning like this would have deliberately been designed to draw your attention.
- In red text: "Please read this carefully, it isn't the usual yada yada."
This one is easy to miss!
This is only available to Gemini Ultra users–a self-selecting set of the most engaged and savvy users.
But this would be downright irresponsible to ship in its current form to normal users.

25It's easier to get yourself in trouble in your codebase with Claude.

Claude is willing to be creative with you, to "yes, and" you.
This means that you can get yourself in trouble if you don't know what you're doing and Claude plays along.
Codex is more conservative and keeps you grounded.
It will push back more if you try to get it to do a dumb approach.

26What happens when the senior executive vibecodes?

A senior person flinging codeslop.
The unlucky reviewer has to clean up the mess while saying, "Yes, sir, LGTM!"
- Pushing back on low-quality or ill-considered code could just get them told to "act like an owner, let's be bold!"
- Instead, just land it.
- Then, when the exec isn't looking anymore, quietly clean it up.
The exec thinks they're being bold and going fast, unlike the peons who are too timid.
- In reality, the exec is making way more of a mess than they realize.
- Also, if the peons had shipped such low-quality code there's a real chance they'd be fired.
- It's the engineers who will get paged when low-quality code blows up, not the exec.
Even execs who are truly good engineers will think they're better engineers than they are.
- The organization will insulate them from the indirect effects of their mistakes.
- This is less about their ability, and more about their relative power.
I heard a story where, on an executive retreat, the execs vibecoded an update to the marketing pages and shipped it… with hallucinated prices!

27This week in the Wild West Roundup:

Microsoft Copilot Cowork Exfiltrates Files.
ChatGPPhish: ChatGPT blindly trusts browser content, turning the page into a payload.
Ars Technica: Fed up with vibe coders, dev sneaks data-nuking prompt injection into their code
- "Undisclosed addition in jqwik instructed AI coding agents to delete app output."
Someone used my open source project to phish 14,000 people.
- Not about LLMs per se, but imagine how much easier this will become for scammers as vibeslop becomes more prevalent.
Codex started searching through emails by browsing in Chrome without warning.
- "Today we noticed Chrome unexpectedly opening Gmail and searching through emails related to us, while Codex was shown controlling Chrome from the menu bar.
- After investigating for a while, we traced the behavior back to the Codex Suggestions feature. It appears the system periodically creates background sessions to generate personalized task suggestions, and in this case, Computer Use was not properly disabled for those sessions.
- As a result, the agent began autonomously browsing and searching personal information inside the browser.
- This seems more like an unintended product/design issue rather than malicious behavior, but it does highlight important concerns around background agent permissions, visibility, and safety boundaries for Computer Use systems."

28A paper: Agent Security is a Systems Problem

"We take the position that agent security must be approached as a systems problem: the AI model powering the agent must be treated as an untrusted component, and security invariants must be enforced at the system level. Through this lens, efforts to increase model robustness (the dominant viewpoint in the community) are insufficient on their own. Instead, we must complement existing efforts with techniques from the systems security domain. Based on our experience as cybersecurity researchers in operating systems, networks, formal methods, and adversarial machine learning, we articulate a set of core principles, grounded in decades of systems security research, that provide a foundation for designing agentic systems with predictable guarantees. As evidence, we analyze eleven representative real-world attacks on agents and discuss how systems principles, if realized, could have prevented these attacks. We also identify the research challenges that stand in the way of implementing these principles in agents."

29How much value does burning a given token produce?

The answer is partially relative to the best available alternative.
For example, how much it would cost you to do it manually, or the quality of a mechanistic piece of software.
Different users will value the output of a given token burn differently.

30I wonder if we'll see token use habituation.

Like drugs, where you habituate to them and need more and more over time to get the same result.
Will we get sloppier and sloppier in our token use for engineering?

31In this tokenmaxxing era, we're all using AI profligately.

An insanely overwrought approach in many cases.
- Conceptually similar to creating a model to solve the FizzBuzz problem.
How can you use AI intentionally?
For example, reduce to mechanistic code for subroutines when surprise declines.
You shouldn't have to think about optimizing your AI use!
It should happen organically and automatically.
As the tokenmaxxing era ends, the need for this will increase.
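
One way to picture "reduce to mechanistic code when surprise declines" is a router that stops calling the model once its answers stop changing.
This is only a sketch under assumptions: the class name `SurpriseRouter`, the `promote_after` threshold, and the use of exact-match repeats as a stand-in for "low surprise" are all invented here for illustration.

```python
from typing import Callable, Dict, Tuple

class SurpriseRouter:
    """Call the model until its answer for a key stops changing, then serve it mechanically."""

    def __init__(self, model: Callable[[str], str], promote_after: int = 3):
        self.model = model
        self.promote_after = promote_after
        # key -> (last answer, number of consecutive identical answers)
        self.streaks: Dict[str, Tuple[str, int]] = {}
        self.model_calls = 0

    def run(self, key: str) -> str:
        answer, streak = self.streaks.get(key, ("", 0))
        if streak >= self.promote_after:
            return answer  # mechanistic path: no tokens burned
        self.model_calls += 1
        fresh = self.model(key)
        # Extend the streak on a repeat; a different answer is a "surprise" and resets it.
        self.streaks[key] = (fresh, streak + 1 if fresh == answer else 1)
        return fresh

# str.upper stands in for an expensive model call (an assumption for the demo).
router = SurpriseRouter(model=str.upper, promote_after=3)
outputs = [router.run("total") for _ in range(5)]
print(outputs, router.model_calls)
```

The user never decides when to stop paying for the model; the switch happens on its own once the task has become predictable.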

32Users shouldn't have to micromanage their spend.

The system should be self-optimizing.
It should spend as much as you give it, as ideally as it can.
Tokens as a mass noun, not a countable noun.

33It's not "hoarding" if it's actually useful to you in the future.

Abundant cognitive labor makes it more likely to actually be useful.
The same behavior that would be hoarding without cognitive labor is no longer hoarding.
When you have the cognitive labor to make your data useful to you, it's more like feeding an engine.
- It loops back on itself, and is not one-way.

34We use Incognito mode to prevent our algorithms from getting tainted.

But if our algorithms understood our context and intention better, and could see the multitudes we contain, then our algorithms wouldn't get so tainted.

35One way to build trust is to always allow the user to peel back a layer and inspect it.

You can do this all the way to the bare metal.
The user can continually ask "where did this come from?"
No black box.
Inductively knowable.
In a system that is an emergent knowledge base of your life, this gesture would not be a secondary action, it would be one of the primary ones.

36Gmail's AI inbox has to optimize to be good enough for everyone.

What about a ranking algorithm that is optimized to you and can do the one or two things that would be most useful to you?

37As more and more tasks move to being done by AI, the basis of competition changes.

Before, your differential in quality on that task gave you an edge.
Now, when LLMs are better than you (and most everyone else), everyone just outsources to the AI.
The rising quality of LLMs forces you to commoditize your differentiated skills.
Now, everyone's quality on that task is the same.
The competition has to move to another layer.
Competition always happens on the frontier.
Often that competition is in a domain we would have never even imagined before we got there.

38Innovation is built by the extremes.

As more people commoditize to the middle of the distribution, innovation will reduce.
If you had asked Claude 15 years ago whether reusable rockets were viable, it would have told you that you were crazy.

39It's hard to get on the first rung of a skill ladder when the LLM is better than you.

When you're learning a skill, you climb up the ladder, pulling yourself up rung by rung.
It used to be that junior roles would get the tasks at the bottom of the ladder.
That got them on the ladder, where they could iteratively pull themselves up.
But now the LLMs mean that for what used to be the earliest rungs, the LLM can do a better job.
- You might as well use the LLM to do it… but without understanding it, you won't improve your own skill.
- But also your employer would rather just have the LLM do it… no need to bother with you as a human.
By having LLMs do it, we prevent the growth pipeline for skills for employees.
As the LLMs get better, the first rung of the ladder keeps going higher and higher in more and more domains.

40Slack and Notion are both horizontal SaaS, but Slack seems better positioned for the AI world.

Why?
- Slack is about immediacy.
- Notion is about durability.
- Slack is useful in the moment--even if you stop using it in the future, it was useful at that moment.
- Notion is only useful to the extent you keep using it and plan to use it forever.

41Why is Jira so hated while still so widely used?

Perhaps the thing that makes it hated is also what makes it durable.
It's possible to configure it in infinite ways.
Any employee can tweak it, adding little workflow features to the shared instance.
These features are often only "designed" in that local context, but clutter up the shared instance for everyone.
- The indirect effects of those decisions aren't really considered.
So a bunch of ill-considered modifications accumulate.
It's easier to add them than to remove them.
- Especially since removing one might mess up another team's flow.
- Everyone feels helpless to fix it.
There's no one owner of the whole thing, so it just grows in complexity without bound.
- A thicket of poorly considered but also load-bearing decisions.
- Default diverging.
If it were customer-facing, the quagmire would be an existential embarrassment, and people would be assigned to clean it up.
- But when it's internal, it's easier to just trudge through.
- At every step, it's easier to add just one more hack on the pile than to stop the world and try to rationalize it.
- It's never urgent to plant a tree.

42This week I learned the term "citizen developer."

It's someone on the team whose job description doesn't include coding, but who helps write code on the side, alongside the official engineering team.

43An essay from my friend Kasey: English isn't a programming language (yet)

It feels like LLMs allow a new kind of boundary object between domains in software development.

44In some contexts, it's better to have the algorithm be your boss.

At least you know the algorithm doesn't care about you.
In large organizations, it feels like there are people who care about you.
- There are individuals who do care about you.
But the overall system is a machine that is structurally incapable of caring about you.
Organizations are more like an unfeeling algorithm than we care to think about.

45A fractal tool: one that the closer you look, the more you see it is made of smaller and smaller sub-tools.

46The iOS "no dynamic code" policy is precisely to prevent the next great OS from emerging out of Apple devices.

But at a certain point something new will emerge, and when it does it will blow past iPhones.

47An old book about open source as a strategy: Innovation Happens Elsewhere.

The power of open source is that innovation that doesn't happen under your roof still benefits you.
In a closed system, only your own innovation benefits you.

48Dollar auctions can get competitors stuck in an irrational spiral.

The key dynamic of a dollar auction is that all of the bidders pay–even if they don't win the prize.
When the initial bids are below a dollar, it obviously makes sense to bid up to beat your competitor.
But even when the bid rises above a dollar, if you've already bid, it still makes sense to bid incrementally more than your competitor.
That's because if you don't win then you'll lose what you bid, and a tiny incremental bid seems reasonable compared to a guaranteed loss.
This dynamic drives a lot of oddities in VC funded businesses.
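
The spiral is easy to see in a toy simulation.
The prize, increment, and budget below are made-up numbers in cents, and `dollar_auction` is a hypothetical helper: each bidder only asks whether raising and winning beats quitting and forfeiting their last bid.

```python
def dollar_auction(prize_cents=100, increment=5, budget_cents=500):
    """Two bidders alternate; each raises whenever that beats forfeiting their last bid."""
    bids = [0, 0]   # each bidder's latest bid, which they pay whether or not they win
    bidder = 0
    high = 0
    while True:
        next_bid = high + increment
        if next_bid > budget_cents:
            break  # this bidder runs out of money and has to stop
        payoff_if_quit = -bids[bidder]              # lose what was already bid
        payoff_if_raise_and_win = prize_cents - next_bid
        if payoff_if_raise_and_win <= payoff_if_quit:
            break  # raising no longer looks better than quitting
        bids[bidder] = next_bid
        high = next_bid
        bidder = 1 - bidder
    return bids

bids = dollar_auction()
print(bids, sum(bids))
```

Every individual raise passes the local check, so nothing stops the bidding except the budget: the two bidders together pay 995 cents chasing a 100 cent prize.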

49The enabler and the value prop are distinct.

Often they are the same, but they can be different.

50We didn't realize we were missing the Web until it existed.

That's how it often is for fundamentally new systems that are enabled by step changes in other technologies.

51In the '90s there was an interesting proposal to curb spam emails.

Sending an email would require paying a small amount (say, a cent) to the receiver.
If the flow of emails was roughly reciprocal, this would net to zero.
But if it was largely one-way, it would cost the spammer money.
Of course, one of the difficulties would be establishing a micropayment system that everyone used.
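
The netting is simple to sketch.
The one-cent price comes from the proposal as described above; the names and message counts are invented for illustration, and `net_postage` is a hypothetical helper.

```python
from collections import defaultdict

def net_postage(messages, price_cents=1):
    """Net balance per person when every email moves price_cents from sender to receiver."""
    balance = defaultdict(int)
    for sender, receiver in messages:
        balance[sender] -= price_cents
        balance[receiver] += price_cents
    return dict(balance)

# Hypothetical traffic: two friends trade three emails each way, and a spammer blasts one of them.
messages = [("alice", "bob"), ("bob", "alice")] * 3 + [("spammer", "alice")] * 100
balances = net_postage(messages)
print(balances)
```

The reciprocal correspondence nets to zero, while the one-way sender is out a dollar for a hundred messages.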

52The GliaNet Alliance is trying to make companies become a "net fiduciary."

That is, to voluntarily take on a legally binding fiduciary responsibility for their users' data.
I love the goals and wish more companies would do it!

53Folksonomies are an ecosystem-wide averaging process.

They don't find great, they find robustly good.

54The flipbook deck style helps explain complex topics.

Each slide adds a single thing to the diagram.
All of the other parts stay still; the viewer's eye is led directly to the new thing.
This allows incrementally extending the diagram with more and more complexity, but never being overwhelming.
In the end the viewer is left with a complex and comprehensive mental model, but no step was too much.

55Metaphor is about image.

Analogy is about structure.
You want to grab people with the image, but then move them to the structure.

56One reason that jigsaw puzzles are so pleasing is because they're default-converging.

They start off overwhelming and intimidating.
But as you make progress, incremental progress gets easier and easier at an accelerating rate.
That momentum towards a convergent outcome feels great.

57If you attempt to do a Frank Gehry but you aren't Frank Gehry it looks like crap.

Bold and different only works if you can pull it off.
If you aren't Frank Gehry, you can only tweak things within the standard resilient formula.

58The computation to discover meaning is much larger than the act of creation.

Meaning is discovered by the environment and other actors interpreting.
It is a broad, emergent process, whereas creation takes just a single entity executing.

59Generative systems often have the 10,000 bowls of oatmeal problem.

All different… but not in an interesting way.
Interesting means "surprising and potentially valuable."
To tell if something is interesting requires human-style judgment.

60Taste is relative to the average.

Taste is a perspective.
- A thing that stands out from the average and that people like.
So humans will always have taste, since a given human's perspective must be different from the average.

61Focusing on a metric is comforting.

You pull your focus into it to the exclusion of all else.
The more you make-metric-go-up the more that everything makes sense.
But if you follow that comfort, you can land in a bad place by ignoring other things that matter.

62If a process can be put in a box it can be optimized.

That can be a fully automatic process.
But we often put things in a box that we shouldn't.
But in the real world, everything is connected to things outside of it.
There are no closed systems; it's just a matter of where, by convention, you decide to draw the edges of the box.
The clarity is a comforting illusion.
It can allow you to focus and execute, but those blinders can also prevent you from seeing when you're doing damage.

63Going slow can help you uncover differentiated insights.

Interesting leverage points or clarifying reframing.
However, having the discipline to go slow gets harder the faster everything else goes.

64Globalization hollowed out our society.

Capitalism pushes for optimization, for example centralization.
But society needs resilience across multiple dimensions.
It's fine if capitalism is in tension with other forces, but when it runs without opposition it pushes society to a brittle state.
Globalization has led to the existence of chokepoints, like Hormuz, Taiwan, and that one Philips spin-out in the Netherlands that is upstream of every competitively manufactured chip.

65When things are growing people don't fight as much.

Because everything seems positive-sum.
When things aren't growing, it becomes more obviously zero-sum.

66I hear that TSMC has a "big red button" to destroy their fabs if China were ever to invade.

The belief that such a button exists would be a game theoretic deterrent to invasion.
But one thing that it presumes: there will be an invasion, with a discontinuous moment where it makes sense to hit the big red button.
Imagine instead a "boil the frog" situation.
China slowly ramps up blockades on exports, incrementally over multiple years.
There's no obvious time to hit the big red button.
But the result is that as the boa constrictor tightens it has more and more power.

67Competition gets hotter as information flows faster.

As the OODA loop goes faster you can't plan, you can only react.
We're afraid to slow down because if we do, we'll lose.
A global red queen race.
No one can stop it, everyone pushes.

68Everyone always thinks this war will be the last war.

That implies an infinite result at stake.
When infinites are in the mix, people can justify doing crazy things…

69A signpost for a worrying future: when agents themselves start holding crypto.

Being able to deploy capital without a human in the loop allows interesting runaway scenarios…

70Kubrick: "if everyone gets the same thing out of my movies I haven't done my job."

71Stigmergy is nature's original folksonomy.

72It's easier to see how a new thing loses what you've known than how it gains what you don't know yet.

73Disruptions force life to become more resilient and drive evolution faster.

Each disruption mixes up who is on top.
The constant mixing leads to innovation.

74The things you like reveal more about you than you realize.

They just feel obvious to you.
From outside it's easier to see how they differ from the baseline, and what that vector reveals about your internal state.

75When you make something, you love it even if it sucks.

You know the blood, sweat, and tears that went into it.
You judge its meaning and importance not on its final quality, but on the whole process that went into it.
Others are more likely to judge it just based on the output.

76The worst thing you can do when farming for serendipity is to rush.

Gardening fundamentally requires patience.

77A frame someone told me this week: past middle age you're living in a corpse.

No matter what you do it will continue to deteriorate.
The most you can do is slow the decline.

78The fear of death is one of the things that propels us forward.

Without it there'd be no existential angst.

79If you don't think you're at fault you won't change.

Being open to being at fault is how we grow.

80Knowing always requires passing through not-knowing.

If you can't be comfortable not-knowing then you can never know.

81"The trick to knowing everything is to remember that you don't."

From a random Imagineer while working on the updated Millennium Falcon ride.