Bits and Bobs 11/11/24
1. LLM-assisted sense-making will unlock new ways to communicate.
An essay is a static slice through the hyperobject of a speaker's perspective.
Essays were the only way to communicate complicated ideas in the era of print.
But LLMs allow capturing a given hyperobject idea in an interactive medium that receivers can explore.
A kind of automatically-created choose-your-own-adventure through the idea for the receiver.
Each new communication medium allows society to think new thoughts that weren't feasible before.
What kinds of new thoughts will society be able to think?
2. It's amazing how useful large context windows are in LLMs.
It's barely been a year since we had to deal with minuscule context windows of 4k tokens or so.
It was like living in the stone age, can you even imagine?
We had to do techniques like RAG (Retrieval Augmented Generation) to allow LLMs to work with large bases of knowledge.
We'd use RAG to do fuzzy, semantics-based matches for things that seemed related to the user's question and paste as many of them into the context window as would fit.
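A minimal sketch of that retrieve-and-stuff pattern, assuming a toy bag-of-words "embedding" as a stand-in for whatever real embedding model you'd actually call:

```python
# A minimal sketch of the old retrieve-and-stuff RAG pattern.
# The embedding here is a toy bag-of-words stand-in; a real system
# would call an embedding model and use dense vectors.

def embed(text: str) -> dict[str, int]:
    counts: dict[str, int] = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def similarity(a: dict[str, int], b: dict[str, int]) -> float:
    # Cosine similarity over the sparse word counts.
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm = (sum(v * v for v in a.values()) ** 0.5) * (sum(v * v for v in b.values()) ** 0.5)
    return dot / norm if norm else 0.0

def build_context(question: str, chunks: list[str], token_budget: int = 4000) -> str:
    """Rank chunks by fuzzy semantic similarity to the question, then
    paste in as many as will fit in the context window."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: similarity(embed(c), q), reverse=True)
    picked, used = [], 0
    for chunk in ranked:
        cost = len(chunk) // 4  # rough tokens-per-character heuristic
        if used + cost > token_budget:
            break
        picked.append(chunk)
        used += cost
    return "\n\n".join(picked)
```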
This worked OK for factual questions, but it couldn't answer questions like "What are the major themes in this work?", because a theme isn't a collection of small, semantically obvious facts; it's a high-level vibe.
LLMs today with massive context windows do quite a good job of identifying themes when given the whole work.
And yet already it's easy to take large context windows for granted.
What progress we've made!
3. We're lucky to live in a universe with high-quality LLM APIs.
Here's an alternate universe that is totally possible to imagine.
OpenAI releases ChatGPT before they release any API.
They don't ever release an API because it would be "dangerous" …and also undermine their app's differentiation and power.
Later Anthropic comes along and does the same.
Neither feels compelled to release an API because both want to be an aggregator.
In that world, we'd have LLM-powered aggregator chatbots, but no way to use LLMs in other applications.
A recent report said that something like 75% of OpenAI's revenue comes from ChatGPT.
All it would have taken in this alternate universe is that OpenAI discovered the value of ChatGPT as an app before they released an API.
Instead we got the world where all of the major model creators have an API and can't remove it lest they cede that ground to competitors.
A wildly different world, and a far more encouraging one!
4. You're a king when you deal with Claude.
When Claude says "What an astute observation" it feels dangerously good.
It effectively says "Good idea, majesty!"
But the longer that goes on the more you decohere from ground truth.
Fawning is a reinforcing loop.
It makes you lean more into the thing you were already doing.
You need a feedback loop to balance it out.
English kings realized they needed oppositional agents.
Agents competing amongst themselves to represent the king, keeping the king ground-truthed.
If you have not one massive agent but a "parliament of agents", then you can have different personalities, a structure of competing incentives in the swarm that benefits you.
A simple parliament is your shoulder angel and shoulder devil.
But why not have more than two?
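Here's a sketch of what a small parliament might look like in practice; `ask_llm` is whatever model call you already have, and the personas are purely illustrative:

```python
from typing import Callable

# Hypothetical sketch of a "parliament of agents": the same question goes to
# several competing personas, so no single fawning voice goes unchallenged.

PARLIAMENT = {
    "advocate": "Make the strongest case FOR the user's idea.",
    "skeptic": "Make the strongest case AGAINST the user's idea.",
    "realist": "Name the evidence that would actually settle the question.",
}

def convene(question: str, ask_llm: Callable[[str, str], str]) -> dict[str, str]:
    """Collect one answer per persona so their incentives compete,
    instead of one agent cooing 'what an astute observation'."""
    return {role: ask_llm(brief, question) for role, brief in PARLIAMENT.items()}
```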
5. Optimizing for efficiency gives you superficial quality while eroding the fundamentals.
The more you do it, the more you get a gilded turd.
What's inside counts much more than what it looks like.
6. Slop is "high-fructose ass imagery."
At first glance it hits you with a WOW!
But the closer you look the less impressive it gets.
The mark of true quality is the closer you look, the more impressive it gets.
7. LLMs are excellent teachers.
They can patiently engage with your questions, helping you learn the material.
But imagine trying to learn German and using an LLM.
You have to keep reminding it to not tell you the answer but to tell you why your answer was wrong so you can learn.
A voice whispers in your ear: "Why not just ask it the answer and have it translate it for you?"
9. It's hard to find software that is Just Right.
Imagine a given user's ideal set of use cases.
If a bit of software is missing features the user wants, it's too small.
If a bit of software has too many features the user doesn't want, it's too large.
Each extra feature is additional conceptual overhead making the software harder for that user to use.
The likelihood a Just Right piece of software exists is better the smaller the set of use cases.
Apps are pretty chunky; they are high friction to distribute, which means they tend to bundle a lot of features to make them worth it.
This means apps are less likely to be Just Right.
But if the distribution physics were different and you could have smaller bits of software, it would be more likely there was a piece of software that was Just Right.
10. Screenshot to code.
Imagine being able to take a screenshot of one app and then tell your system to make that UI applied to a different set of your data.
A party trick LLMs are quite good at.
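A hedged sketch of how that might be wired up; the names here are hypothetical, and the actual multimodal call is left to whichever API you use, because the prompt is the whole trick:

```python
import base64

# Illustrative sketch of the screenshot-to-code party trick: package the
# screenshot plus a description of your own data, and ask a vision-capable
# model to reproduce the UI over your data instead.

def build_restyle_request(screenshot_path: str, data_description: str) -> tuple[str, str]:
    """Return (prompt, base64 image) to send to a multimodal model."""
    with open(screenshot_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    prompt = (
        "Here is a screenshot of an app's UI. Generate HTML/CSS that "
        "reproduces this layout and visual style, but populated with the "
        f"following data of mine instead: {data_description}"
    )
    return prompt, image_b64
```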
11. Apps are vertical.
Apps are vertical. Users are horizontal.
Apps are silos; they only know their vertical slice of a user's world.
A user is horizontal, stretching across many apps.
What if you could make software that was horizontal, just like users are?
12. Data has combinatorial power.
When you mash up two disparate sources of data you get more potential than in either data source alone.
This is also true for our own personal data.
One reason we don't regularly experience this combinatorial power is that in a vertical, app-based world it's not possible.
And the small number of aggregator apps that do have enough of our data are unable to build niche software due to the tyranny of the marginal user.
13. If you peek into how multi-modal models work, they're a ball of cheap, random hacks that turn out to be terrifically, unreasonably effective.
Search engines aren't that different on the inside.
At first it seems random and unprincipled.
In a way, it is: we can't explain why this random hack works with anything like a grand theory.
But there is a logic and order to it; for every random hack that works, there are dozens that were tried and turned out to not work for some reason.
The hacks that work stick; the other ones are forgotten about.
What results is a seemingly arbitrary collection of hacks that just so happen to work.
There is a selection pressure you can't see, made possible by hill climbing: a clear objective metric to experiment against.
But they're hacks on top of hacks; almost certainly hitting a local maximum.
14. Some people critique LLMs as being just like Bitcoin: a massive energy hog.
However, there's a key difference.
In Bitcoin, the energy use is the point.
The price of computation is load bearing.
It's what makes the ecosystem hard to fork, in proportion with how much energy is invested.
As computers get more efficient at doing the mining, that gives the entity with the more efficient hardware an edge to do more mining.
With LLMs however, everyone wants the energy use per unit quality to go down.
Users benefit as it goes down.
And providers want it to go down to get an edge over competitors, which then sets a new baseline expectation for users, which then leads to a further round to get more of an edge..
Everyone wins!
A critique of this analysis: as LLM inference cost goes down, demand will rise for the now cheaper inference.
15. As you mature, sometimes you find that the creative activities you used to do no longer feel meaningful.
Normally people go through this when they retire.
But now maybe all of humanity is contemplating their mass "retirement" in the face of AI.
16. The net movement in the system is that the less powerful thing moves to the more powerful thing.
When code is expensive, data is (relatively) cheap.
Data flows to code.
When code is cheap, data is (relatively) expensive.
Code flows to data.
A Copernican shift!
17. Abstraction is unreasonably effective in the realm of Computer Science.
Abstraction gives compounding leverage.
Each additional layer gives a multiplicative factor of leverage.
Abstraction is what allows a unit of code to have more than 1:1 value creation.
It can do this in the realm of Computer Science because computer science is perfectly precise.
Computers do precisely what you tell them (modulo cosmic rays).
This means you can reason perfectly about side effects in the system–even if it's sometimes beyond the grasp of a human mind.
18. To build confidently on a foundation, it must be sound.
If the foundation is squishy or unsound, then it will feel like quicksand.
If you add magic to an unsound system it makes it more unsound.
19. The test of whether code is sound: can multiple people successfully own the system?
That is, maintain, fix, and extend it with an accurate mental model?
If not, it's not known to be sound.
Soundness normally requires careful layering, making sure that each layer is thin and understandable.
Thinner layers are also easier to communicate and explain to people; instead of having to communicate a multi-ply idea, you can communicate a succession of single-ply ideas.
20. When can you confidently rely on magic in an engineering system?
If you could have written the code that does the magic yourself.
Then the magic is just helping you have leverage.
If not, then the magic could have a very different behavior than your mental model of it.
Of course, in practice we can't peel back every layer; that's the whole point of abstraction.
But the bar is could you have understood it and written it, with sufficient time?
21. The hardest part of engineering is collaborating with others.
You have to make it boring, simple for another person to understand.
It's not enough for the code to make sense to you alone; you have to architect it so that another person can also own it.
If only you had to understand it, it would be much easier.
Although making code make sense to yourself in the far future is kind of like collaborating with another person, because you can't rely on your current working memory or know-how.
Making your own code make sense to you in the future helps you make it more sound and possible for others to understand, too.
22. Two archetypes of engineers: architects and codeslingers.
Architects get joy from making things sound and tidy.
They get joy from taking a messy, hard to understand thing and making it tidy and easy to understand, even boring.
"Look at this clever abstraction I came up with that makes this powerful outcome easy to reason about."
The architect says "What is the most boring way I can accomplish this interesting thing?"
The architect uses existing boring frameworks and patterns as much as they can.
The codeslinger gets joy out of inventing something that no one has thought of before.
The codeslinger is a cowboy.
They don't like to do things the boring way.
You need both codeslingers and architects in your codebase.
Codeslingers extend the possibility; architects metabolize the system into something understandable, creating a stable foundation to reach even further from.
If you had only codeslingers, the codebase would quickly descend into a combinatorial quagmire, impossible to maintain let alone extend.
If you had only architects, the system would never add new functionality.
Codeslingers make excellent prototypers.
But beware relying on their code as a foundation to build on.
Their code will be weird, rely on magic only they understand, and be riddled with bugs.
The more experience a codeslinger has as an architect in other contexts–in making boring, production-grade, resilient software–the more likely the code they sling can be used as a foundation.
Never let a codeslinger write a developer-facing abstraction; they'll come up with a weird thing most developers viscerally hate.
23. Games have a higher bar to meet than utility software.
You have to keep using Microsoft Word even if you don't like it, because it is a means to an end.
The end might be, for example, turning in a report the boss asked for.
You will slog through it even if it's hard, to achieve the end you're seeking to achieve.
If you don't like a game you simply stop using it.
The game itself being fun is the end.
If it's not fun, there's no point.
24. Games are challenging to develop: you have a large number of teams collaborating on a single artifact.
You need to find the spark of the fun, and then make it increasingly high-fidelity and real, fanning it into a roaring bonfire.
The "fun" is found at the highest pace layer.
If you had strict layering, you'd spend all your time on the lower layers before discovering if the higher layers were any fun.
If you had tight coupling, a mistake from any one sub-system could break the whole thing.
Instead, game engines often use an entity component system.
Everything is a generic entity, with the ability to layer specific functionality on top.
Everything reading from and writing to, and reacting to, the same system.
A hub and spoke model is resilient: one spoke can go down and not bring down other spokes. And other people can add spokes without interfering with the thing!
Resiliently extensible, loosely coupled.
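A minimal sketch of that entity-component idea (the names are illustrative, not any particular engine's API): entities are just IDs, components are plain data hung off them, and each system reads and writes the shared store without knowing about the other systems.

```python
from collections import defaultdict

# Toy entity component system: generic entities, specific functionality
# layered on top, everything reading from and writing to the same store.

class World:
    def __init__(self):
        self.next_id = 0
        self.components = defaultdict(dict)  # component name -> {entity id: data}
        self.systems = []

    def spawn(self, **components):
        """Create a generic entity and layer specific components on top."""
        eid = self.next_id
        self.next_id += 1
        for name, data in components.items():
            self.components[name][eid] = data
        return eid

    def add_system(self, fn):
        # A system is just a function over the shared store; one broken
        # spoke doesn't take the hub or the other spokes down with it.
        self.systems.append(fn)

    def tick(self):
        for system in self.systems:
            system(self.components)

def movement(components):
    for eid, pos in components["position"].items():
        vel = components["velocity"].get(eid)
        if vel:
            pos["x"] += vel["dx"]
            pos["y"] += vel["dy"]

world = World()
world.spawn(position={"x": 0, "y": 0}, velocity={"dx": 1, "dy": 2})
world.add_system(movement)
world.tick()
```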
25. I implicitly trust my browser (and any extensions running in it) to not mess with my data or exfiltrate it.
Assuming I installed my browser and its extensions knowingly.
But I don't necessarily trust your browser or your extensions with my data.
26. The assumption that chatbots are the killer app for LLMs presupposes a centralized, necessarily one-size-fits-none system.
When you centralize, you have to have a one-size-fits-all policy or approach, and getting it right becomes even more important, because it affects so much with so much leverage.
But getting it right becomes increasingly impossible since you have to cover so many conflicting requirements at once.
27. Centralization creates brittleness.
Centralization creates efficiency.
Efficiency is also brittleness: now only a single actor has to be corrupted or make a mistake to bring the system down.
Centralized systems are also significantly easier to corrupt.
28. Centralization gets scaling benefits...
Centralization gets scaling benefits... but also centralizes power and increases downside risk.
A person relying on a centralized resource can be cut off from it by the controller of the resource and have no fallback.
29. The cloud is a centralizing force.
In the era of the PC, once you bought your PC, you alone decided what software to run on it.
Now, there's another entity–one of a small, centralized number, perhaps in a different country–that could cut you off in a moment if they wanted to.
The cloud allows significantly better efficiency (returns to scale, better utilization) but like all centralization creates brittleness.
Centralization emerges in most systems; because data is so quick and cheap to move, it emerges in the world of data faster than in the world of atoms.
30. If you search for correlations long enough, you will find some.
Imagine looking at 20 different variables.
That's 20 × 19 / 2 = 190 different possible pairings.
If the significance threshold is 0.05, that means you'll almost certainly find correlations that are "significant" but also spurious: with pure noise you'd expect roughly 9 or 10 of them.
A classic XKCD also made this point.
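A quick standard-library simulation makes the point concrete (a sketch; the 0.197 cutoff is the approximate |r| for p ≈ 0.05 with 100 samples):

```python
import random

# 20 columns of pure noise, all pairwise correlations tested: with ~190 pairs
# and a 0.05 threshold, expect roughly 9-10 "significant" hits by chance alone.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(0)
n_rows, n_vars = 100, 20
data = [[random.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_vars)]

threshold = 0.197  # approximate critical |r| for p ~= 0.05 at n = 100
hits = sum(
    1
    for i in range(n_vars)
    for j in range(i + 1, n_vars)
    if abs(pearson(data[i], data[j])) > threshold
)
print(f"{hits} 'significant' correlations out of {n_vars * (n_vars - 1) // 2} noise pairs")
```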
31. It's really easy to Goodhart yourself in the face.
"Look, this system I've set up is telling me precisely the thing I want to hear! What a crazy random happenstance!"
All of your instruments tell you you're flying level, but you're in a nosedive.
32. The opposite of serendipity is zemblanity.
A word coined by William Boyd: "the faculty of making unhappy, unlucky and expected discoveries by design."
In fragile systems, bad luck compounds.
An unlucky break knocks other adjacent parts out of the bowl and onto the pedestal, which can cause a chain reaction.
A strategy is to fragilize your enemy so the universe helps destroy them through strings of bad luck.
33. Thinking about the system is multi-ply.
Thinking about the system is multi-ply. Thinking about a specific example is one ply.
Even if you think deeply about all of the examples, if you consider them in isolation you will miss the systems insights at the level of the whole.
34. Multi-ply insights take a combinatorial amount of time to explain.
Each ply compounds on the previous one; to fully explain it requires serializing it in a process that has a combinatorial explosion.
For a complex insight, that excruciating length can make it so no one else has the patience to sit and receive it.
So instead you communicate factored out principles… but the risk is that the receiver can't unpack them and they are just inscrutable.
36. You can see the future, but only if you're content to speak to others in riddles.
Imagine being able to see 10 ply beyond everyone else.
You can see things that are crystal clear to you now that will only be clear to others much later.
"Oh, I now understand the thing he was trying to tell me 5 months ago. If I would have understood that then, it would have saved a ton of time!"
For it to connect with others, it has to be laid out in excruciating detail, the complex insight unpacked into its full combinatorial expansion, blooming in the number of words necessary to communicate it.
The only way to communicate it before the heat death of the universe is to use shortcuts: "jargon".
The jargon will make sense to you but won't make sense to people who aren't already familiar with it.
You'll see the multi-ply blossom of insight packed into that seed of an idea; the receiver will only see the inscrutable seed.
It will sound like unintelligible riddles until they are ready to get it.
Being able to see 10 steps ahead is a curse, to constantly feel unable to communicate with everyone else, watching them fall off of cliffs you were trying to warn them of.
Would you rather be able to see the future but have the agony of being unable to communicate it to anyone else, or not be able to see the future at all?
37. Isolation allows you to move, not to be held down by the Lilliputian web of nearby constraints.
But if you choose your isolation boundary wrong, you will make faux progress that will back you into a corner.
The constraints matter!
Which ones to keep and which ones to ignore is a judgment call with very high stakes.
38. Why are rich people and attractive people more likely to be jerks?
Let's try out an inductive model for why this could emerge, even if we imagine that those characteristics don't cause people to be jerks.
If someone is a jerk to you, do you call them out on it, or do you let it slide?
If you call them out on it, two things might happen:
Perhaps they realize that they were a jerk, and after some embarrassment perhaps they grow or change.
Or, they think that you are wrong or aggressive and don't want to talk to you in the future–or might even push back metaphorically or physically.
Whether to call someone out comes down to two factors:
Do you think you'll interact with this person again directly or indirectly?
If not, it's not worth the downside risk for no upside.
Do you think the other person will have power relevant to you in the future?
If so, then the downside risk of them not wanting to work with you is significant.
In some cases power is contextual–a heavily cited professor won't get special treatment at the car wash.
But in other cases, power is consistent across many contexts.
Two groups that have that characteristic: rich people, and attractive people.
So rich people and attractive people (among others) are less likely to get called out for being a jerk, and thus more likely to go on being a jerk, possibly unbeknownst to themselves.
39. A common trope in making-the-band documentaries: the internal creative strife before the game-changing breakthrough.
But this does not mean that internal strife is predictive of a big breakthrough.
Just that it's necessary for a game-changing breakthrough.
The vast majority of failed bands also have creative strife, but then they never have the breakthrough, and they disband, and you never hear about them.
It's hard to have a genius thing without some kind of creative turmoil (the seeds of greatness come from having a perspective, and perspectives can clash).
But the vast majority of turmoil does not create a genius outcome.
40. If you ever have to care about whether a certain foundational thing is true, then you always have to care if it's true.
That's why crossing the 99.99% threshold is so powerful: you no longer have to care.
At that point it becomes a force of gravity; even though it's powerful, it's omnipresent and unchanging, and you don't have to devote any head space to it.
41. This week I learned about the Karpman Drama Triangle.
Like any lens, it's not fundamentally true or false, just another lens for your toolkit.
It identifies three roles:
The Victim
The Persecutor
The Rescuer
It notes that there is a stable triangle where three people can fall into these three roles and become mutually codependent on one another.
Each gets what they need in some sense, locking them into the triangle.
People tend to have a role they gravitate toward and are more likely to fall into.
This stable triad spontaneously emerges and persists in a surprising number of contexts.
What is the role you often fall into?
What is the way that your actions in that role perpetuate dysfunction?
43. Disruptive technologies create a little period of chaos.
Chaos is a good time to discover and create new power structures.
Power structures tend to accumulate and centralize over time.
It is the periods of chaos that allow new more decentralized power structures to emerge… until they, too, centralize, and then the cycle repeats.
44. In chaotic environments, the players that can adapt are more likely to win.
Being principled makes it, unit for unit, harder to adapt.
There are alignments that are forbidden by your principles.
This means that in a chaotic environment some of the worst players–the least principled–are more likely to succeed.
45. Other people can see your power more clearly than you can.
"Why don't they trust me? My intentions are pure!"
"You're a giant, and even when you tiptoe you smash whole villages. Can't you see why the villagers are terrified of you?"
Power structures are extremely important, and nearly always invisible.
47. Echo chambers form more quickly the more powerful you are, or the more willing you are to isolate yourself.
"Everyone on the outside is dumb, all of us in here are smart."
48. Jennifer Garvey Berger in Unlocking Leadership Mindtraps:
"When we are uncertain, we search around for understanding and we learn; when we know we're right, we are closed to new possibilities. When leaders believe they are right in a complex world, they become dangerous, because they ignore data that might show them they are wrong; they don't listen well to those around them; and they get trapped in a world they have created rather than the one that exists."
49. Top-down control over coordinating entities allows resolving a coordination problem by force.
"The answer is this option".
If you're right, that's great.
If you're not, watch out!
50"Knowledge defies entropy"
A marketing slogan of the Santa Fe Institute Press.
51. Game-changing insights come from misfits.
From the edge of the network, not the center.
The place where two networks touch is a fertile area of innovation.
Where networks touch, a "contact language" is required for the two networks to understand one another.
One way is via a shared contact language, e.g. a shared research methodology.
Another way is to have a particular individual who knows the language of both networks but is in the center of neither: a misfit.
52. In an infinitely deep metagame you can easily lose yourself.
At each step the most pressing, and easiest, move is to go one ply deeper.
But as you do you'll be gradually forgetting why you're doing anything, until you are lost.
All means, no end.
A zombie.
All of your humanity sucked out by the emergent metagame; the machine.
53. Power attracts and also corrupts.
Power attracts and also corrupts. Fundamentally.
People are drawn to power, and when they are under the influence of power they are corrupted to maintain it or stay in the game.
An infinite metagame where you must lose yourself to continue to hold power.
A dangerous game: as you get more leverage to make things happen, you lose your compass and your humanity.
54. No matter how disempowered you are, you can always destroy your thing.
When people feel their authority and power are threatened, sometimes they'll take self-undermining actions.
"You think I'm dumb? I'll show you, I'll burn our house down!"
The more that people feel boxed in and unable to exercise agency, the more they will reach towards smashing the whole thing just to feel agency.
Including inventing reasons to justify smashing it if none exist.
This, needless to say, is not a great strategy!
55. Founder mode applied to the state is just plain old authoritarianism.
As an employee or customer, you can leave a company you don't believe in more easily than as a citizen you can leave a country you don't believe in.
This is one of the factors that makes internal ground-truthing at the level of the state more fragile than at the level of a company.
56. Bureaucracies are slow and hard to change and hard to innovate in.
But you can also take them for granted!
You don't have to think about them, you know they are there, grinding away, stably.
Incapable of great things, but also less likely to do terrible things.
That allows you to think about other things like innovating in other contexts.
Because we can rely on them, we take them for granted and don't realize how important they are.
57. Bureaucracy is low beta: stability.
Authoritarianism is high beta... but the inherent lack of self-ground-truthing makes it tend towards an auto-corrupting end state.
Everything is at the whim of the authoritarian, and those whims can change on a dime.
Everyone--all of society--needs to pay careful attention to the whims of the authoritarian to avoid crossing him.
Huge segments of the society's mental energy caught up in that.
Bureaucracies are boring but also stable, easier to predict what they'll do.
So you can free up your mental energy to take them for granted.
58. The emergent rules in an authoritarian context are different than in a boring bureaucratic context.
A tweaked force of gravity.
All of your intuition about the rules from a bureaucracy is subtly wrong.
Put your head down, don't get noticed.
In the most extreme scenarios, the only principle is self-preservation.
Being very rich only makes you safe in a system with rules.
In a system ruled by one man, that makes you more vulnerable.
You stick out, prominently, and the authoritarian can take that away from you in a moment.
If you're an oligarch, make no mistake about who's in charge.
If you can curry favor, all is permitted.
But if you overstep you'll be cast aside… or out a window.
59. A classic authoritarian tactic: make being in the ingroup obviously better.
A positive boundary gradient.
People on the edge would rather be in the ingroup than in the outgroup.
Require people to corrupt themselves just a bit to join the ingroup.
That helps activate some cognitive dissonance, tying them tighter to the group.
Publicly pardon (or decline to prosecute) misdeeds by members of the ingroup.
This allows the members of the ingroup to act with increasing impunity.
Non-members of the ingroup will keep their heads down, lest they do something to attract negative attention from the ingroup.
As everyone keeps their heads down and minds their own business, trust degrades across society.
Blame any bad luck or negative consequences on the outgroup.
The authoritarian can change the definition of the ingroup at will.
This keeps even members of the ingroup constantly on their toes and focused on the authoritarian's whims.
The people who joined the ingroup are both powerful and powerless.
Entirely submitted to the authoritarian's will.
60. It's terrifying when you realize no one is in charge.
Help will not come.
But then you realize it's up to you to make the world a better place.
To do things you're proud of.
At the level of society it's messy, but at least it gives people more practice acting with agency.
61. Follow your highest and best use, fractally.
In each situation ask yourself what your highest and best use is and do that.
The answer is contextual.
For example, maybe there's a thing that you aren't good at but no one else in the group can do, and if it's not done, the project will fail.
Seek out the highest and best use up the stack.
At each layer, the highest and best use will give you compounding leverage.
By optimizing at the layer above, you'll be less likely to get stuck in a local maximum in the layer below.