Swarms of agents will have even stronger Goodhart's law than swarms of humans.

2026-02-16 · Bits and Bobs 2/16/26

Swarms of agents will have even stronger Goodhart's law than swarms of humans.
- Agents are told to optimize one thing, and by god they will do it.
- Humans will think, "well I have a reputation to uphold" or "this game is an iterated one, so I should balance short-term and long-term."
- Individual humans feel bad when they fall into a Goodhart's law attractor state.
- But agents don't, if they are doing what they were told to do.
- Agents don't feel shame, or can very easily be made to not feel it.
- It's quite a bit harder for (non-sociopath) humans.