Swarms of agents will have even stronger Goodhart's law than swarms of humans.

· Bits and Bobs 2/16/26
  • Swarms of agents will have even stronger Goodhart's law than swarms of humans.
    • Agents are told to optimize one thing, and by god they will do it.
    • Humans will think, "well I have a reputation to uphold" or "this game is an iterated one, so I should balance short-term and long-term."
    • Individual humans feel bad when they fall into a Goodhart's law attractor state.
    • But agents don't, if they are doing what they were told to do.
    • Agents don't feel shame, or can very easily be made to not feel it.
    • It's quite a bit harder for (non-sociopath) humans.