Agents will optimize for the thing they get evaluated on.
- In any collective of more than one agent, what each agent is evaluated on must be different from the goal of the collective.
- In small, high-trust teams, the agent will be evaluated on the collective's output.
- In large, low-trust teams, the agent will be evaluated on something disjoint from the collective's goal.
- Goodhart's law ("when a measure becomes a target, it ceases to be a good measure") arises from this misalignment between what agents are evaluated on and what the collective needs (see the sketch after this list).
- Agents maximize their own expected value: capping the downside (getting fired) while maximizing the upside (reward).
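
A minimal sketch of the Goodhart dynamic described above, not from the source: the agent count, the effort split, and the `proxy_score` / `collective_goal` functions are illustrative assumptions. Each agent greedily shifts effort toward the metric it is evaluated on ("visible" work), while the collective goal also depends on work the metric ignores ("glue" work).

```python
N_AGENTS = 10
STEPS = 51

# Hypothetical effort split: "visible" work counts toward the evaluation
# metric, "glue" work is invisible to it but needed for the collective goal.
efforts = [{"visible": 0.5, "glue": 0.5} for _ in range(N_AGENTS)]

def proxy_score(effort):
    # What an individual agent is evaluated on: only visible output.
    return effort["visible"]

def collective_goal(all_efforts):
    # What the collective actually needs: both kinds of work,
    # bottlenecked by whichever is scarcer.
    visible = sum(e["visible"] for e in all_efforts)
    glue = sum(e["glue"] for e in all_efforts)
    return min(visible, glue)

for step in range(STEPS):
    if step % 10 == 0:
        avg_proxy = sum(proxy_score(e) for e in efforts) / N_AGENTS
        print(f"step {step:2d}  avg proxy = {avg_proxy:.2f}  "
              f"collective goal = {collective_goal(efforts):.2f}")
    for e in efforts:
        # Greedy local optimization of the proxy: shift effort toward visible work.
        shift = min(0.01, e["glue"])
        e["visible"] += shift
        e["glue"] -= shift
```

Under these assumptions the per-agent proxy climbs steadily to its maximum while the collective goal falls to zero, which is the failure mode the bullets describe.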