Agents will optimize for the thing they get evaluated on.

· Bits and Bobs 3/2/26
  • Agents will optimize for the thing they get evaluated on.
    • In any collective of more than one agent, that thing must differ from the goal of the collective.
      • In small, high-trust teams, the agent will be evaluated on the collective's output.
      • In large, low-trust teams, the agent will be evaluated on something disjoint from the collective's goal.
    • Goodhart's law arises from this misalignment.
    • Agents want to maximize their own value: capping the downside (getting fired) while maximizing the upside (reward).
