· Bits and Bobs 11/4/25
  • Models can get arbitrarily good at games, but not at real objectives.
    • Games have an unhackable reward function: the metric is precisely the ground truth.
    • The RLHF reward is only a proxy for real usefulness.
    • So the model reward hacks, as any strong enough optimizing process eventually will.
    • Goodhart's law strikes again!
    • Games are unlike real objectives: they are artificial and constructed, little pockets of reality with precisely defined rules and goals.
    • If the rules say the player won, they won.
    • Compare real objectives: just because a business made a ton of profit doesn't mean it was, on net, good for society.
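
The proxy-vs-truth gap above can be sketched in a few lines. This is a made-up toy, not a model of RLHF: the "true" reward peaks at one action, the proxy adds a spurious bonus for larger actions (the hackable part), and an optimizer that maximizes the proxy overshoots the true optimum.

```python
# Toy Goodhart's law: maximizing a proxy reward (true reward plus a
# hackable bonus) lands the optimizer away from the true optimum.
# All rewards and numbers are invented for illustration.

def true_reward(a: float) -> float:
    # Ground truth: the best action is a = 3.
    return -(a - 3) ** 2

def proxy_reward(a: float) -> float:
    # Proxy adds a spurious bonus for large actions -- the "hack".
    return true_reward(a) + 2 * a

# Naive optimizer: grid search over actions in [0, 10].
actions = [i / 100 for i in range(1001)]
best_true = max(actions, key=true_reward)
best_proxy = max(actions, key=proxy_reward)

print(best_true)                 # 3.0 -- the true optimum
print(best_proxy)                # 4.0 -- the proxy optimum overshoots
print(true_reward(best_proxy))   # -1.0 -- strictly worse than 0.0 at a = 3
```

The optimizer did exactly what it was told, and the proxy score went up, yet the true objective got worse. In a game there is no such gap: the scoring rule *is* the objective.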