For evals to give you a gradient of improvement, the eval has to not be saturated.
- For evals to give you a gradient of improvement, the eval has to not be saturated.
- If it's saturated then the gradient gives you Goodhart's Law.
- It pulls you towards optimizing something that does not actually improve what you care about.