Reward functions work great in systems that have no indirect effects.
- These are closed systems.
- For example, AlphaZero needed only a single reward signal: the outcome of the game (win, draw, or loss).
- But in systems with indirect effects, it's impossible to write down one true reward function and simply optimize for it.
- Nearly every problem that actually matters is a system with indirect effects!
- Closed systems are often toys.
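
To make the closed-system case concrete, here is a minimal sketch of what an outcome-only reward can look like. The function name `terminal_reward` and the string player labels are hypothetical, chosen for illustration; this is not AlphaZero's actual code, only the win/draw/loss idea the notes describe.

```python
from typing import Optional


def terminal_reward(winner: Optional[str], player: str) -> int:
    """Reward for `player` once a game has ended.

    `winner` is the label of the winning player, or None for a draw.
    (Hypothetical interface, assumed for this sketch.)
    """
    if winner is None:
        return 0  # draw
    return 1 if winner == player else -1  # win or loss


# The entire objective is determined by the final result of the game.
print(terminal_reward("white", "white"))
print(terminal_reward("white", "black"))
print(terminal_reward(None, "black"))
```

Everything the function needs is inside the game itself, which is what makes the system closed. In a system with indirect effects there is no equivalent single input to hand it.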