Models have to think about distillation both in the large and the small.
- In the large, it's other labs distilling the model itself into a new model.
- In the small, it's individual users using the model to distill a task into mechanistic code, so they no longer need the model for that use case.