Models have to think about distillation both in the large and the small.

2026-04-13 · Bits and Bobs 4/13/26

Models have to think about distillation both in the large and the small.
- In the large, it's other labs distilling the model itself into a new model.
- In the small, it's individual users having the model write mechanistic code for a task, so they no longer need the model for that use case.