Distillation is easier than training from scratch.
- LLM output is more uniform and better regularized than natural text, so a student model can fit it more easily.
- LLM-generated text is effectively predigested: the teacher has already compressed the raw data into a cleaner signal.
- There is a risk of model collapse if you try to distill into a model larger than the teacher, but distilling into a smaller model carries no such risk and is much easier (a loss sketch follows this list).
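
For concreteness, here is a minimal sketch of the standard soft-target distillation loss in PyTorch. Everything in it is an illustrative assumption rather than anything specified above: the function name `distillation_loss`, the temperature, the `alpha` mixing weight, and the classification-shaped logits.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soften both distributions with the temperature before comparing.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # KL divergence on the softened distributions; the T^2 factor keeps
    # gradient magnitudes comparable to the hard-label term.
    kd = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature**2
    # Ordinary cross-entropy on the hard labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Toy usage: batch of 4 examples over a 10-way class/vocabulary space.
student_logits = torch.randn(4, 10)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
```

`alpha` trades off imitating the teacher's softened distribution against fitting the hard labels; the teacher's soft targets are the "predigested" signal the bullets above describe.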