Distillation is easier than training.

· Bits and Bobs 2/18/25
  • Distillation is easier than training.
    • LLM output is more regular than ordinary human-written text, so it's easier to train on.
    • The LLM-generated text is effectively predigested.
    • There's a danger of model collapse if you want to create a larger model than the one you distilled from, but if you want a smaller model you don't run that risk and it's much easier (see the sketch below).
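    • A minimal sketch of the small-model case (names hypothetical, assuming PyTorch and a teacher/student pair that share a vocabulary): the student is trained to match the teacher's temperature-softened next-token distribution, the classic soft-label distillation loss. This is a sketch under those assumptions, not a prescribed recipe.

        import torch
        import torch.nn.functional as F

        def distillation_loss(student_logits, teacher_logits, temperature=2.0):
            """KL divergence between softened teacher and student next-token distributions.

            Both logits tensors have shape (batch, seq_len, vocab_size).
            """
            t = temperature
            student_log_probs = F.log_softmax(student_logits / t, dim=-1)
            teacher_probs = F.softmax(teacher_logits / t, dim=-1)
            # Scale by t**2 so gradient magnitudes stay comparable across temperatures.
            return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t ** 2)

        # Usage sketch (teacher frozen, student is the smaller model being trained):
        # with torch.no_grad():
        #     teacher_logits = teacher(input_ids).logits
        # student_logits = student(input_ids).logits
        # loss = distillation_loss(student_logits, teacher_logits)
        # loss.backward(); optimizer.step(); optimizer.zero_grad()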