Big enough deep learning models learn to do Stochastic Gradient Descent at inference time, emergently.
- Big enough deep learning models learn to do Stochastic Gradient Descent at inference time, emergently.
- If it's big enough, it will grow to have a model of the universe inside itself.