Big enough deep learning models learn to do Stochastic Gradient Descent at inference time, emergently.

· Bits and Bobs 8/25/25