The fact that LLMs still require humans in the loop to get good results from recurrent processes undermines the "AGI is imminent" perspective.
Even if an LLM is right 95% of the time, in a recurrent process where it feeds on its own earlier output, that error rate compounds with each iteration: after n steps, only about 0.95^n of the content is still trustworthy.
By 14 iterations, most of the input is junk.
By 90 iterations, effectively all of it is.
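A quick way to see the compounding is a toy decay model. This is an illustrative sketch, not a measurement: it assumes each iteration independently preserves a fixed fraction of the signal and that errors never self-correct.

```python
# Toy model: each iteration preserves fraction p of the signal,
# and errors never self-correct, so trustworthy content decays as p**n.
def signal_remaining(p: float, n: int) -> float:
    return p ** n

for n in (1, 14, 90):
    print(f"{n:2d} iterations at 95%: {signal_remaining(0.95, n):5.1%} signal left")
# 1 iterations at 95%: 95.0% signal left
# 14 iterations at 95%: 48.8% signal left
# 90 iterations at 95%:  1.0% signal left
```

Under those assumptions, 95% per-step accuracy leaves less than half the signal by iteration 14 and about 1% by iteration 90.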
LLMs decohere without ground-truthing against real-world results.
That ground-truthing can be automated for simple, testable artifacts like React components.
But anything even slightly complex has to be put out into the world so you can see how the world reacts.
Complexity can't just be calculated; it's interdependent with the rest of the system.
It has to be integrated with the broader system to be ground-truthed.
Getting to 99% per-step quality improves the junk rate dramatically, because the decay compounds more slowly.
But that's a logarithmic gain in quality at an exponentially growing cost.
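The same toy model makes that trade-off concrete. The number of iterations you get before half the compounded output is junk is ln(0.5)/ln(p), so each step up in per-step accuracy only stretches the horizon multiplicatively:

```python
import math

# Iterations until half the compounded output is junk, under the same
# toy decay model (hypothetical illustration, not a measured result).
def half_life(p: float) -> float:
    return math.log(0.5) / math.log(p)

for p in (0.95, 0.99, 0.999):
    print(f"per-step accuracy {p:.1%}: ~{half_life(p):4.0f} usable iterations")
# per-step accuracy 95.0%: ~  14 usable iterations
# per-step accuracy 99.0%: ~  69 usable iterations
# per-step accuracy 99.9%: ~ 693 usable iterations
```

So going from 95% to 99% buys roughly a 5x longer usable horizon; if, as argued above, each such accuracy gain costs exponentially more to achieve, the economics flatten out quickly.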