Another puzzle of LLMs: they're surprisingly bad at generating very large legal JSON blobs.
They'll often miss a comma or a } or ].
This breaks our mental model: they're so good at generating output that matches even subtle patterns, and yet this is something a simple pushdown automaton could handle!
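To make the contrast concrete, here's a minimal sketch of the kind of check a pushdown automaton does: a stack-based bracket balancer (my own toy example, not anything the model runs; it ignores brackets inside string literals, which real JSON handling would need).

```python
# Checking that braces and brackets balance needs only a stack,
# i.e. a pushdown automaton. No training data required.
def brackets_balanced(text: str) -> bool:
    openers = {'}': '{', ']': '['}
    stack = []
    for ch in text:
        if ch in '{[':
            stack.append(ch)
        elif ch in '}]':
            if not stack or stack.pop() != openers[ch]:
                return False
    return not stack  # anything left on the stack means an unclosed { or [

print(brackets_balanced('{"a": [1, {"b": 2}]}'))  # True
print(brackets_balanced('{"a": [1, {"b": 2}]'))   # False: missing the final }
```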
LLMs aren't doing reasoning or computation; they're doing extremely good vibes matching against things they've come across before.
There's so much JSON in the training data that, especially for small blobs, there are tons of examples of every permutation of nested objects and arrays, so its intuition / vibe matching is very resilient.
But the larger the JSON blob gets and the deeper the nesting, the fewer direct examples of that exact nesting structure there are.
This has to be true for structural reasons: with each additional level of nesting, the combinatorial space of possible structures grows.
Something something Assembly Theory.
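For a rough feel of that growth (a back-of-the-envelope illustration of my own, not anything the model computes): the number of distinct ways to nest n matched bracket pairs is the nth Catalan number, which grows roughly like 4^n.

```python
# Catalan numbers count the distinct nestings of n matched bracket pairs.
# Real JSON shapes explode even faster, since keys and value types also vary.
from math import comb

def catalan(n: int) -> int:
    return comb(2 * n, n) // (n + 1)

for n in (2, 5, 10, 20):
    print(n, catalan(n))
# 2 -> 2, 5 -> 42, 10 -> 16796, 20 -> 6564120420
```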
LLMs have to have their attention layer tuned from examples to know which characters to attend to.
They aren't "counting" the brackets; they're pattern matching based on what their attention mechanism tells them is relevant in situations like this... and that mechanism is tuned based on what they've seen in training data.
So we get confused that they get confused, because they aren't doing basic computation; they're doing vibes matching over a mind-numbingly large dataset that gives them resilient coverage of any smallish JSON shape.