Time and space are very hard to capture in language, which is one-dimensional: one word after another.
Vision can capture multiple dimensions quickly and intuitively.
You can sense things in your peripheral vision and flit your eyes there and back in an instant.
That is not possible with a one-dimensional sense like hearing, where everything arrives as a single stream.
How weird must it be to experience the real world exclusively through language?
What a weird distillation, what a compressed pipe of information.
That's the way text-only LLMs see the world!