Most of the world isn't legible to computers.
Humans can move themselves to physical locations in the world and look, hear, and touch.
But computers are blind and deaf by default; they need special eyes and ears physically positioned and wired into the world.
A lot of problems get much harder if you ask "imagine you can't see anything: how would you do task X?"
A lot of things that are easy for humans are hard for computers not just because the reasoning is missing, but because the sensing is.
Now that LLMs have made reasoning cheap, real-world sensing is the long pole.
And even when a camera is physically present in a location, connecting it to a system that can watch it continuously and take actions is a potentially terrifying idea.