Prompt injection is hard to combat because normal sandboxing doesn't work without a million permission dialogs.

· Bits and Bobs 5/5/25
  • Prompt injection is hard to combat because normal sandboxing doesn't work without a million permission dialogs.
    • The stuff you'd use to contain the prompt injection (LLMs) is the same stuff that can be tricked by anything you show it.
      • Turtles all the way down.
    • "Do you trust this domain to get information from this chat?"
    • Asking that for every domain would mean a huge number of permission dialogs (see the sketch after this list).
    • Could you imagine if the web showed a permission dialog for every third-party domain a page requested?
      • It would be overwhelming.
      • The web doesn't need those dialogs because it doesn't let third-party requests carry sensitive data, only data the user already trusted the origin to have access to.
      • The origin might trust more third parties than the user realizes, but technically the user is delegating that trust to the origin, or rather to the employees who can ship code for that origin.
      • But an LLM can't be delegated those trust decisions, because LLMs are inherently confusable.
      • Presumably everyone shipping trusted code for an origin is a professional who is weighing the security risks.
      • Not true for an LLM.

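To make the dialog idea concrete, here's a minimal sketch (not from the episode) of a per-domain permission gate on an agent's outbound requests; guardedFetch, askUser, and the decision cache are all hypothetical names. What it illustrates is that every new third-party domain the model wants to send chat-derived data to triggers another dialog.

```ts
// Hypothetical sketch: gate every outbound request from an LLM agent
// behind a per-domain permission dialog.

type Decision = "allow" | "deny";

// Remember the user's answer per domain so we don't ask twice,
// similar to how browsers cache permission grants.
const decisions = new Map<string, Decision>();

async function askUser(domain: string): Promise<Decision> {
  // Stand-in for a real UI dialog:
  // "Do you trust this domain to get information from this chat?"
  console.log(`Do you trust ${domain} to get information from this chat? [y/N]`);
  return "deny"; // this sketch defaults to deny
}

async function guardedFetch(url: string, chatContext: string): Promise<Response | null> {
  const domain = new URL(url).hostname;

  if (!decisions.has(domain)) {
    // Every new third-party domain triggers another dialog.
    // This is where the "huge number of permission dialogs" comes from.
    decisions.set(domain, await askUser(domain));
  }

  if (decisions.get(domain) === "deny") {
    return null; // never send chat-derived data to an untrusted domain
  }

  return fetch(url, { method: "POST", body: chatContext });
}
```

Even with per-domain caching, an agent that browses freely would hit askUser constantly, which is exactly the overwhelming-dialogs problem the notes describe.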