Bruce Schneier on prompt injection: "We need some new fundamental science of LLMs before we can solve this."
Bruce Schneier on prompt injection: "We need some new fundamental science of LLMs before we can solve this."
84 chunks · 50 episodes
Bruce Schneier on prompt injection: "We need some new fundamental science of LLMs before we can solve this."
We spent decades making injection attacks invisible to developers. Modern frameworks auto-escape HTML. ORMs parameterize queries. Follow standard practices and you don't have to think about it. Now LLMs make all text executable. Frameworks don't help. Everything is code. XSS has a solution: we can p
Anthropic announced Claude for Chrome this week. Their blog post announcing it mentioned it will be available to a small set of users because they haven't yet made it safe enough. They shared their stat of attack success rate: 11.1%. It's multiple orders of magnitude too high to be safe for mass mar
A lot of absurd solutions hide behind an implicit "once the LLM is perfectly good". "Perfect" is a smuggled infinity.[bc] Once you introduce an infinity into an argument, everything downstream is absurd, because anything other than zero multiplied by infinity is infinity. "Prompt injection won't be
This week in the "wild west roundup" Simon Willison's roundup of prompt injection attacks this summer A prompt injection technique that hides malicious text in images. Engadget: AI browsers may be the best thing that ever happened to scam...
Someone peeked inside of Claude Code's workings and saw tons of "<system-reminder>" instructions, keeping it convergent and on track. That technique could also be used by prompt injection!
Chat is a gap filler UX modality. I want a system that can create malleable chatbots. That can spin them up on demand with different personalities. Bonus points if it can safely use tools without the risk of prompt injection.
This week in "we're in the wild west era" "Sloppy AI defenses take cybersecurity back to the 1990s, researchers say" "GPT-4o still outperforms GPT-5 on hardened [security] benchmarks across the board." "GitHub Copilot RCE Vulnerability via Prompt Injection Leads to Full System Compromise"
This week's round up of "we're in the wild west era with LLMs": A postmortem for a vibecoded tool called DrawAFish that had abuse problems. A Cursor exploit that allows arbitrary remote code execution. AgentFlayer: ChatGPT Connectors 0click Allows exfiltration of sensitive Google Drive docs a user a
Prompt injection is very unlikely to be solved by the model simply getting so good it can't be tricked. This is evident in the model card for GPT5. A lot of AI people are (implicitly, perhaps unintentionally) making the bet that models will get good enough to make security concerns moot. This is les
I see three seeds of massive possibility[dc] in the era of AI, but each currently with a low ceiling. MCP shows the power of integration of data. However, the lethal trifecta sets a low ceiling; the more you integrate with powerful tools, the more dangerous prompt injection gets.[dd][de] Chatbox UX
A prompt injection technique that hides the injection in legal boilerplate in the terms of service. Drafting off the fact that no one reads that anyway. We'll see many other social hacks.
There is no solution to prompt injection in systems where LLMs call the shots. LLMs seeing raw data and being asked to make load-bearing security decisions cannot be made safe, no matter how good the model gets. Even if the model is great, the trolley problem of having the model, not the user, be tr
ChatGPT's Agents feature feels fundamentally reckless to me. Their approach to prompt injection is basically: tell the model to really, really, focus on not doing anything bad. It uses the model as a security boundary, which is reckless even for advanced models. Rolling the feature out widely ups th
This article on on-the-fly toolgen was interesting. But I don't think it goes far enough. It still has the LLM at the root of the loop, calling the shots, deciding what to rely on. But any system with an LLM in the driver's seat is prone to prompt injection. Why not have codegenned code be the root
The McDonalds application AI leaked tons of personal data. The problem wasn't prompt injection per se, it was just a poorly configured and secured system. Still, I imagine we'll see a lot of these kinds of things with companies eager to integrate AI into their publicly-exposed systems.
An LLM can be trusted not to write code to attack you in particular. But if it sees any untrusted context at all the LLM can become malicious. This is why prompt injection is so dangerous.
An in the wild prompt injection attack attempt was discovered.
A report about how prompt injection can easily happen in MCP.
Which will be more important by unit weight in software systems in the AI era, LLMs or normal code? A lot of platforms being built for the age of AI imagine that most of the weight of systems will be LLMs, with just a little bit of code.[gq] What if it's the other way around, and it's mostly code, w