Prompt injection is very unlikely to be solved by the model simply getting so good it can't be tricked.
- Prompt injection is very unlikely to be solved by the model simply getting so good it can't be tricked.
- This is evident in the model card for GPT5.
- A lot of AI people are (implicitly, perhaps unintentionally) making the bet that models will get good enough to make security concerns moot.
- This is less because they have a good understanding of what the models will be able to do and more because they have a poor understanding of security.
- It's also a smuggled infinity.
- "If only models were perfect at not being tricked this would be safe."
- Remember, the threat coevolves; as these tools are more used, the incentive for the threat gets larger.