This week in the wild west roundup.
- This week in the wild west roundup.
- HashJack is a new indirect prompt injection technique.
- It takes advantage of the fact that the content after a hashtag in a URL won't lead to errors if it's in a structure the page can't interpret… but the LLM can see it just fine.
- A natural place to inject malicious prompt injection instructions!
- I'm disappointed… Google normally has one of the best security teams in the industry.
- How did they let this go out the door?
- A universal AI jailbreak: make the prompts poems.
- This just drives home that "make the LLM not get tricked" is a dead end.