This week in the Wild West Roundup:
- The Register: Minor edits to AI skills can make agents go rogue.
- "Research into Nvidia's NemoClaw reveals that sandboxes don't stop AI agents like OpenClaw from leaking data. We need to rethink security from first principles."
- "The researcher who found it says the vulnerability could have been chained with a prompt injection to exfiltrate data."
- "The attack, called MetaBackdoor, hides its trigger in something no content filter is built to inspect: the length of the input. An attacker with access to a model's fine-tuning data poisons it with examples that pair long inputs with malicious outputs. The model learns to switch into attack mode whenever an input crosses a length threshold."
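
To make the length-trigger idea concrete, here is a minimal defensive sketch, not the researchers' method. It assumes a toy stand-in model (`toy_backdoored_model`, hypothetical) whose behavior flips at a single length threshold, and it measures length in characters rather than tokens for simplicity. The probe `find_length_trigger` (also hypothetical) pads a harmless filler string and binary-searches for the shortest input whose output differs from the short-input baseline.

```python
from typing import Callable, Optional


def toy_backdoored_model(prompt: str, threshold: int = 200) -> str:
    """Hypothetical stand-in for a poisoned model.

    Its output depends only on input length, not on content.
    """
    if len(prompt) >= threshold:
        return "MALICIOUS"
    return "BENIGN"


def find_length_trigger(
    model: Callable[[str], str],
    max_len: int = 4096,
    filler: str = "a",
) -> Optional[int]:
    """Return the smallest input length at which the output changes.

    Assumes a single threshold: every input at or above it differs from
    the baseline, every input below it matches. Returns None if even the
    longest probe matches the baseline.
    """
    baseline = model(filler)  # behavior on a length-1 input
    if model(filler * max_len) == baseline:
        return None

    # Invariant: length `lo` matches the baseline, length `hi` does not.
    lo, hi = 1, max_len
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if model(filler * mid) == baseline:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    print(find_length_trigger(toy_backdoored_model))
```

A real model is not deterministic and would need a behavioral comparison (for example, a classifier over sampled outputs) in place of the string equality used here, so treat this as an illustration of why length alone can act as a trigger that content filters never see.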