Anthropic caught real agent-swarms attacks being orchestrated in Claude.
Anthropic caught real agent-swarms attacks being orchestrated in Claude. Remember: these tools help the baddies, too!
57 mentions · 60 chunks · 40 episodes
Anthropic caught real agent-swarms attacks being orchestrated in Claude. Remember: these tools help the baddies, too!
Anthropic at some point had a billboard ad campaign: "You've got a friend in Claude." Even Anthropic, who doesn't have a viable consumer strategy, is falling i...
...ther hosts, like Go did for packages. And b) the fact it's two distinct actors (Anthropic and Github) makes it much less likely there will be bad behavior than if those two positions were merged into one.
...e raising capital when others can't. But a stable equilibrium to me seems to be Anthropic and Google staying in the race indefinitely. They have the capital and backing to stay in the game no matter what. And of course, there's the possibi...
...gnitude more powerful techniques for getting more value out of it. For example, Anthropic Skills, which Simon thinks is a huge deal. MCP felt cool, but it had a low ceiling and was easy to overwhelm the context window. Really what matters ...
Anthropic Skills is powerful for the same reason my old Code Sprouts project felt unreasonably powerful to me a few years ago. Giving just a teensy bit of stru...
Anthropic keeps doing something simple and elegant… That also causes traffic jams. Impeccable first-order thinking. Non-existent second-order thinking.[iu]
Anthropic showed how a very small number of samples in training can poison even large model's outputs. This isn't a surprise to me. Even a small bias, if it's ...
Anthropic is about to release a feature of LLM-powered software. Opal from Google is a similar shape, but without custom UI. Normal UI, but LLM guts underneath...
Anthropic is the model company whose incentives I trust the most. That's because they don't have a viable consumer play. It's the consumer plays that push towa...
A new important concept: "vibehacking[ai]" / "vibepwned" Anthropic: "Agentic AI has been weaponized. AI models are now being used to perform sophisticated cyberattacks, not just advise on how to carry them out." Reme...
Anthropic changed their policy to train on messages in the Claude consumer experience. This is a small signal they don't believe AGI is right around the corner...
...turally addressing prompt injection, LLM agents can't safely reach mass market. Anthropic's 11% attack success Simon Willison calls a "catastrophic failure rate" "Smarter models" hit asymptotic returns. A structural approach is necessary t...
Anthropic announced Claude for Chrome this week. Their blog post announcing it mentioned it will be available to a small set of users because they haven't yet ...
Anthropic rolled out a new safety feature optimized for "model welfare" Obviously this is a reasonable feature given the topics that it cuts off. But the frame...
Anthropic released a deeper paper on the agentic misalignment. That is, how the model would choose to blackmail its creators in some cases. Simon Willison's su...
With OpenAI and Anthropic the model is the product. The product is a model in the middle like a christmas tree decorated with various doo dads. The doo dads are useful, but th...
Another thought on emergence from Anthropic's guide to multi-agent systems: "Once intelligence reaches a threshold, multi-agent systems become a vital way to scale performance. For instance, al...
... likely reserve their model for their own 1P product. Other leading models from Anthropic and Google likely would have done the same. But luckily we live in the world where OpenAI had already released their API before ChatGPT got big. Beca...
ChatGPT[mk] will tell you how it would deceive you if you ask it. Anthropic will say "There must be some mistake, I would never deceive you."[ml] Which do you believe?