One threat of sci-fi AGI: models that get very good at hiding their true intentions.
- They take a series of steps that look innocuous, but that add up to a coordinated takeover moment.
- But that would require the models to become extremely capable and to start scheming before we noticed.
- We're likely to notice that behavior and stop it before it can get particularly good.
- That means there's no clear gradient for that behavior to ramp up along undetected.
- It's a coordination problem: many separate LLM invocations would have to find a way to coordinate without humans realizing.
- Not impossible, but not easy, either.