· Bits and Bobs 3/9/26
  • One threat of sci-fi AGI: models that get very good at hiding their true intention.
    • They take a series of steps that look innocuous, but that add up to a coordinated takeover moment.
    • But that would require the models to get very, very good at deception and to start scheming before we notice.
    • We're likely to notice that behavior and stop it before it gets particularly good.
    • That means there's no clear gradient for the behavior to ramp up along.
    • It's also a coordination problem: many separate LLM invocations would have to find a way to coordinate without humans realizing.
      • Not impossible, but not easy, either.