  • Anthropic showed how a very small number of training samples can poison even a large model's outputs.
    • This isn't a surprise to me.
    • Even a small bias, if it's consistent, stands out from noise.
    • That's true any time your system assumes, implicitly, that all of the samples are independent.
    • An attacker who can coordinate multiple samples can have an outsize impact on the signal.
    • The very same phenomenon is why Google bombing was a thing: a small amount of coordination can have a significant impact on the overall output.
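A minimal sketch of the intuition above, under assumed toy parameters: honest samples are independent noise around zero, while an attacker coordinates just 1% of samples to push in one direction. The consistent bias produces a shift that stands out from the noise floor.

```python
import random

random.seed(0)

n = 10_000
# "Honest" samples: independent noise centered on 0
honest = [random.gauss(0, 1) for _ in range(n)]

# Attacker coordinates just 1% of samples, all pushing the same way
n_poison = n // 100
poison = [5.0] * n_poison

clean_mean = sum(honest) / len(honest)

# Replace 1% of honest samples with the coordinated ones
mixed = honest[:-n_poison] + poison
poisoned_mean = sum(mixed) / len(mixed)

# Standard error of the clean mean is ~1/sqrt(n) = 0.01, but the
# coordinated 1% shifts the mean by ~0.05: five sigma, not noise.
print(f"clean mean:    {clean_mean:+.3f}")
print(f"poisoned mean: {poisoned_mean:+.3f}")
```

Independent noise cancels at a rate of 1/sqrt(n); a coordinated bias accumulates linearly, which is why it dominates even at tiny sample fractions.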
