Anthropic showed how a very small number of training samples can poison even large models' outputs.
- This isn't a surprise to me.
- Even a small bias, if it's consistent, stands out from noise.
- That's true whenever your system implicitly assumes that all of the samples are independent.
- An attacker who can coordinate multiple samples can have an outsized impact on the signal (see the sketch after this list).
- The very same phenomenon is why Googlebombing was a thing… A small amount of coordination can have a significant impact on the overall output.
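
Here's a minimal sketch of that intuition, as toy mean estimation in Python rather than anything like Anthropic's actual training setup (the values of `N`, `K`, and `BIAS` are illustrative numbers, not from their paper). Independent noise averages itself away at a rate of about 1/sqrt(N), while K coordinated samples shift the mean by about K/N, so coordination starts to win once K exceeds roughly sqrt(N), a vanishing fraction of the data.

```python
import random
import statistics

random.seed(0)

N = 100_000   # honest samples: independent, zero-mean noise (sigma = 1)
K = 1_000     # coordinated samples: ~1% of the data, all pushing the same way
BIAS = 1.0    # the direction the attacker wants the estimate to move

honest = [random.gauss(0.0, 1.0) for _ in range(N)]
poisoned = honest + [BIAS] * K

# Independent noise cancels: the standard error of the mean
# shrinks like sigma / sqrt(N).
noise_floor = statistics.stdev(honest) / N ** 0.5

# Coordinated samples don't cancel: they shift the mean by roughly
# K * BIAS / (N + K), which dominates the noise floor once K
# exceeds about sqrt(N).
print(f"noise floor (std. error of mean): {noise_floor:.5f}")
print(f"mean, honest data only:           {statistics.fmean(honest):+.5f}")
print(f"mean, with {K} coordinated:       {statistics.fmean(poisoned):+.5f}")
```

With these numbers, the 1% of coordinated samples shifts the mean roughly three times further than the noise floor, even though any one of them looks unremarkable on its own.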