LLMs are vulnerable to the "screenshot attack".

2024-02-20 · Bits and Bobs 2/20/24

That is, if they say something offensive or wrong, the user can take a screenshot and it can go viral, eroding trust in the system.

But it is not a given that it has to work this way!

Consider a Google Search result where one of the ten blue links is clearly not a good result.

It's not nearly as viral, because Google isn't vouching for it as strongly.

Going even further, imagine someone opening up Word and writing something dumb or offensive and screenshotting it.

It couldn't possibly go viral, because the person clearly put it there themselves!

How viral a screenshot can be depends on how much the result could negatively surprise the user, and how strongly the result is "vouched" for by the service.

In some ways, LLMs today are ambassadors of their creators; anything they say is liable to embarrass their creators, because they have to be the singular, dependable answer.

But if there were a much larger number of different LLMs, each with different personalities, then any single LLM's output wouldn't matter quite as much.
