LLMs don't do a good job with negative space.

· Bits and Bobs 6/24/24

A friend was trying to generate a picture for a game he was running as a dungeon master.

He asked a generative AI to make a picture of a secret lair in a swamp.

The picture that came back was perfect, except there was a small sign that said "Secret Lair".

He tried again, appending to his prompt: "There should be no signs whatsoever".

The next image had a larger sign.

This continued, escalating until the generated image had a flashing marquee sign pointing to the secret lair.

This pattern will be familiar to anyone who has wrestled with generative image models to remove some detail.

All of the training data has descriptions of images as they actually are.

And why would such a description mention what's not in the image? You just describe what is there.

But a description of an existing picture and a description meant to create a picture are different things.

In the former, you would never use a negative word like "without".

In the latter, you might, if you suspect the artist's baseline assumptions would otherwise add a detail you don't want.

The generative model has rarely seen negative words like "without", so they're effectively background noise that it ignores.

As you get increasingly emphatic about what to remove, the model still doesn't register the negation. All it sees is: "wow, he seems really emphatic about this secret lair sign, I guess it should be really emphasized!"
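The failure mode above can be sketched with a toy. This is emphatically not how a diffusion model works internally; it's a hypothetical "model" that scores a concept purely by how often related words appear in the prompt, with no notion of negation, which is enough to show why piling on "no signs" makes signs more likely, not less.

```python
# Toy sketch: a "model" that weights a concept by raw mentions of
# related words in the prompt, ignoring negation entirely.
# (Hypothetical illustration, not a real image-model scoring function.)
from collections import Counter

def concept_weight(prompt: str, concept_words: set[str]) -> int:
    """Count mentions of words associated with a concept."""
    tokens = Counter(prompt.lower().replace(",", " ").split())
    return sum(tokens[w] for w in concept_words)

SIGN_WORDS = {"sign", "signs", "signage", "marquee"}

plain = "a secret lair hidden in a swamp"
polite = "a secret lair hidden in a swamp, with no signs whatsoever"
emphatic = ("a secret lair in a swamp, absolutely no sign, "
            "no signs, no signage anywhere")

print(concept_weight(plain, SIGN_WORDS))     # 0
print(concept_weight(polite, SIGN_WORDS))    # 1
print(concept_weight(emphatic, SIGN_WORDS))  # 3 -- negation backfires
```

The more emphatic the negation, the higher the concept's weight. This is also why many image tools expose a separate negative-prompt channel (e.g. a `negative_prompt` field in Stable Diffusion frontends) rather than asking you to write "without" into the main prompt: the removal request travels on a path the model actually treats as subtractive.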
