A trained circus bear is an untrusted component.
To work with it effectively and safely, you don't have to turn it into an intrinsically trusted component.
That might be impossible, and very dangerous if you get it wrong.
You need to figure out a way to work with it productively given that it's untrusted.
LLMs are gullible and squishy, and highly susceptible to their (perhaps hidden from you) inputs.
You must treat LLMs as an untrusted component in your system.
But if you do, you can get a lot of great output out of them.
One way is to put it inside a cage.
A cage might be as simple as a sandbox.
You assume the bear might break anything in the cage.
But by being careful about what you put in the cage, you can limit the downside.
The bear alone is untrusted. The bear + cage combination is trusted.
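The untrusted-bear-plus-trusted-cage composition can be sketched in code. This is a minimal, hypothetical illustration, not a real API: the LLM proposes actions, and a trusted policy gate decides which ones actually run. The names (`Action`, `PolicyGate`, the tool list) are invented for the sketch.

```python
from dataclasses import dataclass

@dataclass
class Action:
    """An action proposed by the LLM. Untrusted: it may be anything."""
    tool: str          # e.g. "read_file", "send_email"
    argument: str

class PolicyGate:
    """The cage: only allowlisted tools run; everything else is refused."""

    def __init__(self, allowed_tools: set):
        self.allowed_tools = allowed_tools

    def execute(self, action: Action) -> str:
        if action.tool not in self.allowed_tools:
            # The bear stays in the cage: the action is dropped, not run.
            return f"refused: {action.tool} is outside the cage"
        return f"ran: {action.tool}({action.argument})"

# The LLM (untrusted) proposes; the gate (trusted) disposes.
gate = PolicyGate(allowed_tools={"read_file", "search"})
print(gate.execute(Action("read_file", "notes.txt")))    # inside the cage
print(gate.execute(Action("send_email", "boss@corp")))   # outside the cage
```

The trust lives entirely in the gate, not in the model: nothing the model emits can widen its own allowlist.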
The key question becomes how to build the most effective cage.
You can imagine a highly bespoke and contoured cage giving the bear just the right maneuvering room to accomplish what you want it to accomplish.
Too big, and the bear can do some damage and destroy anything in the cage.
The bigger the cage, the more nervous you have to be about anything you put in it; you might end up with a big cage but very little inside it.
Too small, and the bear is constrained and doesn't have the autonomy and maneuvering room to accomplish what you want.
There's no room for the bear to surprise you with a better-than-expected result.
The optimal size of the cage has to do with the downside risk.
If the downside risk is high, you want the cage smaller.
The bear will still be able to do something dumb, but not something dangerous.
A normal cage is a big cube.
Straight, easy-to-reason-about edges.
Take, for example, the same-origin policy in browsers today.
But the real world is fractally wrinkled and complex.
A straight line will slice right through the middle of a real-world concept.
This makes it very hard to get precisely the right things in the cage and the right things outside of it.
You end up with rough approximations: bounding-box-style answers.
That puts lots of things into the cage you'd rather keep out, or forces you to leave things outside that belong inside.
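The same-origin example makes the slicing concrete. An origin is the straight-edged tuple (scheme, host, port); here is a rough sketch of the comparison using only the standard library (the real browser algorithm has more cases, e.g. default ports):

```python
from urllib.parse import urlsplit

def origin(url: str):
    """The straight-edged box: (scheme, host, port)."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port)

def same_origin(a: str, b: str) -> bool:
    return origin(a) == origin(b)

# The straight edge slices through a real-world concept: to a person,
# these are the "same site", but to the browser they are different origins.
assert not same_origin("https://app.example.com/", "https://api.example.com/")

# Meanwhile everything under one origin is inside the same box,
# whether or not it belongs together.
assert same_origin("https://example.com/login", "https://example.com/blog")
```

Both failure modes of the cube-shaped cage show up here: the real-world concept "example.com, the product" is sliced in half, while unrelated things sharing one origin are lumped into the same box.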
Imagine a new kind of nanotechnology that allows you to create highly contoured, bespoke cage shapes for precise situations, while still being strong enough to contain the bear.
Imagine if this nanotechnology could also reconfigure itself at will; a shape-shifting cage perfectly bespoke to the needs of the moment.
Kind of like a Holtzman shield from Dune, but to keep the bear in instead of attackers out.
Everyone today is focusing on making the bear smarter or more docile.
An asymmetric approach is to create nanotechnology for a space-age dynamic cage.
Such an approach would effectively allow new laws of physics.