A good tip via Simon Willison: use the weaker models when iterating on prompts.
If you can get something to work on last-gen models (e.g. GPT-3.5 Turbo or Claude Haiku), it will almost certainly work at least as well on newer models.
But if you work in the other direction, you may find your prompt is only viable on the expensive model.
As a bonus, the n-1 model will always be significantly cheaper than the frontier models, so the iteration itself costs less too.
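The workflow above can be sketched as a small evaluation harness. This is a hypothetical illustration, not from the original tip: `call_model`, `evaluate_prompt`, and the model names are all placeholder assumptions standing in for whatever LLM client you actually use.

```python
# Hypothetical harness: iterate on a prompt against the cheap model first,
# and only promote it to a frontier model once it passes your checks.
# `call_model` is a stand-in callable (model name, prompt -> output text);
# in practice you would swap in a real API client here.

def passes_checks(output: str, expected_keywords: list[str]) -> bool:
    """Simple automated check: does the output mention every expected keyword?"""
    return all(kw.lower() in output.lower() for kw in expected_keywords)

def evaluate_prompt(call_model, prompt_template: str,
                    cases: list[tuple[str, list[str]]],
                    model: str = "cheap-model") -> float:
    """Run the prompt template over test cases on one model; return the pass rate."""
    passed = 0
    for user_input, expected in cases:
        output = call_model(model, prompt_template.format(input=user_input))
        if passes_checks(output, expected):
            passed += 1
    return passed / len(cases)

# Iterate until the cheap model passes, then the same prompt should hold up
# (or improve) on the frontier model:
#
#   rate = evaluate_prompt(my_client, template, cases, model="cheap-model")
#   if rate == 1.0:
#       rate = evaluate_prompt(my_client, template, cases, model="frontier-model")
```

The point of the harness is that the expensive model only enters the loop at the very end, once the prompt already clears the bar on the cheap one.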