Mixture of experts intuitively makes sense as a technique.
Ask the LLM to generate three very different results.
Then synthesize the best parts of each result into one answer.
LLMs are better at looking backward (synthesizing, curating) than at creating on the spot ("The answer to your question is…" followed by YOLOing the answer with no planning).
LLMs are great at applying good judgment to do high-quality synthesis: when they can see which answers hit on good ideas and which ones didn't, they can keep just the best parts.
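The generate-then-synthesize loop above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: `call_llm` is a hypothetical placeholder stubbed out so the example runs standalone; swap in any real chat-completion client.

```python
def call_llm(prompt: str, temperature: float = 1.0) -> str:
    # Placeholder stub: replace with a real API call
    # (e.g. an OpenAI or Anthropic client).
    return f"[draft at T={temperature}: {prompt[:30]}...]"

def best_of_n(question: str, n: int = 3) -> str:
    # Step 1: ask for several very different answers.
    # High temperature encourages diverse drafts.
    drafts = [call_llm(question, temperature=1.0) for _ in range(n)]

    # Step 2: a second pass sees all drafts at once and synthesizes,
    # keeping the strongest ideas from each and discarding the rest.
    numbered = "\n\n".join(f"Draft {i + 1}:\n{d}" for i, d in enumerate(drafts))
    synthesis_prompt = (
        f"Question: {question}\n\n"
        f"Here are {n} independent draft answers:\n\n{numbered}\n\n"
        "Synthesize a single best answer, keeping the strongest ideas "
        "from each draft and discarding the weak ones."
    )
    # Low temperature for the synthesis pass: judgment, not creativity.
    return call_llm(synthesis_prompt, temperature=0.0)

print(best_of_n("Why does the sky appear blue?"))
```

The key design choice is the second pass: the model is asked to curate material it can see in full, which is the backward-looking mode it is best at.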
This is another reason the "chain of thought" technique works well.
Have the model unspool its reasoning first, then synthesize the answer from it.
Instead of the default where it YOLOs an answer and then retcons a reason.
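The contrast between the two orderings can be made concrete with a prompt sketch. Again `call_llm` is a hypothetical stub (here returning a canned completion so the example runs standalone), and the "Final answer:" marker is just an illustrative convention, not a standard API feature.

```python
def call_llm(prompt: str) -> str:
    # Placeholder stub: replace with a real chat-completion API call.
    # The canned reply mimics a model that reasons first, answers last.
    return "Step 1: 6 * 7 means six sevens.\nStep 2: 7 + 7 + 7 + 7 + 7 + 7 = 42.\nFinal answer: 42"

def answer_with_cot(question: str) -> str:
    # Ask for the reasoning FIRST, so the answer is synthesized from it --
    # not stated up front with the reasoning retconned afterward.
    prompt = (
        f"{question}\n\n"
        "Think through this step by step, writing out your reasoning.\n"
        "Only after the reasoning, write a line starting with "
        "'Final answer:' that states the conclusion."
    )
    completion = call_llm(prompt)
    # Extract the answer the reasoning led to.
    for line in completion.splitlines():
        if line.startswith("Final answer:"):
            return line.removeprefix("Final answer:").strip()
    return completion.strip()

print(answer_with_cot("What is 6 * 7?"))
```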