A model being trained on data and a model using RAG at inference time have wildly different characteristics.
But a lot of discourse about LLMs doesn't differentiate the two.
There's a difference between an LLM absorbing a hologram of knowledge during training and an LLM using RAG to sift through concrete input with the background common sense it absorbed in training.
Sometimes the model's background worldly knowledge is enough to give it common sense.
But if you want details, that's not sufficient and you'll need RAG.
Adding more knowledge to a model through training is expensive, has long lead times, works on vibes, and is imprecise.
RAG can't rescue a model that lacks the right background knowledge, no matter how much context you feed it, but it can be updated quickly and enables precision on the details.
Everyone talks about these things like they're the same, but they're wildly different.
Training your own model is very capital intensive.
But in many cases you can use an off-the-shelf LLM plus RAG and produce amazing results.
The question is: how much background knowledge does the LLM need to have enough common sense to tackle your concrete tasks, where you bring the specific details for it to operate on?
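To make the off-the-shelf-LLM-plus-RAG pattern concrete, here is a minimal sketch. The retriever is a toy keyword-overlap scorer standing in for embeddings plus a vector index, and the final LLM call is left as a hypothetical step, but the shape of the pipeline is the same: retrieval supplies the fresh, precise details, and the model supplies the background common sense.

```python
# Minimal RAG sketch: retrieve the most relevant snippets for a query,
# then hand them to an off-the-shelf LLM as concrete input to operate on.
# The retriever here is a toy keyword-overlap scorer; real systems use
# embeddings and a vector index, but the pipeline has the same shape.

def score(query: str, doc: str) -> float:
    """Toy relevance score: fraction of query words present in the doc."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / max(len(q_words), 1)

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the top-k documents by relevance to the query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble retrieved snippets plus the question into one prompt.
    The LLM brings background common sense; the snippets bring the details."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

# Updating the knowledge base is just appending to a list (or upserting
# into a vector index): minutes of work, not a training run.
knowledge_base = [
    "Invoice 1042 was paid on 2024-03-02.",
    "Invoice 1043 is overdue by 14 days.",
    "Our refund policy allows returns within 30 days.",
]

prompt = build_prompt("What is the status of invoice 1043?", knowledge_base)
print(prompt)  # send this to any off-the-shelf LLM via its chat API
```

Notice where the division of labor falls: nothing in this code teaches the model anything; it only selects which concrete details to put in front of a model that already has enough background knowledge to interpret them.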