In a graph of code nodes, some nodes run together almost all of the time.
Those subgroups can be cached together as a unit, with one cache entry for the whole group rather than one per node.
This is not too dissimilar to "neurons that fire together wire together."
In a graph of pure function invocations, each result depends only on its inputs, so there's a lot of flexibility to fluidly move cache boundaries and find the right level of caching for the actual call patterns.
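One way to find the groups that "fire together" is to count how often pairs of nodes appear in the same run and merge the pairs that nearly always do. The sketch below is only an illustration under assumptions: the `CoFiringTracker` class, the node names, the Jaccard-style ratio, and the 0.9 threshold are all hypothetical choices, not part of any existing system.

```python
from collections import Counter
from itertools import combinations


class CoFiringTracker:
    """Hypothetical helper: counts how often nodes run in the same top-level run."""

    def __init__(self):
        self.node_runs = Counter()  # node -> number of runs it appeared in
        self.pair_runs = Counter()  # (a, b) -> number of runs containing both

    def record(self, nodes_in_run):
        """Record the set of nodes that executed during one run."""
        nodes = sorted(set(nodes_in_run))
        self.node_runs.update(nodes)
        self.pair_runs.update(combinations(nodes, 2))

    def fused_groups(self, threshold=0.9):
        """Return sets of nodes whose pairwise co-firing ratio meets the threshold."""
        parent = {n: n for n in self.node_runs}

        def find(n):
            # Union-find lookup with path halving.
            while parent[n] != n:
                parent[n] = parent[parent[n]]
                n = parent[n]
            return n

        for (a, b), together in self.pair_runs.items():
            # Ratio of runs containing both nodes to runs containing either one.
            either = self.node_runs[a] + self.node_runs[b] - together
            if together / either >= threshold:
                parent[find(a)] = find(b)

        groups = {}
        for n in parent:
            groups.setdefault(find(n), set()).add(n)
        # Singleton groups gain nothing from being cached as a unit.
        return [g for g in groups.values() if len(g) > 1]


tracker = CoFiringTracker()
for _ in range(9):
    tracker.record(["parse", "tokenize", "render"])
tracker.record(["parse", "tokenize"])  # render skipped once
tracker.record(["render"])             # render ran alone once
groups = tracker.fused_groups(threshold=0.9)
print(groups)
```

Here `parse` and `tokenize` always run together, so they end up in one group, while `render` falls just short of the threshold and stays separate. Because the nodes are pure, a group found this way could plausibly be cached under a single key built from the group's external inputs, and the grouping can be recomputed as call patterns shift.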