Would having an expandable RAM help in running non training workloads like inference for LLMs / Stable Diffusion models?
For the most part it would just make things worse, and it would only help in a very narrow set of circumstances.
Graphics memory is running at extremely high speed and is very sensitive to latency.
If there isn't enough memory, or the memory is too slow for the GPU, performance is significantly reduced.
Faster memory or more memory beyond that? Doesn't really matter, because at that point you're limited by the processing itself.
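For LLM inference specifically, the reason memory speed matters so much is that generating each token means streaming roughly all of the model's weights from VRAM once, so bandwidth puts a hard ceiling on tokens per second. A rough sketch of that ceiling (all figures below are illustrative assumptions, not measured specs):

```python
# Rough decode-speed ceiling for LLM inference: generating each token
# requires streaming (roughly) all model weights from memory once, so
# tokens/s is capped by bandwidth / model size.
# All numbers here are illustrative assumptions, not measured specs.

def decode_ceiling_tokens_per_s(model_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/second when memory-bandwidth-bound."""
    return bandwidth_gb_s / model_gb

# Assumption: a 13B-parameter model at fp16 is ~26 GB of weights.
model_gb = 26.0
fast_vram = decode_ceiling_tokens_per_s(model_gb, 1000.0)  # ~1 TB/s, high-end VRAM class
slow_dram = decode_ceiling_tokens_per_s(model_gb, 90.0)    # dual-channel desktop DDR5 class

print(f"fast VRAM ceiling:  {fast_vram:.1f} tok/s")
print(f"system-RAM ceiling: {slow_dram:.1f} tok/s")
```

This is why slower expandable memory wouldn't help much for inference: the extra capacity comes at a bandwidth cost that directly divides your token rate.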
There are some pro-level cards with crazy amounts of memory if your data set requires it, but then you're not worried about cost or graphics. Nvidia sells GPU accelerator cards with up to 80 GB of memory and a 5120-bit-wide connection to get some crazy memory speeds out of HBM, but the memory needs to be right next to the core for those speeds.
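The "crazy memory speeds" follow directly from bus width times per-pin data rate. The 5120-bit width is from the text above; the ~3.2 Gb/s per-pin HBM2e rate is an assumed ballpark figure:

```python
# Peak memory bandwidth = (bus width in bytes) x (per-pin data rate).
# The 5120-bit bus width matches the accelerator card mentioned above;
# the 3.2 Gb/s per-pin HBM2e rate is an assumed ballpark, not an exact spec.

def peak_bandwidth_gb_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s for a memory interface."""
    return bus_width_bits / 8 * gbps_per_pin

hbm = peak_bandwidth_gb_s(5120, 3.2)
print(f"5120-bit HBM @ ~3.2 Gb/s/pin: {hbm:.0f} GB/s")  # roughly 2 TB/s
```

A bus that wide with thousands of traces is only practical because the HBM stacks sit on the same package as the GPU die, which is exactly why it can't be a socketed module.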
One of the problems not mentioned yet is that graphics cards use newer memory chips which are not sold as RAM modules. So this would require creating those modules, which only a few users would ever buy.
If you added normal RAM modules, it would mean slowing down the whole VRAM pool to match, which is a pitfall nobody wants (for more reasons than just the slowdown).
E.g. the 4090 uses GDDR6X, while the top memory modules for sale are DDR5.
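The size of that gap can be estimated from bus width times per-pin data rate. The figures below (a 384-bit bus at 21 Gb/s per pin for the 4090's GDDR6X, and dual-channel DDR5-6000 for system RAM) are assumed ballpark specs:

```python
# Estimating the GDDR6X vs DDR5 bandwidth gap.
# Assumed ballpark specs: RTX 4090-class GDDR6X = 384-bit bus @ 21 Gb/s/pin;
# desktop system RAM = dual-channel (2 x 64-bit) DDR5-6000 @ 6 Gb/s/pin.

def peak_bandwidth_gb_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s for a memory interface."""
    return bus_width_bits / 8 * gbps_per_pin

gddr6x = peak_bandwidth_gb_s(384, 21.0)  # GDDR6X on the card
ddr5 = peak_bandwidth_gb_s(128, 6.0)     # dual-channel DDR5 modules

print(f"GDDR6X: {gddr6x:.0f} GB/s, DDR5: {ddr5:.0f} GB/s")
print(f"ratio: ~{gddr6x / ddr5:.1f}x")
```

So even the best socketed DDR5 modules would be roughly an order of magnitude slower than the soldered GDDR6X they'd be standing in for.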
TL;DR: they aren't built with extra memory lines (wires) to empty slots because that would be overhead on the GPU side, and the signal timing requirements are extremely tight for GPUs, so there is no margin for the uncertainty a slot introduces. The VRAM needs to be physically close to the GPU and tightly connected.