3D models don’t take up much space. Naively storing 100 triangles, each with 3 corners of 3 coordinates at 4 bytes apiece, takes a mere 3.6 KB, and that’s before you start optimizing.
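To make that arithmetic concrete, here is a rough sketch of the naive size calculation. The function and constant names are made up for illustration; this is not how any real N64 game laid out its model data.

```python
# Naive mesh size: every triangle stores its own 3 corners,
# each corner has 3 coordinates (x, y, z), each coordinate is 4 bytes.
# All names here are hypothetical, chosen only for this example.
CORNERS_PER_TRIANGLE = 3
COORDS_PER_CORNER = 3
BYTES_PER_COORD = 4


def naive_mesh_bytes(triangles: int) -> int:
    """Bytes needed to store `triangles` triangles with no sharing or compression."""
    return triangles * CORNERS_PER_TRIANGLE * COORDS_PER_CORNER * BYTES_PER_COORD


print(naive_mesh_bytes(100))  # 3600 bytes, i.e. 3.6 KB
```

Sharing corners between neighbouring triangles or using smaller coordinates would shrink this further, which is the "optimizing" part.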
The same goes for N64 music. It was not recorded audio; it was stored as a sequence of notes, much like a MIDI file. That’s also just a handful of KB per track.
Meanwhile, low-quality YouTube 240p video uses 3000-4000 KB per minute, and that’s with a modern, high-quality codec that the N64 can’t decode.
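For a sense of scale, this small sketch compares one minute of that video against the 100-triangle model from earlier. It assumes 1 KB = 1000 bytes and a 3600-byte model; the names are hypothetical and the numbers are only the rough figures quoted above.

```python
# How many naive 100-triangle models fit in one minute of 240p video?
# Assumes 1 KB = 1000 bytes and a 3600-byte model (100 triangles * 3 * 3 * 4).
MODEL_BYTES = 3600
BYTES_PER_KB = 1000


def models_per_video_minute(video_kb_per_minute: int) -> int:
    """Whole models that fit in the space of one minute of video."""
    return (video_kb_per_minute * BYTES_PER_KB) // MODEL_BYTES


print(models_per_video_minute(3000))  # 833
print(models_per_video_minute(4000))  # 1111
```

So a single minute of even low-quality video costs about as much as several hundred to a thousand such models.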