Dear homebrew developers, why was the original Xbox able to emulate N64 games flawlessly, while a Raspberry Pi 3B/3B+/4 with a CPU twice as fast (and far more RAM) still can't?


I am honestly really interested in the answer to this as well, but I doubt this is the sub where you're going to get an answer. I'm not sure what the best sub for this is; maybe other redditors know better than I do? Either way, good question, just maybe not something the average audience cares about enough to know the answer to. This sub has surprised me before, though, so maybe it can again?…

It's not just CPU and RAM; it's the graphics card and the software infrastructure.

A video card (GPU) is basically a second CPU, but specifically designed to do graphical calculations. It is especially important for full 3D graphics, which become extremely slow without one. The Xbox has a dedicated Nvidia GPU, while the Raspberry Pi only has a modest integrated VideoCore GPU.

Software infrastructure refers to all the code and resources required to make the system do what you want it to do. Historically, consoles provided almost no infrastructure to developers; their code accessed the hardware almost directly. Most Raspberry Pis run a full Linux distribution, which makes it easier to do more things but also consumes a relatively large amount of resources.

Point of order: N64 emulation on the XBOX was kinda crap, IIRC.

Secondly, while the clock speed on the Pi may be faster by a good bit, it runs on an ARM64 processor, which is designed for power efficiency and cost savings over raw compute power. The XBOX is based on an x86 processor, which can effectively do more per clock cycle at the cost of using more transistors and therefore more power.

Another problem is that ARM64 is a relatively new architecture. N64 emulation was more popular back in the OG XBOX days, and more effort went into optimizing for that architecture. ARM64 has only come up fairly recently and hasn't had the same amount of optimization and development put into it. Dynarec (dynamic recompilation) specifically comes to mind: it translates N64-specific code (the N64's MIPS machine code) into x86- or ARM64-specific code while the game runs. It's entirely possible that whichever emulator you used just didn't enable this feature by default. (It's probably doable with some configuration, though; it seems someone did manage to write something about 3 years ago.)
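To make the dynarec idea a bit more concrete, here's a toy sketch in Python. It assumes a made-up three-instruction guest ISA loosely in the style of MIPS (not the real N64 encoding), and the names `interpret`, `recompile`, `run_dynarec` and `TEMPLATES` are hypothetical. A real dynarec emits native x86 or ARM64 machine code; here Python's built-in `compile`/`exec` stand in for the code generator. The point is that the interpreter decodes every instruction every time it runs, while the recompiler translates a block once, caches it, and reuses the translation.

```python
# Toy guest "ISA" (hypothetical, MIPS-flavoured): (opcode, rd, rs, operand)
# "addi": rd = rs + immediate; "add"/"mul": rd = rs (+ or *) regs[operand]

def interpret(block, regs):
    """Interpreter: decode and dispatch every instruction on every run."""
    for op, rd, rs, x in block:
        if op == "addi":
            regs[rd] = regs[rs] + x
        elif op == "add":
            regs[rd] = regs[rs] + regs[x]
        elif op == "mul":
            regs[rd] = regs[rs] * regs[x]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs

# Host-code templates emitted for each guest opcode (Python source stands in for x86/ARM64)
TEMPLATES = {
    "addi": "    regs[{rd}] = regs[{rs}] + {x}",
    "add": "    regs[{rd}] = regs[{rs}] + regs[{x}]",
    "mul": "    regs[{rd}] = regs[{rs}] * regs[{x}]",
}

translation_cache = {}  # guest block address -> translated host function

def recompile(block):
    """Translate a whole guest block into one host function, decoding each instruction once."""
    lines = ["def run(regs):"]
    for op, rd, rs, x in block:
        if op not in TEMPLATES:
            raise ValueError(f"unknown opcode: {op}")
        lines.append(TEMPLATES[op].format(rd=rd, rs=rs, x=x))
    lines.append("    return regs")
    namespace = {}
    exec(compile("\n".join(lines), "<dynarec>", "exec"), namespace)
    return namespace["run"]

def run_dynarec(address, block, regs):
    """Translate a block the first time it is seen, then reuse the cached translation."""
    fn = translation_cache.get(address)
    if fn is None:
        fn = recompile(block)
        translation_cache[address] = fn
    return fn(regs)

# r1 = 6, r2 = 7, r3 = r1 * r2 (r0 is left at zero, as on MIPS)
block = [("addi", 1, 0, 6), ("addi", 2, 0, 7), ("mul", 3, 1, 2)]
print(interpret(block, [0, 0, 0, 0]))            # [0, 6, 7, 42]
print(run_dynarec(0x1000, block, [0, 0, 0, 0]))  # [0, 6, 7, 42]
```

Both paths compute the same result; the difference is that the second call to `run_dynarec` with the same address skips all decoding and just runs the cached function, which is where a dynarec gets its speed.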

The Raspberry Pi, even the latest one, isn't all that powerful. They also get seriously hot; they are still using older chip designs. Apple's ARM chips cost more to make than a whole Pi, but they run cool, they are much more powerful, and they sip electricity. It's like comparing a 100-year-old tractor to a new Tesla.

Mainly because CPU clock speed isn't the only factor to consider.

E.g. a bus and a motorbike can both travel at 100 km/h, but at the end of the journey the bus will have transported 50 times more people and their luggage.

It's the same with CPUs. Different designs (known as architectures) can do more than others. They all perform various calculations on data (known as instructions), but some have more complex instructions to choose from and so can do more 'work' in less time.

That's basically the difference between the Xbox CPU and the Raspberry Pi's: the CISC (complex instruction set computer) architecture versus the RISC (reduced instruction set computer) architecture. The Xbox uses a Pentium III (CISC) and the Raspberry Pi 3B uses a Cortex-A53 (RISC).

If you want to get a bit more technical, we can take the example of multiplying two numbers. One number is stored in memory location A and the other in memory location B; we want to multiply A by B and store the result back in memory location A.

With the CISC architecture of the Pentium III, we just have to tell the CPU to do something like:

multiply a, b

That's one complex instruction that takes one clock cycle to complete. The Xbox CPU runs at 733 MHz, so it can do 733 million of those multiplications in one second. (That's not exactly correct, but it's OK to think of it that way for the sake of the example.)

With the RISC architecture of the Cortex-A53, we have to tell the CPU to do something like:

load a
load b
multiply a, b
store a

That's 4 simple instructions that take 4 clock cycles to complete. If the Raspberry Pi 3B CPU runs at roughly double the clock speed of the Xbox CPU, it would still take twice as long to do that multiplication task. (Again, that's not exactly correct, but think of it that way for the sake of the example.)
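The back-of-the-envelope arithmetic above can be written out as a tiny Python function. It bakes in the same simplification as the post (every instruction takes exactly one clock cycle) and assumes the Pi runs at exactly twice the Xbox's 733 MHz; `multiplies_per_second` is just an illustrative name, and real CPUs don't behave this neatly.

```python
def multiplies_per_second(clock_hz, instructions_per_multiply, cycles_per_instruction=1):
    """Toy model: each instruction costs a fixed number of clock cycles."""
    cycles_per_multiply = instructions_per_multiply * cycles_per_instruction
    return clock_hz / cycles_per_multiply

XBOX_CLOCK_HZ = 733e6
PI_CLOCK_HZ = 2 * XBOX_CLOCK_HZ  # assumption: "roughly double" taken as exactly 2x

xbox = multiplies_per_second(XBOX_CLOCK_HZ, instructions_per_multiply=1)  # one complex multiply
pi = multiplies_per_second(PI_CLOCK_HZ, instructions_per_multiply=4)      # load, load, multiply, store

print(f"Xbox: {xbox / 1e6:.1f} million multiplications/s")  # Xbox: 733.0 million multiplications/s
print(f"Pi: {pi / 1e6:.1f} million multiplications/s")      # Pi: 366.5 million multiplications/s
print(f"Xbox is {xbox / pi:.1f}x faster at this task")      # Xbox is 2.0x faster at this task
```

Doubling the clock only cancels out half of the 4x instruction count, which is why the Xbox still comes out ahead in this simplified picture.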

It gets even more technical when you take other aspects of the different CPU architectures into consideration. You can get into things like how many instructions each architecture can execute at the same time (known as instruction-level parallelism). That's not as simple as how many cores are available; it's to do with how each core schedules and arranges the instructions it's working on at any given moment. It does that because instructions and memory accesses take time to complete, and the CPU can be doing other work while it waits for them. The Pentium III has a superior design in that department: it can execute instructions out of order, while the Cortex-A53 is an in-order design. That further increases the amount of work the Pentium III can do in each clock cycle.
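Here's a rough illustration of why scheduling matters: a toy single-issue pipeline simulator in Python, not a model of either real chip. The latencies (3 cycles for a load, 1 for arithmetic) are made-up numbers, the out-of-order window is unlimited, and every instruction writes its own unique register so no register renaming is needed; `Instr` and `simulate` are hypothetical names. The in-order core must stall whenever the next instruction's inputs aren't ready, while the out-of-order core can pull a later, independent instruction forward to fill the gap. (For reference, the Pentium III is an out-of-order design and the Cortex-A53 is in-order.)

```python
from dataclasses import dataclass

@dataclass
class Instr:
    dest: str      # register written (unique per instruction, so no renaming needed)
    srcs: tuple    # registers or memory operands read
    latency: int   # cycles until the result can be used (made-up numbers)

def simulate(program, out_of_order):
    """Return the cycle at which the last result is ready on a single-issue core."""
    produced = {ins.dest for ins in program}
    ready_at = {}              # register -> cycle its value becomes available
    pending = list(program)
    cycle = 0

    def operands_ready(ins):
        # Operands not produced by this program (e.g. memory locations a, b) are always ready
        return all(
            src not in produced or (src in ready_at and ready_at[src] <= cycle)
            for src in ins.srcs
        )

    while pending:
        # In-order: only the oldest instruction may issue; out-of-order: any pending one
        candidates = pending if out_of_order else pending[:1]
        for ins in candidates:
            if operands_ready(ins):
                ready_at[ins.dest] = cycle + ins.latency
                pending.remove(ins)
                break          # at most one instruction issues per cycle
        cycle += 1
    return max(ready_at.values())

program = [
    Instr("r1", ("a",), 3),   # load a
    Instr("r2", ("r1",), 1),  # r2 = r1 * 2 (has to wait for the load above)
    Instr("r3", ("b",), 3),   # load b (independent of r1/r2)
    Instr("r4", ("r3",), 1),  # r4 = r3 + 1
]
print(simulate(program, out_of_order=False))  # 8
print(simulate(program, out_of_order=True))   # 5
```

The out-of-order run finishes sooner because it starts `load b` while the first load is still in flight, instead of sitting idle.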

Or you can talk about the size and performance of the cache memory (very fast memory inside the CPU that stores the most recently used data and instructions). It is used to avoid having to load and store the same data from the relatively slow main memory (the RAM). That also allows work to be done more quickly, and the Pentium III is better than the Cortex-A53 in that regard too.
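To see why a cache saves trips to RAM, here's a minimal Python sketch of the simplest kind of cache, a direct-mapped one. The sizes (4 or 8 lines of 16 bytes) are tiny made-up numbers chosen so the effect is easy to see, and `simulate_cache` is a hypothetical name; real Pentium III and Cortex-A53 caches are much larger and set-associative. Every miss stands for a slow fetch from main memory.

```python
def simulate_cache(addresses, num_lines=4, line_size=16):
    """Count (hits, misses) for a toy direct-mapped cache."""
    tags = [None] * num_lines      # which memory block each cache line currently holds
    hits = misses = 0
    for addr in addresses:
        block = addr // line_size  # memory block containing this byte address
        index = block % num_lines  # the one cache line this block is allowed to use
        tag = block // num_lines   # identifies which block occupies that line
        if tags[index] == tag:
            hits += 1              # served from fast on-chip memory
        else:
            misses += 1            # would have to fetch from slow main memory (RAM)
            tags[index] = tag
    return hits, misses

# Re-reading the same 16 bytes: only the first access goes to RAM
print(simulate_cache([0, 4, 8, 12] * 10))         # (39, 1)
# Two addresses fighting over the same line of a 4-line cache: every access misses
print(simulate_cache([0, 64] * 10))               # (0, 20)
# The same pattern with a bigger 8-line cache: both blocks fit
print(simulate_cache([0, 64] * 10, num_lines=8))  # (18, 2)
```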

So… as you can imagine, all that extra complexity makes a Pentium III more capable, but it also makes it larger and more expensive to manufacture. That's basically why an Xbox CPU is better than a Raspberry Pi CPU even though its clock speed is half as high, and also why a Raspberry Pi CPU is so much cheaper.