Why does emulation require X amount of CPU for accuracy?


This is something I am having trouble wrapping my head around. Take PCem, for example. It can emulate up to a Pentium II, but a Pentium III is nearly impossible due to current hardware constraints. Yet a Pentium III runs at 433 MHz (if I remember correctly), and modern CPUs are well into the 5 GHz range. Still, to accurately emulate a 433 MHz chip you need X amount of CPU.

Why is that the case if the CPU you’re using to perform the emulation is vastly more powerful?

I read it’s the same even for the Super Nintendo: it ran at 5 MHz, and for accurate emulation you’d need 3 GHz (which is common today, but wind back a few years and it would be the same question).

Hopefully it makes sense, I am still trying to understand emulation on a deeper level. Happy to have any links to any docs that answer this question as well.


10 Answers

Anonymous 0 Comments

An SNES or even a Pentium III PC isn’t just one processor.

If I remember right, the SNES also had a Picture Processing Unit and an Audio Processing Unit (at least, the NES had these). Those were two *additional* processors that worked in tandem with the CPU. **And** a cartridge could optionally include an expansion chip like the SuperFX or whatever the Capcom one was. That adds a FOURTH processor to the mix.

The CPU of the SNES was clocked at roughly 1.8 MHz to about 3.5 MHz depending on context. I’m not sure about the APU and PPU, but the SuperFX could be clocked up to 10 MHz. Still, most of our modern CPUs are operating 4,000-8,000x faster than these chips. What takes up all the extra effort?

Well, you can’t just tell the SNES CPU “please fetch this value from RAM, that value from RAM, then execute the “add” instruction and write the result to RAM”. You have to:

1. Wait a certain number of CPU cycles after executing the instruction to update the RAM simulation.
2. Wait another certain number of CPU cycles to put the RAM value in the correct register of your CPU simulation.
3. Wait another certain number of CPU cycles to let the operation be “completed”.
4. Meanwhile do whatever the PPU is supposed to be doing
5. Meanwhile do whatever the APU is supposed to be doing
6. Meanwhile be aware of what scanline a TV might be rendering at this moment so any side effects of the current instruction can have the appropriate effect.
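To make the “meanwhile” steps concrete, here is a rough sketch of a lockstep loop in Python. It is not how any particular emulator is written: the `Component` class, the `run_lockstep` function, and the clock dividers are all made up for illustration. The point is only that the host has to visit every chip on every master clock tick, even when that chip has nothing to do.

```python
# Hypothetical, heavily simplified chips: real hardware has far more state.
class Component:
    def __init__(self, name):
        self.name = name
        self.cycles = 0

    def tick(self):
        # Stand-in for one emulated clock cycle of work on this chip.
        self.cycles += 1


def run_lockstep(master_cycles, dividers):
    """Advance every chip against one shared master clock.

    `dividers` maps a chip name to how many master cycles make up one
    cycle of that chip (the values used below are invented).
    """
    chips = {name: Component(name) for name in dividers}
    host_steps = 0
    for master in range(master_cycles):
        for name, divider in dividers.items():
            host_steps += 1  # the host checks every chip on every master cycle
            if master % divider == 0:
                chips[name].tick()
    return chips, host_steps


chips, host_steps = run_lockstep(24, {"cpu": 6, "ppu": 4, "apu": 8})
print({name: chip.cycles for name, chip in chips.items()}, host_steps)
```

In this toy run the emulated CPU only advances 4 cycles, but the host loop body runs 72 times to keep the three chips in step.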

That’s a TON of extra work. The SNES CPU didn’t have to do that work. The reason it took so many cycles to do something simple was Physics. Electricity could move through and operate `X` number of transistor-based gates per clock cycle. Doing steps 1-3 above might require operating `6X` gates. So the fastest the SNES CPU could execute that instruction is 6 clock cycles.

All of that has to get simulated on YOUR CPU. Steps 1-3 above might involve having to execute 200-300 instructions on your CPU to make sure everything gets done right in the simulated system. Your CPU still has some `Y` number of gates that can operate per clock cycle, and each of its instructions needs to operate some `M * Y` gates, so each one takes `M` cycles. In that way we can see that emulating a system can easily incur 1000x more work than the original system was doing.
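The multiplication can be sketched in a few lines of Python. Every number here is an assumption picked for illustration (the function name `overhead_ratio` is also made up), and it only covers the CPU part of steps 1-3. The PPU, APU, and synchronization work stack on top of it.

```python
def overhead_ratio(emulated_cycles, host_instructions, host_cycles_per_instruction):
    """Host cycles spent per emulated cycle for one emulated instruction."""
    host_cycles = host_instructions * host_cycles_per_instruction
    return host_cycles / emulated_cycles


# Assumed: a 6-cycle emulated instruction that costs 300 host instructions,
# each taking 2 host cycles on average.
ratio = overhead_ratio(emulated_cycles=6,
                       host_instructions=300,
                       host_cycles_per_instruction=2)
print(ratio)
```

With these made-up inputs the host spends 100 cycles for every emulated CPU cycle, before any of the other chips are accounted for.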

All of that is required for accuracy. That accuracy can really matter. I had a friend who was writing an NES emulator once, and he had trouble loading some game, I think Donkey Kong. What he figured out is he had the timing in his PPU wrong, so it was trying to draw things to the screen before the code was finished updating the memory the PPU uses to draw. So he was drawing “too early”.
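A toy version of that “drawing too early” bug, as a sketch: the function name `sample_tile`, the cycle numbers, and the one-entry video memory are all invented, and a real PPU is far more involved. It just shows that the same game code produces a different picture depending on when the emulated PPU reads memory.

```python
def sample_tile(ppu_read_cycle, cpu_write_cycle=100, total_cycles=200):
    """Return what the PPU sees if it reads video memory at `ppu_read_cycle`.

    The game code finishes updating the tile at `cpu_write_cycle`.
    """
    vram = {"tile": "old"}
    seen = None
    for cycle in range(total_cycles):
        if cycle == ppu_read_cycle:
            seen = vram["tile"]       # PPU samples video memory
        if cycle == cpu_write_cycle:
            vram["tile"] = "new"      # game code finishes its update
    return seen


print(sample_tile(ppu_read_cycle=120))  # PPU timing correct: reads after the write
print(sample_tile(ppu_read_cycle=90))   # PPU timing wrong: reads too early
```

The first call sees the updated tile, the second sees stale data, which is roughly what a mistimed PPU does to a game that relies on exact timing.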

I hear you. “Hey, the NES was clocked at 1.79 MHz but was nowhere near as hard to emulate as SNES.” That’s correct, but the NES was much simpler. To oversimplify, you put the tiles for its graphics in memory, you used numbers in other memory to tell it which tiles to put where, and that was that. SNES’s hardware could do very complicated graphics operations that require a lot more work. For example, the famous “Mode 7” effect works kind of like:

* Stretch the image by this much and render one scanline.
* Now stretch the image a little less and render one scanline.
* Now stretch the image a little less…
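The per-scanline idea above can be sketched like this. To be clear, this is not the real Mode 7 hardware algorithm, just a nearest-neighbour horizontal stretch with made-up names (`stretch_scanline`, `render_stretched_frame`) and a four-pixel “image”, to show that each scanline needs its own parameters.

```python
def stretch_scanline(texture_row, scale, width):
    """Nearest-neighbour sample of one row, stretched horizontally by `scale`."""
    n = len(texture_row)
    return [texture_row[int(x / scale) % n] for x in range(width)]


def render_stretched_frame(texture_row, scales, width):
    """Render one scanline per entry in `scales`, each with its own stretch."""
    return [stretch_scanline(texture_row, scale, width) for scale in scales]


row = list("ABCD")
frame = render_stretched_frame(row, scales=[4.0, 2.0, 1.0], width=8)
for line in frame:
    print("".join(line))
```

Each output line is the same source row stretched a little less than the one before it. An emulator has to redo that math on every scanline because the game can change the parameters between any two of them.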

That means your PPU code can’t just be copying graphics from memory to the screen. You need to pay attention to **dozens** of other parts of the system which may be telling you to scale, rotate, tint, or do a lot of other things to the image that’s in memory. And you have to sync that up with a virtual CRT television, because many games would happily update memory that had “already been rendered”.

The beefier the system gets, the less it feels like a toy calculator and the more it feels like an extremely complex factory. The NES CPU had roughly 5,000 transistors in it. That’s small enough that I’ve seen people build one themselves out of buyable parts, and you could write a physics simulation of every single transistor that runs well on modern systems. A Pentium III had 9.5 MILLION transistors. By that measure it is roughly 2,000x more complicated to emulate accurately.

And you’re still going to have to emulate CMOS, a Northbridge, a Southbridge, RAM timings, an audio card, a video card…
