This is something I am having trouble wrapping my head around. Take PCem, for example. It can emulate up to a Pentium II, but a Pentium III is nearly impossible with current hardware constraints. However, a Pentium III is 433 MHz (if I remember correctly) and modern CPUs are well into the 5 GHz range. Yet to accurately emulate that 433 MHz you need X amount of CPU.
Why is that the case, if the CPU you’re using to perform the emulation is vastly more powerful?
I read it’s the same even for the Super Nintendo: it ran at around 5 MHz, and for accurate emulation you’d need 3 GHz (which is around today, but wind back a few years and it would be the same question).
Hopefully this makes sense; I am still trying to understand emulation on a deeper level. Happy to get links to any docs that answer this question as well.
Not every CPU has the same instruction set. In other words, different CPUs speak different languages. While you can solve almost any problem on any CPU, some will be able to achieve it with fewer steps.
To make sure timing is correct, you really want a CPU that can do multiple instructions per “expected clock cycle” of the emulated machine, to ensure that all instructions are done in the correct order in the time allotted.
Emulation is hardware (your computer) running software (the emulator) pretending to be hardware (game console or whatever), which is running more software (the game or program you’re using emulation to run).
Even though the program you’re trying to run in the emulator might just be trying to run a single instruction, different hardware might require different instructions, in a completely different order. That means any translation between the real hardware and the emulated hardware is going to require a lot more processing than just running that single instruction, and even then it’s going to run much slower because it runs as software rather than on native hardware.
Particularly for the SNES, it’s about accuracy. If the program wants to calculate 123+456, then you could:
* Return 579.
* Return 579 in exactly 6 cycles, otherwise it’s too fast.
* Return 444, because the second operand was about to be changed: the original hardware fetched operands in sequence, so it always received the updated value, while your new CPU is pipelined to fetch both values at once.
A lot of software can be faithfully emulated simplistically, but some features can intentionally or unintentionally rely on very specific hardware interactions, and that’s way harder to emulate.
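Here’s a toy sketch of that last bullet in Python. The addresses, the “DMA,” and the cycle structure are all made up; the point is that reproducing the *order* of memory accesses matters, not just the arithmetic:

```python
# Hypothetical sketch: why fetch *order* matters, not just the result.
# Addresses, cycle timing, and the "DMA" below are invented.

memory = {0x10: 123, 0x11: 456}

def naive_add(mem):
    # Fast emulator: grab both operands "at once" and return the sum.
    return mem[0x10] + mem[0x11]

def cycle_accurate_add(mem, tick):
    # The original hardware fetched operands one per cycle, and other
    # hardware (represented by `tick`) could change memory in between.
    a = mem[0x10]
    tick(mem)        # something else in the machine runs for a cycle
    b = mem[0x11]    # ...so this fetch sees the *updated* value
    return a + b

def dma_overwrites_operand(mem):
    # Pretend a DMA transfer rewrites the second operand mid-instruction.
    mem[0x11] = 321

print(naive_add(memory))                                    # 579
print(cycle_accurate_add(memory, dma_overwrites_operand))   # 444
```

The fast emulator gets 579; the one that reproduces the original fetch order gets 444, which is what a game written against the real hardware would expect.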
Imagine someone is writing a script in an old dead language like ancient Greek. Ancient Greek would be the original console: that’s what it speaks and writes.
However, it’s now been about a thousand years, and even though we English speakers can read and write it to a degree, it’s harder for us because there are nuances of the older language that don’t carry over easily.
Emulators have an extra “layer of translation” so to speak to communicate with the Greeks.
As others have pointed out, emulation is hardware running software pretending to be hardware running software.
It’s not very efficient to do, which is why it costs extra computing power.
Also, instruction sets are not the same. The commands that the software inside the emulator gives to the pretend CPU it’s running on might not be available on the actual physical CPU doing the work.
That means that the emulation software needs to do a translation, which again costs computing power.
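To make that concrete, here’s a toy fetch-decode-execute loop in Python. The three-instruction “guest” machine is invented for this example; the takeaway is that a single guest instruction costs the host a whole round of fetching, decoding, and branching:

```python
# Toy fetch-decode-execute loop. The guest instruction set here is
# invented; a real emulator dispatches hundreds of opcodes this way.

guest_program = [
    ("LOAD", 0, 123),   # put 123 in guest register 0
    ("LOAD", 1, 456),   # put 456 in guest register 1
    ("ADD",  0, 1),     # r0 = r0 + r1
    ("HALT",),
]

registers = [0, 0]
pc = 0  # guest program counter

while True:
    op = guest_program[pc]          # fetch (a host memory read)
    pc += 1
    if op[0] == "LOAD":             # decode (host compares + branches)
        registers[op[1]] = op[2]    # execute (a host write)
    elif op[0] == "ADD":
        registers[op[1]] += registers[op[2]]
    elif op[0] == "HALT":
        break

print(registers[0])  # 579 -- one guest ADD cost many host instructions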
Emulating the rest of the hardware also needs translation. For instance, the sound chip of a SNES works very differently from that of the PC running the emulation.
As for emulating a Pentium II or Pentium III machine (the Pentium III went up to 1.4 GHz, by the way, not 433 MHz), it should be relatively easy, since modern x86-64 CPUs (like AMD Ryzen and Intel Core) still support most (if not all) of the instructions older Intel and AMD CPUs supported.
The emulation software should be able to just pass through commands instead of needing to translate.
Maybe it just doesn’t, or maybe it emulates some custom graphics card from way back when that needs translation to work on a modern GPU.
You’re playing the game [Simon Says](https://spectrum.ieee.org/media-library/photo-of-the-box-of-an-electronic-game-that-says-simon-and-shows-two-hands-pressing-colorful-buttons-on-a-round-object.png?id=50539567&width=400&height=250) with three friends, paired off into two teams, and each team has its own game hardware. One player on each team is tasked with shouting instructions to their teammate, and whoever can hit the buttons the quickest the most times in a row wins.
The only problem? Both players on the host team are native English speakers. You, however, are playing with a foreign exchange student who does not know a single word of English, only Spanish. So now you have to communicate in a new language by looking up each new word every time (imagine you don’t have the short-term memory to remember them; you can only remember English), while also trying to be as quick and accurate as the other team just to spit out the same pattern close to theirs in rhythm.
Now, multiply this problem by a few million, and that’s an oversimplified way of thinking about the issues with translating instructions in real time between two totally different architectures. Next, realize that most consoles since the 16-bit era have separate teams working to output video and sound, and, from the 32-bit era on, sub-teams that only handle things like the math required to figure out where the corners and edges of a 3D object are, and then how its surface should look from the angle you’re viewing it (a floating-point unit).
So you hop on over to the room with the “video” team, only to realize the way the room is physically organized is totally different; let’s say the guy who’s really good at doing the math for the 3D object sizes calculates them as hypothetical square objects to save time, not triangles like you’re used to. And on top of that, inside that room they speak a different language from the one your video guys speak. So now that’s a whole new set of issues to worry about when they pass work off to the other teams to operate in concert.
If you manage to write a dictionary to speed up translation lookups, that’ll help, but it will never quite have the exact same timing as the native English-speaking team. So you decide to find dedicated translators for the teams, but now that’s doubled your team size and still requires a ton more overhead; more than you’d expect, in fact, because you then have to have translators between the audio and video teams too.
Eventually you iron out all of these things, but you realize that the original Simon Says team, who all speak the same language, has only a few dozen members and can whip out instructions smoothly without delay, while you have 1,000x the total people they do just to relay things between rooms as quickly and accurately as you can. Finally, a long time later, after you’ve worked out all of the kinks, you match them.
An SNES or even a Pentium III PC isn’t just one processor.
If I remember right, the SNES also had a Picture Processing Unit and an Audio Processing Unit (the NES had these too). Those were two *additional* processors that worked in tandem with the CPU. **And** a cartridge could optionally include an expansion chip like the Super FX or Capcom’s Cx4. That adds a FOURTH processor to the mix.
The CPU of the SNES was clocked at roughly 1.8 MHz to about 3.6 MHz depending on context. I’m not sure about the APU and PPU, but the Super FX could be clocked at around 10 MHz as well. Still, most of our modern CPUs are operating 1,000-3,000x faster than these chips. What takes up all the extra effort?
Well, you can’t just tell the SNES CPU “please fetch this value from RAM, that value from RAM, then execute the add instruction and write the result to RAM.” You have to (roughly; a sketch of this kind of loop follows the list):
1. Wait a certain number of CPU cycles after executing the instruction to update the RAM simulation.
2. Wait another certain number of CPU cycles to put the RAM value in the correct register of your CPU simulation.
3. Wait another certain number of CPU cycles to let the operation be “completed”.
4. Meanwhile do whatever the PPU is supposed to be doing.
5. Meanwhile do whatever the APU is supposed to be doing.
6. Meanwhile be aware of what scanline a TV might be rendering at this moment, so any side effects of the current instruction can have the appropriate effect.
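Here’s a rough sketch, in Python, of what that kind of lock-step loop can look like. The chips and cycle ratios are invented for illustration; the real point is that every chip gets stepped against one shared master clock:

```python
# Hypothetical lock-step scheduler. Components and cycle ratios are
# invented; real emulators step chips against a shared master clock
# in much this way.

class Component:
    def __init__(self, name, cycles_per_master_tick):
        self.name = name
        self.ratio = cycles_per_master_tick
        self.pending = 0.0
        self.cycles_run = 0

    def step(self):
        # A real emulator would execute one CPU instruction, draw a
        # sliver of a scanline, or mix one audio sample here.
        self.cycles_run += 1

cpu = Component("CPU", 1 / 6)   # invented: 1 CPU cycle per 6 master ticks
ppu = Component("PPU", 1 / 4)
apu = Component("APU", 1 / 21)

for master_tick in range(1000):          # the shared master clock
    for chip in (cpu, ppu, apu):
        chip.pending += chip.ratio
        while chip.pending >= 1.0:       # this chip is owed a cycle
            chip.step()
            chip.pending -= 1.0

for chip in (cpu, ppu, apu):
    print(chip.name, chip.cycles_run)    # CPU 166, PPU 250, APU 47
```

Notice that the host does bookkeeping on every single master tick even when a chip does nothing that tick; that bookkeeping is pure overhead the original hardware never paid.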
That’s a TON of extra work. The SNES CPU didn’t have to do that work. The reason it took so many cycles to do something simple was physics: electricity could move through and operate `X` transistor-based gates per clock cycle, and doing steps 1-3 above might require operating `6X` gates, so the fastest the SNES CPU could execute that instruction was 6 clock cycles.
All of that has to get simulated on YOUR CPU. Steps 1-3 above might involve executing 200-300 instructions on your CPU to make sure everything gets done right in the simulated system. Your CPU likewise has some `Y` number of gates it can operate per clock cycle, and each of those instructions takes some number of your cycles to complete. Seen that way, emulating a system can easily incur 1,000x more work than the original system was doing.
All of that is required for accuracy. That accuracy can really matter. I had a friend who was writing an NES emulator once, and he had trouble loading some game, I think Donkey Kong. What he figured out is he had the timing in his PPU wrong, so it was trying to draw things to the screen before the code was finished updating the memory the PPU uses to draw. So he was drawing “too early”.
I hear you: “Hey, the NES was clocked at about 1.79 MHz but was nowhere near as hard to emulate as the SNES.” That’s correct, but the NES was much simpler. To oversimplify, you put the tiles for its graphics in memory, you used numbers in other memory to tell it which tiles to put where, and that was that. The SNES’s hardware could do very complicated graphics operations that require a lot more work. For example, the famous “Mode 7” effect works kind of like:
* Stretch the image by this much and render one scanline.
* Now stretch the image a little less and render one scanline.
* Now stretch the image a little less…
That means your PPU code can’t just be copying graphics from memory to the screen. You need to pay attention to **dozens** of other parts of the system which may be telling you to scale, rotate, tint, or do a lot of other things to the image that’s in memory. And you have to sync that up with a virtual CRT television, because many games would happily update memory that had “already been rendered”.
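Here’s a crude Python sketch of that per-scanline trick. The tiny image, the scale formula, and the numbers are all invented; real Mode 7 applies a full affine transform that games can rewrite on every scanline (commonly via HDMA):

```python
# Crude per-scanline "Mode 7"-style floor effect. The image format and
# the scale formula are invented; real hardware applies an affine
# matrix that games can change on every scanline.

WIDTH, HEIGHT = 8, 4
background = [[(x + y) % 10 for x in range(WIDTH)] for y in range(64)]

def render_frame(horizon_scale=0.2):
    frame = []
    for scanline in range(HEIGHT):
        # Lines nearer the top are stretched more, a little less on
        # each line below, faking a floor receding toward the horizon.
        scale = 1.0 + horizon_scale * (HEIGHT - scanline)
        row = [background[scanline][int(x / scale) % WIDTH]
               for x in range(WIDTH)]
        frame.append(row)
    return frame

for row in render_frame():
    print(row)
```

Each printed row samples the same source image at a different stretch, which is why the emulator can’t just blit the frame in one go: it has to revisit the picture state on every scanline.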
The beefier the system gets, the less it feels like a toy calculator and the more it feels like an extremely complex factory. The NES CPU had roughly 5,000 transistors. That’s small enough that I’ve seen people build one themselves out of off-the-shelf parts, and you could write a physics simulation of every single transistor that runs well on modern systems. A Pentium III had 9.5 MILLION transistors. That makes it at least 2,000x more complicated to emulate accurately.
And you’re still going to have to emulate CMOS, a Northbridge, a Southbridge, RAM timings, an audio card, a video card…
You’re probably talking about cycle-accurate emulation. Almost all SNES games could be emulated just fine back in the 90s on CPUs of just a few hundred MHz.
However, there’d be a few games that weren’t perfect. The SNES isn’t just a CPU; it has a bunch of other chips handling graphics and sound and copying memory around. All of that would happen in a very particular order on a real SNES, and developers knew that, so they could write code that depends on the order in which things happen on the SNES’s chips.
So if you want to write a perfect emulator, you don’t just go through the CPU instructions converting them; you need to go “OK, now there’s a CPU cycle, so I read one instruction; now there’s a DMA cycle; now the audio chip reads one byte of memory; now the graphics chip reads 2 bytes of VRAM; now the CPU has finished processing the instruction and writes 1 byte to the graphics chip’s register,” and so on.
It just makes a lot more work for the emulating computer.
Before you can understand emulation you need to understand translation, which is quite simple.
Take, for example, ARM and x86 on new and old Macs. Their CPUs don’t speak the same language, so the new CPU has to translate the old x86 instructions into new ARM instructions.
So imagine the CPU as an English guy who is given a Japanese book and is translating it word for word and doing what the book says.
He sees the phrase “手を挙げて”, translates it to “raise your hand” and then he raises his hand.
So far so good.
Next he sees “ヒナから手紙を受け取って”, translates it to “receive a letter from Hina”, and then… wait, who is Hina? What does the letter say?
Translation no longer works.
So instead of just translating the instruction, he needs to pretend to be Hina and write that letter (in Japanese), then seal it, then deliver it to himself. This is emulation.
Emulation is much more work because he (the CPU) is doing the work of multiple people (the whole device) all by himself, in a language he doesn’t understand, just to make that one instruction make sense.
So if you want the emulation to work in real time, as for a game, the CPU needs to be much more powerful than the CPU of the emulated device, since it isn’t just translating and performing the instructions, but also reproducing the interactions between all the other devices connected to that CPU.
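A toy Python sketch of the “pretend to be Hina” part: a single memory-mapped read forces the emulator to run a whole other fake chip before it can answer. The address resembles an SNES audio port, but the chip’s behavior here is entirely invented:

```python
# Hypothetical memory-mapped I/O: one "read" instruction makes the
# emulator play the part of a whole other chip. The address and the
# audio chip's behavior are invented for illustration.

class FakeAudioChip:
    def __init__(self):
        self.samples_mixed = 0

    def catch_up(self, cycles):
        # Before reporting status, actually do the work the real chip
        # would have done in the meantime (write the letter as Hina).
        self.samples_mixed += cycles // 32

    def status(self):
        return self.samples_mixed & 0xFF

class Bus:
    def __init__(self):
        self.ram = bytearray(0x10000)
        self.audio = FakeAudioChip()
        self.cycles = 0

    def read(self, addr):
        self.cycles += 1
        if addr == 0x2140:                    # invented audio status port
            self.audio.catch_up(self.cycles)  # run the other "person"
            return self.audio.status()
        return self.ram[addr]

bus = Bus()
for _ in range(100):
    bus.read(0x0000)       # plain RAM reads are cheap
print(bus.read(0x2140))    # this one read drags the audio chip along
```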
If your modern computer could just do what the old ones you are emulating did, then you wouldn’t need to emulate them. They had specific things they were built to do, and it is much less efficient for your modern system to do those specific tasks. Therefore you need more clock cycles (time/steps) to do them.