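Given a very shoddy program like:
#include <string.h>
void crapFunc(const char *str) {
    char buffer[500];
    strcpy(buffer, str);  /* no length check: overflows buffer if str holds 500 or more characters */
    return;
}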
(sorry for crap code, I’m not too familiar with C or C++)
I learned that if you pass a string that is longer than the buffer, the copy runs past the end of the buffer and further into the stack, where it overwrites the saved return address, so you can redirect the program to another address that holds malicious code. This high-level picture is fine, but I’m confused about what happens when we actually do it. So let’s say the string is full of 0x90 bytes (NOPs on x86) and the malicious code is written as a substring of the string. You overwrite the return address, and when crapFunc(str) returns, it jumps to the malicious code. What I don’t get is, why does the computer execute the malicious code? If it’s just a string, the computer shouldn’t recognize it as an executable. Even if the computer does recognize it, how would the malicious code still run? Just because the program now points there doesn’t mean that it should be executed.
“If it’s just a string, the computer shouldn’t recognize it as an executable”
Why not? What distinguishes a string of bytes from an executable of machine code bytes? Nothing.
C – especially – does not distinguish. You can literally write machine code in a struct and then cast that to a function and use it as one.
The only protections you have against this are hardware – things like memory tagging, DEP, etc. Those protections basically rely on the OS or the program itself to say “this whole page of RAM will only ever hold data” or “this whole page of RAM may hold instructions”, and the hardware keeps a bit in the page-table entry to remember that (the NX bit on x86) and will fault if you ask it to “jmp” into a page that is marked as data.
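To make both points concrete (bytes cast to a function, and the per-page executable flag), here is a minimal sketch in C. It assumes a POSIX system and an x86-64 CPU. The function name run_bytes_as_code is made up for this example, and on any other architecture it just returns -1. Converting a data pointer to a function pointer is not guaranteed by ISO C, though POSIX systems support it in practice.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: returns 42 if the bytes ran as code,
   -1 if the platform is unsupported or a system call failed. */
int run_bytes_as_code(void) {
#if defined(__x86_64__)
    /* Plain data: six bytes that happen to be x86-64 machine code. */
    static const unsigned char code[] = {
        0xB8, 0x2A, 0x00, 0x00, 0x00,   /* mov eax, 42 */
        0xC3                            /* ret */
    };
    size_t len = 4096;                  /* assumed page size */

    /* Ask the OS for one ordinary read/write (non-executable) page. */
    void *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    memcpy(page, code, sizeof code);    /* still just copying bytes */

    /* Flip the page to read/execute: the flag applies to the whole page. */
    if (mprotect(page, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(page, len);
        return -1;
    }

    /* Reinterpret the address of the bytes as a function and call it. */
    int (*fn)(void) = (int (*)(void))page;
    int result = fn();

    munmap(page, len);
    return result;
#else
    return -1;                          /* byte sequence above is x86-64 only */
#endif
}
```

If the mprotect call is left out, the same call through fn would normally be stopped by the hardware on a system with DEP/NX enabled, which is exactly the page-level check described above.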
The problem is… there is little fine-grained control, and it’s mostly hardware that provides the facility. The software has no real way to know that anything like that has ever happened – the software has no way to know where the CPU is going to start pulling in data from next without SERIOUSLY interfering with the lowest-level operations. So software-level mitigations are often useless (e.g. early Windows DEP, current Spectre software patches, etc.).
But even the most minor of such hardware mitigations are not without cost either – now every time you execute instructions, you need to check that they’ve come from an executable part of memory and not data, and preserve or modify those flags whenever memory is moved around, copied, destroyed, etc.
C literally is designed to allow you to treat series of bytes as whatever you like. The OS historically also let you do this, because “checking” was a huge resource drain in the early days and people just couldn’t afford it. Now the OS implements some hardware mitigations and manages whether pages are executable or not, for example, and makes them non-executable by default.
But even then… allocating a page of executable memory is just one bit different and often necessary (e.g. any code that’s self-modifying, drivers, etc.). So there’s nothing stopping a program asking for executable RAM and doing what it likes with it, and THOUSANDS of programs do that on your computer, even parts of the OS. There’s also nothing stopping a program – when it’s asked for executable RAM properly and using it respectfully – having an exploitable flaw in that section of code which means that the exploit / overflow is contained WITHIN the executable page, and never needs to stray outside.
Things aren’t done on a bit-by-bit basis with this (for performance but also data-size reasons… you can’t use up kilobytes of storage for flags for every kilobyte of RAM, for example!), you only really flag entire pages as executable, which can be *huge* amounts of data/code, so there’s plenty of scope for overflow. And once you have an exploit running code, what’s the first thing you’re going to ask the code to do for you? Make some working space somewhere executable, the same way an ordinary program would. It’s literally just a few bytes of code to do that and then you have as much executable space to play with as you like.
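As a small illustration of that page granularity, here is a sketch in C of the arithmetic involved. The helper names page_base and pages_touched are invented for this example, and the 4096-byte page size in the usage note is an assumption (it is common, but not universal).

```c
#include <stddef.h>
#include <stdint.h>

/* Round an address down to the start of the page that contains it.
   page_size must be a power of two. */
uintptr_t page_base(uintptr_t addr, uintptr_t page_size) {
    return addr & ~(page_size - 1);
}

/* Number of whole pages whose permissions are affected when a region of
   len bytes starting at addr is made executable. */
size_t pages_touched(uintptr_t addr, size_t len, uintptr_t page_size) {
    if (len == 0)
        return 0;
    uintptr_t first = page_base(addr, page_size);
    uintptr_t last  = page_base(addr + len - 1, page_size);
    return (size_t)((last - first) / page_size) + 1;
}
```

With 4096-byte pages, page_base(0x12345, 4096) is 0x12000, so making the single byte at 0x12345 executable makes everything from 0x12000 to 0x12FFF executable as well. A 32-byte region starting at 0x12FF0 straddles a boundary and touches two full pages.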
And it’s not unique to C, by the way. Pretty much any language can be used to execute raw code with enough effort, to reinterpret bytes of string as bytes of code. The only real difference is that C doesn’t PRETEND it’s secure in that regard. It never has. Even Rust needs the “unsafe” keyword for this kind of low-level work, and once you use that keyword, a bug there can corrupt memory that “safe” code relies on, and the language has no real way to detect or mitigate that.
A CPU just sees executable instructions as bytes and data to those instructions as bytes. It can’t distinguish unless it’s literally instructed to do so, and there’s a performance hit or hardware support required to do that, and such protections are never complete. And even the tiniest hole lets you defeat all of those protections.
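A quick way to see the "it's all bytes" point is to print a perfectly ordinary string as raw byte values. This is only a sketch: the helper name bytes_as_hex is made up, and it assumes an ASCII character set.

```c
#include <stdio.h>
#include <string.h>

/* Write the bytes of s as space-separated hex into out.
   Returns the number of bytes formatted, or -1 if out is too small. */
int bytes_as_hex(const char *s, char *out, size_t out_size) {
    size_t n = strlen(s);
    if (out_size == 0)
        return -1;
    out[0] = '\0';
    if (n == 0)
        return 0;
    if (out_size < 3 * n)               /* "XX " per byte, last space becomes NUL */
        return -1;
    for (size_t i = 0; i < n; i++) {
        /* Each byte is shown as a number, exactly as the CPU would fetch it. */
        snprintf(out + 3 * i, out_size - 3 * i, "%02X%s",
                 (unsigned)(unsigned char)s[i], i + 1 < n ? " " : "");
    }
    return (int)n;
}
```

For the text "PX" this produces "50 58". To a string function those two bytes are the letters P and X. If an x86-64 CPU fetched them as instructions, it would decode them as push rax followed by pop rax. Nothing in the bytes themselves says which reading is the right one.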