eli5: buffer overflow attacks


Given a very shoddy program like,

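#include <string.h>

void crapFunc(const char *str) {
    char buffer[500];
    strcpy(buffer, str); /* no length check: input longer than 499 chars plus the terminator overflows buffer */
    return;
}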
(sorry for crap code, I’m not too familiar with C or C++)

I learned that if you have a string that goes over the buffer limit and into the part of the stack where the return pointer is stored, you can overwrite that pointer and redirect the computer to another address that executes malicious code. This high-level explanation is fine, but I’m confused about what happens when we actually do it. So let’s say the string is full of 0x90 bytes (NOPs) and the malicious code is written as a substring in the string. You overwrite the return address, and crapFunc(str) goes to the malicious code. What I don’t get is, why does the computer execute the malicious code? If it’s just a string, the computer shouldn’t recognize it as an executable. Even if the computer does recognize it, how would the malicious code still run? Just because the program now points there doesn’t mean that it should be executed.


4 Answers

Anonymous 0 Comments

Your memory contains data like

[buffer[0]]
[buffer[1]]
[...]
[buffer[499]]
[return address]

When you copy into the buffer and then keep copying more bytes past its end, the next bytes overwrite the return address. Then, when the computer exits the function, it looks at the new return address you created, and says “ok, the next instructions will be found over <here>”, which is conveniently pointing into your malicious string. The computer then just starts reading those bytes as instructions.
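Here is a rough sketch of that overwrite in C. It does not touch a real stack: `fake_frame`, `unchecked_copy`, and the addresses 0x1000 and 0x4141 are all made up for illustration, and a real frame usually has more in it (saved registers, possibly a canary). The only point is that an unchecked copy running past `buffer` lands in whatever sits next to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { BUF_SIZE = 8 };

/* Simplified stand-in for a stack frame (an assumption, not a real layout):
   a small buffer followed directly by the saved return address. */
struct fake_frame {
    char buffer[BUF_SIZE];
    unsigned char return_address[sizeof(uintptr_t)];
};

/* Copies len bytes to the start of the frame without comparing len to
   BUF_SIZE, which is the mistake an unchecked string copy makes. */
void unchecked_copy(struct fake_frame *frame, const void *input, size_t len)
{
    assert(len <= sizeof *frame); /* keep the demo inside the struct */
    memcpy(frame, input, len);
}

/* Reads the stored return address back out of the frame. */
uintptr_t read_return_address(const struct fake_frame *frame)
{
    uintptr_t value;
    memcpy(&value, frame->return_address, sizeof value);
    return value;
}

/* Returns 1 if the oversized input replaced the stored return address. */
int overflow_changes_return_address(void)
{
    const size_t off = offsetof(struct fake_frame, return_address);
    const uintptr_t original = 0x1000; /* made-up "legitimate" address */
    const uintptr_t attacker = 0x4141; /* made-up address chosen by the attacker */
    struct fake_frame frame;
    unsigned char input[sizeof frame];

    memset(&frame, 0, sizeof frame);
    memcpy(frame.return_address, &original, sizeof original);

    /* Attacker input: filler for the buffer, then the new address. */
    memset(input, 0x90, sizeof input);
    memcpy(input + off, &attacker, sizeof attacker);

    unchecked_copy(&frame, input, sizeof input);
    return read_return_address(&frame) == attacker;
}
```

Calling `overflow_changes_return_address()` returns 1, because the bytes after the filler ended up where the return address used to be.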

Note that there are many protections against this these days, one of which is the “no execute” (NX) bit on a memory page. This tells the CPU to fault if its instruction pointer ends up pointing into that page.

>If it’s just a string, the computer shouldn’t recognize it as an executable.

The computer has no idea what a string is. A string is a very high level concept that a programming language constructs from many lower level constructs. The CPU sees whatever bytes the instruction pointer is pointed at as code to execute. It doesn’t care how those bytes got there.

Anonymous 0 Comments

>why does the computer execute the malicious code?

This is why the return pointer is overwritten.
The return pointer is basically the computer’s instruction saying “Now go here and do this”.
In ordinary circumstances, “here” would be the spot in the calling function right after the call, but because you overwrote it, “here” is now the malicious code you encoded in the string.
The computer can’t actually tell that the malicious code was originally input as a string. To the computer it’s just 1s and 0s at an address, and in this case it was instructed to treat those 1s and 0s as an instruction.

Anonymous 0 Comments

“If it’s just a string, the computer shouldn’t recognize it as an executable”

Why not? What distinguishes a string of bytes from an executable of machine code bytes? Nothing.

C, especially, does not distinguish. You can literally write machine code into a struct (or any byte array), cast a pointer to it to a function pointer, and call it like a function, as long as that memory is allowed to execute.

The only protections you have against this are hardware – things like memory tagging, DEP, etc. Those protections basically rely on the OS or the program itself to say “this whole page of RAM will only ever hold data” or “this whole page of RAM will only ever hold instructions”. The OS records that as a bit in the page table, and the hardware checks that bit and will refuse (fault) if you ask it to “jmp” to code sitting in a data page.

The problem is… there is little fine-grained control, and it’s mostly hardware that provides the facility. The software has no real way to know that anything like that has ever happened – the software has no way to know where the CPU is going to start pulling in data from next without SERIOUSLY interfering with the lowest-level operations. So software-level mitigations are often useless (e.g. early Windows DEP, current Spectre software patches, etc.).

But even the most minor of such hardware mitigations are not without cost either – now every time you execute instructions, you need to check that they’ve come from an executable part of memory and not data, and preserve or modify those flags whenever memory is moved around, copied, destroyed, etc.

C literally is designed to allow you to treat series of bytes as whatever you like. The OS historically also let you do this, because “checking” was a huge resource drain in the early days and people just couldn’t afford it. Now the OS implements some hardware mitigations and manages whether pages are executable or not, for example, and makes them non-executable by default.

But even then… allocating a page of executable memory is just one bit different and often necessary (e.g. any code that’s self-modifying, drivers, etc.). So there’s nothing stopping a program asking for executable RAM and doing what it likes with it, and THOUSANDS of programs do that on your computer, even parts of the OS. There’s also nothing stopping a program – when it’s asked for executable RAM properly and using it respectfully – having an exploitable flaw in that section of code which means that the exploit / overflow is contained WITHIN the executable page, and never needs to stray outside.

Things aren’t done on a bit-by-bit basis with this (for performance but also data-size reasons… you can’t use up kilobytes of storage for flags for every kilobyte of RAM, for example!), you only really flag entire pages as executable, which can be *huge* amounts of data/code, so there’s plenty of scope for overflow. And once you have an exploit running code, what’s the first thing you’re going to ask the code to do for you? Make some working space somewhere executable, the same way an ordinary program would. It’s literally just a few bytes of code to do that and then you have as much executable space to play with as you like.
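To make the page-level part concrete, here is a minimal POSIX sketch (so Linux/macOS-style systems, not Windows). `try_make_page_executable` is a hypothetical helper name. It maps one ordinary read/write page, then asks the OS to flip that same page to read+execute with `mprotect`. Nothing is ever executed here, and whether the OS says yes depends on the system’s policy, so treat the result as informational.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON /* older BSD-style name for the same flag */
#endif

/* Hypothetical helper. Maps one data-only page, fills it with bytes, then
   asks the OS to change the whole page to read+execute.
   Returns  1 if the OS allowed the change,
            0 if the OS refused it,
           -1 if the page could not be mapped at all.
   The page size is written to *page_size_out before anything else. */
int try_make_page_executable(size_t *page_size_out)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    *page_size_out = (size_t)page;

    /* Ask for a page that is readable and writable but NOT executable. */
    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Just data at this point; the CPU would fault if told to run it. */
    memset(p, 0x90, (size_t)page);

    /* Permissions apply to the whole page, not to individual bytes. */
    int allowed = (mprotect(p, (size_t)page, PROT_READ | PROT_EXEC) == 0);

    munmap(p, (size_t)page);
    return allowed;
}
```

On a typical desktop Linux this tends to return 1 with a 4096-byte page, but hardened setups may refuse the `mprotect` call, which is exactly the kind of policy the answer is talking about.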

And it’s not unique to C, by the way. C just doesn’t pretend to get in your way. Pretty much any language can be used to execute raw code with enough effort, to reinterpret bytes of string as bytes of code. The only real difference is that C doesn’t PRETEND it’s secure in that regard. It never has. Even Rust has to have the “unsafe” keyword to get anything useful done, and once you use that keyword, there is a potential to overflow into “safe” code and modify it and there’s no real way to detect or mitigate that.

A CPU just sees executable instructions as bytes and data to those instructions as bytes. It can’t distinguish unless it’s literally instructed to do so, and there’s a performance hit or hardware support required to do that, and such protections are never complete. And the tiniest, tiniest hole anyway lets you then defeat all those protections.

Anonymous 0 Comments

>If it’s just a string, the computer shouldn’t recognize it as an executable

That’s the neat part, it doesn’t! Everything in memory (code, strings, numbers, colors, etc.) is just bytes. How those bytes are used depends on what you tell the computer to do with them. The hex bytes “53 55 56” can be

* “SUV” when read as text,
* “push ebx; push ebp; push esi” instructions when read as x86 code,
* or a 33% gray when interpreted as a color.
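The three readings above can be sketched in C. The helper names are made up for this example, the text reading assumes ASCII, the opcode table only covers the one-byte 32-bit x86 `push` register instructions (0x50 to 0x57), and the gray level is taken as the average of the three channels.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The same three bytes, used three different ways below. */
static const unsigned char BYTES[3] = { 0x53, 0x55, 0x56 };

/* Reading 1: text. Copies the bytes and adds a terminator (assumes ASCII). */
void bytes_as_text(char out[4])
{
    memcpy(out, BYTES, 3);
    out[3] = '\0';
}

/* Reading 2: x86 code. In 32-bit x86, opcodes 0x50..0x57 are "push <reg>",
   with the register picked by the low three bits. Returns NULL for any
   other byte, since this is not a full disassembler. */
const char *x86_push_name(unsigned char opcode)
{
    static const char *const names[8] = {
        "push eax", "push ecx", "push edx", "push ebx",
        "push esp", "push ebp", "push esi", "push edi"
    };
    if (opcode < 0x50 || opcode > 0x57)
        return NULL;
    return names[opcode - 0x50];
}

/* Reading 3: color. Treats the bytes as R, G, B (0..255 each) and returns
   the average brightness as a whole-number percentage. */
int bytes_as_gray_percent(void)
{
    int sum = BYTES[0] + BYTES[1] + BYTES[2];
    return (sum * 100) / (3 * 255);
}
```

With these, `bytes_as_text` gives "SUV", `x86_push_name(0x53)` gives "push ebx", and `bytes_as_gray_percent()` gives 33. The bytes never change; only the code looking at them does.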

You could open an executable in Notepad and Notepad will read it as text. Most of that text won’t make sense, but some would. Conversely, if you point the CPU to a string and say “run that”, it’ll happily try to interpret those bytes as machine code. Now *usually* the computer won’t be able to make much sense of it and the program will crash. But if you pick your strings right, you can make it jump to a point of memory where you hid away some malicious code and, well, you’re screwed.

To see just how far you can take this, some speedrunning groups have been fooling around with older games to make them run any sort of code they want, by placing certain on-screen items in such a way that their data in memory can be used as code. It’s wild what they can do with that. For example, the skip-to-end-credits in Super Mario World: [https://youtu.be/vAHXK2wut_I](https://youtu.be/vAHXK2wut_I). Now, I don’t expect you to understand that video, but just know that it’s f%$^#@$g disgusting.