Libprotobuf Fuzzing

In this article we dive into structure-aware fuzzing with Libprotobuf to learn how to fuzz functions that accept only complex data types such as structs, classes, or other structured data.
Sep 6

Structure Aware Fuzzing 

Libfuzzer is a coverage-guided fuzzer for C and C++ libraries and functions. However, functions often take a "struct" parameter rather than a "char *" buffer. In these cases, feeding the target purely random bytes is rarely efficient or productive.

Structure-aware fuzzing comes into play in exactly these situations, where function arguments are pre-defined structs. To fuzz such targets successfully, we can use Libprotobuf fuzzing from:

https://github.com/google/libprotobuf-mutator
First let's look at an example of vulnerable code. Then we will build a harness for Libprotobuf, perform structure-aware fuzzing on the vulnerable function, and hit a heap-buffer-overflow crash.

The "add_person()" function accepts a new_person argument of type struct person_data. This means that while fuzzing we can't simply pass a random "char *", because every input must be a struct person_data.

With the help of the Libprotobuf fuzzer, we can create a harness for struct person_data and mutate each member according to its type: unsigned long id (an unsigned 64-bit integer) and char name[64] (raw bytes).
Before diving into the fuzzing process, let's spot the vulnerability with a source code review. Starting at line 16, if the new_person->id that comes from the input is between 100000 and 200000, the loop runs and checks every byte of new_person->name.
At line 19 it checks whether the byte is a "\x00" NULL byte, and at line 21 whether the loop is already at byte 32 or beyond. If both hold, the position of that byte is used as the size of the heap allocation at line 24.

Later, at line 38, 64 bytes are copied into this heap allocation. If the allocation is smaller than 64 bytes, which is the case whenever bytes 32-64 of new_person->name contain a NULL byte, the copy causes a heap overflow.
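As a rough illustration of the bug pattern described above, here is a minimal C++ sketch. It is reconstructed from the prose only, so it will not match the line numbers of the original listing. The helper heap_size_for(), the inclusive range bounds, the 64-byte default allocation, and the void return type are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Layout described in the article: a 64-bit id followed by a 64-byte name.
struct person_data {
    unsigned long id;
    char name[64];
};

// Hypothetical helper: the size the vulnerable code requests from the heap.
// Assumption: when no qualifying NULL byte is found, the full 64 bytes are used.
std::size_t heap_size_for(const person_data *p) {
    std::size_t size = sizeof(p->name);
    // Assumption: the range check is inclusive on both ends.
    if (p->id >= 100000 && p->id <= 200000) {
        for (std::size_t i = 0; i < sizeof(p->name); ++i) {
            // A NULL byte at index 32 or later shrinks the allocation to i bytes.
            if (p->name[i] == '\0' && i >= 32) {
                size = i;  // 32..63, always smaller than the 64-byte source
                break;
            }
        }
    }
    return size;
}

// Deliberately vulnerable: the copy length ignores the allocation size.
void add_person(const person_data *new_person) {
    assert(new_person != nullptr);
    std::size_t size = heap_size_for(new_person);
    char *stored = static_cast<char *>(std::malloc(size));
    if (stored == nullptr) {
        return;
    }
    // BUG: always copies 64 bytes, even when fewer than 64 were allocated.
    std::memcpy(stored, new_person->name, sizeof(new_person->name));
    std::free(stored);  // a real program would keep the record around
}
```

With an id of 150000 and a NULL byte at index 40, for example, this sketch allocates 40 bytes and then writes 64, which is the out-of-bounds write ASAN is expected to flag.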

Creating Proto File

Now let's look at how we can find this bug with Libprotobuf structure-aware fuzzing.

First of all, unlike traditional stack overflows, heap overflows do not necessarily crash the application. Therefore we are going to use AddressSanitizer, known as "ASAN", which detects out-of-bounds accesses at runtime.
To use ASAN, we must compile the source code with clang and the flag -fsanitize=address.
Next we create a proto file named "person_data.proto" and define the message for structure-aware fuzzing.

We must define exactly the same struct as in the source code.
The first member is a 64-bit unsigned integer and the second is 64 bytes of characters, which can hold any byte value.

In protobuf syntax, fields can be declared optional, but in our case both "id" and "name" must always be sent, so we mark them as required. We use the uint64 type for the "id" variable and name it "fuzz_id", which we will use later on.
The "name" variable is also required and its type is "bytes"; "char[]" variables can be defined as bytes in protobuf syntax.

We also give them the field numbers "1" and "2", so they appear in the same order as in "struct person_data": the mutated input must contain "id" first and then "name", exactly like struct person_data.
After defining the syntax, we can compile the "person_data.proto" file with protoc person_data.proto --cpp_out=fuzzer

Now, within the "fuzzer" directory, we have the files that we need to include in our harness.

Creating the Harness

We create a file named “harnes.cpp” within the fuzzer directory.
This harness takes the mutated input, creates a person_data object, and passes it to the add_person() function.
First we use DEFINE_PROTO_FUZZER with an argument of the protobuf type we defined earlier, person_data_fuzz; all fields of this object are mutated by libfuzzer.

What we basically need to do is create a standard struct person_data object, copy the mutated data into it, and then pass it to add_person() as in any normal application flow.
By calling the field names we defined as functions, as at line 15, we can copy each mutated value into the real_input object.

For the fuzz_name() mutated data, we need to make sure it doesn't overflow the 64-byte buffer and crash the fuzzer itself, so we never copy more than 64 bytes, just as in the real application's execution flow.

Then we pass the real_input object to the add_person() function at line 30.
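The harness steps above can be sketched roughly as follows. This is a hedged reconstruction, not the original listing: the helper to_person_data(), the zero-initialised struct, the add_person() signature, and the WITH_LIBPROTOBUF_MUTATOR guard are assumptions. The guard is there only so the copy logic compiles on its own without the protobuf and libprotobuf-mutator headers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Same layout as the target's struct (id first, then a 64-byte name).
struct person_data {
    unsigned long id;
    char name[64];
};

// Hypothetical helper: copies the mutated fields into a real struct,
// clamping the name so the harness itself can never overflow the buffer.
person_data to_person_data(unsigned long fuzz_id, const std::string &fuzz_name) {
    person_data real_input{};  // assumption: unused name bytes start as zero
    real_input.id = fuzz_id;
    std::size_t n = std::min(fuzz_name.size(), sizeof(real_input.name));
    assert(n <= sizeof(real_input.name));
    std::memcpy(real_input.name, fuzz_name.data(), n);
    return real_input;
}

#ifdef WITH_LIBPROTOBUF_MUTATOR  // hypothetical guard, define it when building the fuzzer
// Assumed schema in person_data.proto:
//   message person_data_fuzz {
//     required uint64 fuzz_id = 1;
//     required bytes fuzz_name = 2;
//   }
#include "person_data.pb.h"                 // generated by protoc
#include "src/libfuzzer/libfuzzer_macro.h"  // from libprotobuf-mutator

void add_person(const person_data *new_person);  // assumed signature of the target

DEFINE_PROTO_FUZZER(const person_data_fuzz &input) {
    // Generated accessors share their names with the proto fields.
    person_data real_input = to_person_data(input.fuzz_id(), input.fuzz_name());
    add_person(&real_input);
}
#endif
```

Keeping the copy step in a small function of its own makes the clamping easy to check in isolation before the fuzzer is built.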
We compile the vulnerable source code with clang++ -w -c library.cpp -o library.o and later link library.o with all the other files.
Compiling the harness, the vulnerable source code, and the Libprotobuf files together produces a binary that is able to fuzz our target function.

This outputs the protobuf_fuzz binary, which is our fuzzer. We can now start fuzzing by running it.
As we can see, with our purpose-built harness the fuzzer is able to trigger the vulnerability, and ASAN reports the crash as a heap-buffer-overflow.

Here is the proto version of the input that caused the crash:
As we found during source code analysis, id is between 100000 and 200000 and name contains a null byte within bytes 32-64.

Check out our GitHub repo with the code to try this yourself:
https://github.com/mobilehackinglab/Libprotobuf-fuzzing
