title: "using qemu-user emulation to reverse engineer binaries"
date: "2021-05-05"
---
QEMU is primarily known as the software that provides full system emulation under Linux's KVM. It can also be used without KVM to emulate entire machines from the hardware level up. Finally, there is `qemu-user`, which allows for emulation of individual programs. That's what this blog post is about.
The main use case for `qemu-user` is actually _not_ reverse-engineering, but simply running programs for one CPU architecture on another. For example, Alpine developers leverage `qemu-user` when they use `dabuild(1)` to cross-compile Alpine packages for other architectures: `qemu-user` is used to run the configure scripts, test suites and so on. For those purposes, `qemu-user` works quite well: we are even considering using it to build the entire `riscv64` architecture in the 3.15 release.
However, most people don't realize that you can run a `qemu-user` emulator which targets the same architecture as the host. After all, that would be a little weird, right? Most also don't know that you can control the emulator with `gdb`, which lets you debug binaries that detect whether they are being debugged.
You don't need `gdb` for this to be a powerful reverse engineering tool, however. The emulator itself includes many powerful tracing features. Let's look into them by writing and compiling a sample program that does some recursion by [calculating whether a number is even or odd inefficiently](https://ariadne.space/2021/04/27/the-various-ways-to-check-if-an-integer-is-even/).
Normally, you would also want to install the `qemu-openrc` package and start the `qemu-binfmt` service so that the emulator can handle any program that couldn't be run natively, but that doesn't matter here, as we will be running the emulator directly.
The first thing we will do is check that the emulator can run our sample program at all, by running `qemu-x86_64 ./example`.
Alright, all seems to be well. Before we jump into using `gdb` with the emulator, let's play around a bit with the tracing features. When reverse engineering a program, it is common to use tracing tools like `strace`. These tools are quite useful, but they suffer from a design flaw: they use `ptrace(2)` to accomplish the tracing, which can be detected by the program being traced. However, we can use qemu-user to do the tracing in a way that is transparent to the program being analyzed, by running it with `qemu-x86_64 -strace ./example`.
But we can do even more. For example, with `qemu-x86_64 -d op ./example` we can learn how a CPU would hypothetically break a program down into translation blocks full of micro-ops (these are TCG micro-ops, but real CPUs are similar enough to give a general understanding of the concept).
All of these options, and more, can also be combined; the `-d` option, for example, accepts a comma-separated list of log items. For more ideas, look at `qemu-x86_64 -d help`. Now, let's talk about using the emulator with `gdb` through qemu-user's gdbserver functionality, which allows `gdb` to control a remote target.
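To start a program in gdbserver mode, we use the `-g` argument with a port number. For example, `qemu-x86_64 -g 1234 ./example` will start our example program with a gdbserver listening on port 1234; the emulator waits for a debugger to connect before running the program. We can then connect to that gdbserver by running `gdb ./example` and entering `target remote localhost:1234` at the `(gdb)` prompt.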
All of this is happening without any knowledge or cooperation of the program. As far as it's concerned, it's running as normal: there is no `ptrace(2)` or any other weirdness.
However, this is not perfect: a clever program could execute the `cpuid` instruction, check the vendor string for `GenuineIntel` or `AuthenticAMD`, and bail out if it doesn't see that it is running on a legitimate CPU. Thankfully, qemu-user can spoof CPUs with the `-cpu` option.
If you find yourself needing to spoof the CPU, you'll probably have the best results with a simple CPU type like `-cpu Opteron_G1-v1` or similar. That CPU type spoofs an Opteron 240 processor, which was one of the first x86_64 CPUs on the market. You can get a full list of CPUs supported by your copy of the qemu-user emulator by running `qemu-x86_64 -cpu help`.