I’ve been doing a lot of reverse engineering, and very often there’s this common thing coming up: I would really like to have a tool for this, because I need to look at this particular API, or look at what would happen if I do this vs that, etc.
Is it realistic?
It’s quite a lot of work to build a tool from scratch, and existing tools aren’t really suitable for the iterative reversing kind of use-case. I would also like to have a really short feedback-loop when doing this, because as a reverser my understanding of the problem tends to evolve as I go along, which means I need to circle back to adapt the tool. If I then have a long feedback loop then I’m just going to waste a lot of time, and in the end it might be better to just do things by hand and go through the painstaking process using conventional tools.
I would also like to have just one toolkit, so that whichever platform I’m currently on there’s just one toolkit I need to learn once. Of course there’s platform-specific bits involved, but at least I don’t have to re-learn the basics every time.
oSpy
Back in 2004 I was doing a lot of reverse engineering of protocols. At the time I was focusing on Windows Live Messenger as I was using Linux and wanted reasonable feature parity with my friends using the Windows version. Later I also got myself a Windows Phone, and it turned out only older models were supported by the synchronization software available for Linux users. While doing the ActiveSync reversing I ended up building a tool called oSpy that looks like this:
It’s kind of like Wireshark, but works at the API level, which means you can look at encrypted communications and pretty much anything you’d like. This was written in C#, and I injected code into the target process in order to instrument it, and that code was written in C. This was very powerful as you could trace any function you’d like.
With great power came great complexity
Developing the injected code was a long and tedious process. For each function traced you’d have to write quite a bit of boilerplate. You then had to compile it, restart the target app in case you left it in an unstable state last time, deploy your payload, inject it, and finally wait for the target app to call that API. In short, the feedback loop was pretty horrible. oSpy did help tremendously with the challenges I was facing back then, but it was very slow to evolve, and it was not very portable.
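To give a feel for the per-function boilerplate described above, here is a minimal sketch in C. It is not oSpy’s actual code. It assumes the target calls the API through a function pointer that the tracer can overwrite, and every name in it (`send_slot`, `real_send`, `traced_send` and so on) is hypothetical. A real tracer would patch an import table or the function prologue instead, but the shape of the hand-written wrapper is the same: remember the original, log the arguments, forward the call, log the result.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical signature of the API we want to trace (not a real OS API). */
typedef int (* send_fn) (const char * data, size_t len);

/* Stand-in for the real implementation: pretends every byte was sent. */
int
real_send (const char * data, size_t len)
{
  (void) data;
  return (int) len;
}

/* The slot the application calls through; assumed to be writable. */
send_fn send_slot = real_send;

/* Per-function tracer state. */
send_fn original_send = NULL;
unsigned int traced_calls = 0;

/* Hand-written wrapper: log arguments, forward the call, log the result. */
int
traced_send (const char * data, size_t len)
{
  int result;

  assert (original_send != NULL);

  traced_calls++;
  fprintf (stderr, "send(data=%p, len=%zu)\n", (const void *) data, len);
  result = original_send (data, len);
  fprintf (stderr, "  => %d\n", result);

  return result;
}

/* Redirect the slot to the wrapper, remembering the original. */
void
install_send_trace (void)
{
  if (original_send != NULL)
    return; /* already installed */

  original_send = send_slot;
  send_slot = traced_send;
}

/* Restore the slot so the target is left in a stable state. */
void
remove_send_trace (void)
{
  if (original_send == NULL)
    return; /* not installed */

  send_slot = original_send;
  original_send = NULL;
}
```

After `install_send_trace ()`, a call such as `send_slot ("hello", 5)` is logged to stderr and still returns what the original returned. Every additional function to trace needs its own typedef, wrapper, and install/remove pair, and any change means recompiling and re-injecting, which is where the slow feedback loop came from.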
Trying again
Later I had new reversing projects coming up, and I found myself adding temporary hacks to the oSpy code-base. As powerful as the low-level function tracing was, I couldn’t let oSpy turn into a kitchen-sink, and it wasn’t really feasible to keep writing those bits in C. This led me to create oSpy2, where I tried to get C out of the equation by letting you define functions to be traced declaratively through a config file:
Time was however not my friend
All of my reversing adventures were of recreational nature, and I couldn’t fit much into my not-so-ample spare-time. oSpy2 had a functional engine, but it was Windows-only, and there was a Windows UI that would let you instrument an application using this engine, and then let you save the resulting trace to a file. I started building a cross-platform UI that would let you open and analyze these files, but I didn’t have time to finish it.
A new beginning
Time went by and I couldn’t quite shake the feeling that I should start from scratch one more time, but this time around I should build a cross-platform dynamic binary instrumentation library and make that work really well. I named this library Gum, as it was a GLib-based instrumentation library written in portable C. More time went by, it was 2010, and Frida was born. The idea was to build a logistics layer that would inject a shared library containing Gum into a target process, and let you upload control logic written in JavaScript that would have full access to the Gum APIs, read/write memory, etc.
The next chapter
It is now 2015, and after one foray into the video-conferencing world followed by another co-founding a startup in the music industry, my passion for dynamic binary instrumentation and reversing has had to take a back-seat for quite some time. I did manage to cram some recreational Frida hacking into spare-time and holidays, and considering how many OSes and architectures it now runs on, that’s pretty good. There has however been a yearning to be able to spend more time on Frida, to truly unleash its potential and make it the ultimate toolkit for dynamic instrumentation. This is why I’m absolutely stoked to continue this work with NowSecure, where I see it being very useful to mobile specifically, which aligns perfectly with the strong multi-platform focus Frida has had since it all started.
In the time ahead I am really excited about improving feature parity across OSes and architectures, and making the mobile part of the story even stronger. I also plan on attending ZeroNights 2015, where I will be conducting a workshop on Frida. It will cover all the basics and also some advanced topics, so don’t miss out if you have a chance to attend.
In closing
Let’s do a quick demo. Suppose someone wrote this simple app:
#include <stdio.h>
#include <unistd.h>

void
f (int n)
{
  printf ("Number: %d\n", n);
}

int
main ()
{
  int i = 0;

  printf ("f() is at %p\n", f);

  while (1)
  {
    f (i++);
    sleep (1);
  }
}