Reversing iOS System Libraries Using Radare2: A Deep Dive into Dyld Cache (Part 2)
Posted by Francesco Tamagni
This is the second (and the shortest) installment of my blog series about reverse engineering iOS system libraries with radare2. We’ll introduce the concept of cross-references, why they’re important and the gist of them in r2.
This blog post explores how to find cross-references from within a single library in the dyld shared library cache (DSC to friends), so if you missed the first post you can look it up now to know how to handle DSC as a single large executable with radare2.
Finding Cross-References
When it comes to reverse engineering compiled code, a common task is figuring out how things are connected in order to answer questions such as:
- What code is using this text string / global variable?
- What code is calling this function / method?
- Is this pointer stored in any data section?
All of the above boils down to finding cross-references: that is, finding from which address another address is referenced.
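To make the idea concrete, here is a minimal sketch in Python of what a cross-reference lookup amounts to: a list of (from, to) address pairs, indexed by target. The addresses and the helper names (build_xref_index, xrefs_to) are made up for illustration and have nothing to do with how radare2 stores references internally.

```python
from collections import defaultdict


def build_xref_index(refs):
    """Group (from_addr, to_addr) pairs by target address."""
    index = defaultdict(list)
    for src, dst in refs:
        index[dst].append(src)
    return index


def xrefs_to(index, target):
    """Return the sorted source addresses that reference `target`."""
    return sorted(index.get(target, []))


# Hypothetical references: two code sites use the same string at 0x8000,
# and one code site calls a function at 0x2000.
REFS = [
    (0x1000, 0x8000),
    (0x1040, 0x8000),
    (0x1010, 0x2000),
]
INDEX = build_xref_index(REFS)

print([hex(a) for a in xrefs_to(INDEX, 0x8000)])
```

Answering "who uses this string?" is then a single lookup by target address; the hard part, covered next, is producing the pairs in the first place.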
In radare2, cross-references are computed as a result of an analysis command.
Specifically, the bare minimum analysis which can be done to compute references from executable code is to linearly emulate it using ESIL. ESIL is the underlying intermediate language used by radare2 to represent and virtually execute code in a way that is independent from the architecture.
The command for this is aaex. It takes an optional size in bytes and emulates that many bytes of executable code, starting from the current address (which can be overridden with a temporary seek using the @ address suffix) and doing only the minimal work needed to compute references. If no argument is provided, it emulates all executable sections of the binary.
In the case of DSC, however, we don’t usually want to emulate all the code that is there, because it’s huge and can take literally hours. Instead, we have to restrict the emulation to the portion of code we’re interested in — for example the __text section of a single library.
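Why is emulation needed at all? On arm64, the address of a string is typically built by more than one instruction, so the target only becomes visible once register values are tracked. The toy model below sketches that idea in Python, under heavy assumptions: instructions are hand-written tuples rather than decoded bytes, only a simplified adrp/add pair is modeled, and the immediate of adrp is taken to be an already page-aligned byte offset. It is not ESIL and not how aaex is implemented; emulate_linear and the sample addresses are hypothetical.

```python
PAGE_MASK = ~0xFFF


def emulate_linear(instructions, start, size):
    """Walk instructions in address order within [start, start + size),
    tracking register values, and return (instruction_addr, target_addr)
    pairs for every address that gets computed."""
    regs = {}
    refs = []
    end = start + size
    for addr, op, dst, src, imm in instructions:
        if not (start <= addr < end):
            continue  # outside the range we chose to emulate
        if op == "adrp":
            # Simplified adrp: page of the current instruction plus offset.
            regs[dst] = (addr & PAGE_MASK) + imm
        elif op == "add" and src in regs:
            regs[dst] = regs[src] + imm
            refs.append((addr, regs[dst]))
    return refs


# Hypothetical code in two regions, sorted by address.
CODE = [
    (0x10000, "adrp", "x0", None, 0x4000),
    (0x10004, "add", "x0", "x0", 0xBDB),
    (0x20000, "adrp", "x1", None, 0x1000),
    (0x20004, "add", "x1", "x1", 0x10),
]

# Emulating only the first region finds only the reference made there.
print([(hex(a), hex(t)) for a, t in emulate_linear(CODE, 0x10000, 0x1000)])
```

Restricting the range trades completeness for speed: references made by code outside the emulated range are simply never recorded, which is exactly why we need to know which library to look at.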
Cross-References Within a Library
The simplest and most frequent use case is to find references to code or data within a library which is part of the DSC.
Here are some examples covering the usual scenarios.
The common premise, though, is that we have to know ahead of time which library we’re interested in.
Example: Find Code Referencing a Specific String
Whether you want to narrow down the reason for an error, the code producing an obscure log message or simply to match source code and compiled code, locating references to strings is always a useful capability.
One task I periodically have to do, as part of maintaining radare2’s DSC plugin, is to analyze the code of Apple’s dyld (available as open source) in order to support new versions of the “slide info” structures used for rebasing and signing pointers.
An easy way to locate the interesting code for this purpose is to find references to the error string “invalid slide info in cache file”. It appears at the end of a large if / else chain dealing with all the possible slide info versions.
Here’s a GitHub excerpt of the source file dyld/SharedCacheRuntime.cpp which shows the usage of that string in its context:
As of quite recently, dyld’s executable code itself is also present in the DSC, as part of the lib/dyld image.
In order to find the reference to the string in the executable from within the DSC, we have to make sure we opened the DSC while having dyld as part of the R_DYLDCACHE_FILTER environment variable:
Let’s locate the string first to determine its address using the izq command:
This tells us the string is at 0x1ab7a9bdb. Take note of this because it’s the address that any code using this string will be referencing.
Then we can locate the __text section for lib/dyld. It contains the entire compiled code of the dyld executable, so it defines the memory address range in which we expect to find a reference to the string. We can do this by filtering the output of the iSq command (which lists sections like iS, but the extra “q” suppresses the default ASCII-art columns for performance and brevity):
Now we can proceed to emulate the whole executable section in order to compute cross-references. We do that by seeking to the section first, then running the aaex command:
We pass $SS as the size argument, which means “current section’s size”, and set the temporary seek to $S, which means “current section’s start address”: that is, aaex $SS @ $S. This is a fairly generic command we can reuse to emulate the whole current section.
After this command finishes (and it’s usually pretty fast when we limit its scope like this), we can then proceed to query the computed cross-references for the one we’re interested in.
For this we can use the axt command (analyze xrefs to) passing it the address of the string:
This immediately returns the address of the instruction found computing the address of the string (0x1ab7782ec), which we can then visualize in its context using pd. This is precisely the function we’re interested in:
This technique is quite generic and will work even in the absence of debug symbols for matching source code to the corresponding compiled code, as long as string constants are available.
As a final note, remember that all cross-references, after being computed, are kept in memory, so there’s no need to run the aaex command again for finding different references from the regions of code which have already been emulated.
Example: Finding Multiple References
You can use the same methodology as in the previous example to find references to any address, including global variables, functions, Objective-C ivars or even Objective-C selectors.
For example, let’s find the references to the tubeManager ivar of __NSURLSessionLocal which we already encountered in the first episode.
To do that, let’s open the DSC with a filter including the Foundation framework first:
Then get the address of the tubeManager ivar by grepping flags:
Let’s find out where the ivar itself lives, using the iSq. command which reveals the section containing the address where the command is run:
This output tells us that the ivar’s address belongs to the __objc_ivar section of the __DATA segment of the CFNetwork framework. So let’s find references from the CFNetwork framework’s executable code (which is stored in the __text section of the __TEXT segment of the same framework). As before, we can find the start address of that section, seek there, emulate it to compute the references and find the references to the ivar address:
There are quite a few hits this time. When this happens, depending on the actual task at hand, we may need some more information in order to prioritize which ones we should be looking at first.
A good way to do so in r2 is to feed every address of the resulting xrefs into the fd command which translates any address into flag name + offset representation.
We can do this using a one-liner which leverages command composition, where the filtered output of one command is used as input for another by means of the backtick (`) operator.
In this case, we’re going to use the second column (~[1]) of the axt output above (the address) to create a list of addresses, then use the seek iteration operator @@= to use each element of the list as temporary address for the fd command, which is then called once for each address of the found references:
This illuminates the context in which the references are found before digging deeper into each of them, all without having to wait for r2 to analyze all possible functions ahead of time.
For example, in the output above, if the task is to figure out what a “tubeManager” is or does, I’d skip over everything that seems to be part of a deallocation routine or a copy constructor. Those functions usually reference all ivars and it’s unlikely there is any logic specific to this one in particular.
Beware, however, that the flag named by fd is not guaranteed to be the function that actually contains the address, because there could be unnamed functions in between. It is therefore always a good idea to double-check fd results before deciding whether to skip a hit or dig deeper, and to be extra suspicious of big offsets, especially when the named function is short.
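The mechanics of that one-liner, and the reason big offsets deserve suspicion, can be modeled outside r2 with a short Python sketch. Everything in it is an assumption made for illustration: the xref lines only mimic the "address in the second column" layout described above, the flag names are invented, and the SIZES table stands in for knowing how long each named function is. The column helper plays the role of the ~[1] filter, the loop plays the role of @@=, and nearest_flag approximates what fd reports.

```python
import bisect


def column(text, index):
    """Return whitespace-separated column `index` of each non-empty line."""
    return [line.split()[index] for line in text.splitlines() if line.strip()]


def nearest_flag(addr, flags):
    """Return (name, offset) for the closest flag at or below `addr`.

    `flags` is a list of (address, name) pairs sorted by address.
    """
    starts = [start for start, _ in flags]
    pos = bisect.bisect_right(starts, addr) - 1
    if pos < 0:
        return None, addr
    start, name = flags[pos]
    return name, addr - start


def annotate(text, flags, sizes):
    """Resolve each hit to (address, name, offset, doubtful)."""
    rows = []
    for token in column(text, 1):
        addr = int(token, 16)
        name, offset = nearest_flag(addr, flags)
        # An offset at or beyond the function's size means the hit lives in
        # some other, possibly unnamed, function that follows the flag.
        doubtful = name is None or offset >= sizes.get(name, 0)
        rows.append((hex(addr), name, offset, doubtful))
    return rows


# Hypothetical flags and function sizes (in bytes).
FLAGS = [(0x1000, "method.Foo.dealloc"), (0x1800, "method.Foo.start")]
SIZES = {"method.Foo.dealloc": 0x40, "method.Foo.start": 0x200}

# Made-up lines with the address in the second column.
XREF_LINES = """\
(nofunc) 0x1010 [DATA:r--] ldr x8, [x9]
(nofunc) 0x1400 [DATA:r--] ldr x8, [x9]
(nofunc) 0x1820 [DATA:r--] ldr x8, [x9]
"""

RESULT = annotate(XREF_LINES, FLAGS, SIZES)
for row in RESULT:
    print(row)
```

The second hit resolves to the dealloc flag with an offset of 1024 bytes, far beyond the 64 bytes assumed for that function, so attributing it to dealloc and skipping it would be a mistake.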
Conclusion
Well, this went by quickly! Hopefully you found this post useful. There’s still a lot to say about cross-references, so in the next post we’ll dig deeper and look at some ways to find references across libraries within the DSC.
As always, questions are welcome, as are issues and pull requests on radare2’s GitHub.