Decompiling Apps With AI Language Models

Sergi Àlvarez i Capilla

Mobile Security Analyst at NowSecure

Sergi, also known as Pancake, performs mobile security research at NowSecure and is the creator of Radare, an open-source reverse-engineering tool.

January 29, 2025


At NowSecure, we continually explore innovative ways to analyze mobile apps without requiring access to their source code. As mobile app development changes quickly with new languages, frameworks and technologies, reverse engineering tools often struggle to keep pace.

Artificial intelligence (AI) language models are emerging as valuable tools for mobile security analysts and developers, offering significant benefits such as aiding in structured vulnerability assessments or generating code. However, limitations such as “hallucinations,” in which the model generates inaccurate or misleading outputs, highlight the importance of human oversight in managing the risk posed by AI.

This post covers a novel approach for recovering application source code, leveraging AI language models to transform pseudo-disassembly into high-level source code. The method can handle the complex abstractions introduced by high-level languages and frameworks such as SwiftUI or Dart, and can generate output in popular programming languages like Swift, C#, Kotlin, Java, Python or even Bash.

Extending radare2 with AI Support

Radare2, a well-known reverse engineering framework, supports both static and dynamic analysis. Recognizing the potential of integrating AI, we developed the r2ai project: a playground for experimenting with AI-enhanced reverse engineering.

R2AI was initially developed in early 2023 as a Python tool using llama and r2pipe to communicate with radare2. The project has since evolved into two separate subprojects: “Decai”, an r2js plugin focused on decompilation, and a native C plugin, both capable of using any local or remote inference engine.

Key features include:

Easily compare and test different models, prompts and configurations
Work with any local model in r2ai-server, ollama, llamacpp, koboldcpp, vllm or with remote OpenAI, Claude, Gemini, Bedrock, X.AI, DeepSeek services
Decompile into any language, perform type propagation and rename variables and functions
Solve crackmes and find hidden passwords with the AUTO mode
Cache results and include context details to chain refinements
Explain and simplify code logic and identify vulnerabilities
Write exploits, reverse engineer network protocols and automate tooling creation

The auto mode feature can resolve reverse engineering challenges in a single prompt. For example, a prompt like “What is the password for this application?” generates an iterative loop where the language model plans, executes radare2 commands and analyzes their output, performing an analyst’s workflow without human interaction.
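
The loop described above can be sketched roughly as follows. This is a minimal illustration, not r2ai's actual implementation: the `RUN:`/`ANSWER:` reply convention, the `auto_loop` function and the scripted `fake_model` and `fake_run` stand-ins are all assumptions made for the example. In a real setup the model would be a language model and the runner would execute commands inside radare2.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins: a real setup would query a language model and
# run the commands inside radare2 (for example through r2pipe).
Model = Callable[[List[str]], str]
Runner = Callable[[str], str]


def auto_loop(question: str, model: Model, run: Runner, max_steps: int = 5) -> str:
    """Let the model pick commands until it produces a final answer."""
    history = [f"QUESTION: {question}"]
    for _ in range(max_steps):
        reply = model(history)
        if reply.startswith("ANSWER:"):
            # The model decided it has enough information.
            return reply[len("ANSWER:"):].strip()
        if reply.startswith("RUN:"):
            # Execute the requested command and feed the output back.
            command = reply[len("RUN:"):].strip()
            history.append(f"COMMAND: {command}")
            history.append(f"OUTPUT: {run(command)}")
        else:
            # Anything else is kept as a planning note.
            history.append(f"NOTE: {reply}")
    return "no answer within the step budget"


def fake_model(history: List[str]) -> str:
    """Scripted model: ask for the strings once, then answer."""
    if not any(line.startswith("OUTPUT:") for line in history):
        return "RUN: iz"
    return "ANSWER: s3cret"


# Made-up command output, used only to exercise the loop.
FAKE_OUTPUTS: Dict[str, str] = {"iz": "0x1000 ascii s3cret"}


def fake_run(command: str) -> str:
    return FAKE_OUTPUTS.get(command, "")


if __name__ == "__main__":
    print(auto_loop("What is the password for this application?", fake_model, fake_run))
```

The step budget is a deliberate design choice in this sketch: without it, a model that never commits to an answer would keep the loop running forever.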

There’s a lot of ongoing research and development in this area. Commercial models currently perform better, but we expect that in the near future it will be possible to solve more complex tasks and to make better use of local models, reasoning and embeddings.

Setting Up The Tooling

To get hands-on with this capability, you’ll need to:

1. Install radare2 (preferably from GitHub)

2. Install the “Decai” plugin via the r2pm package manager:

3. Verify installation by running:

By default, R2AI and Decai will use your local ollama setup, so your code won’t hit any remote endpoint or leak any data to third parties unless you configure that explicitly. Check out the additional resources at the end of this article to learn how to customize your setup.

How it Works

Decai decompiles by lifting a post-processed version of the disassembly listing that radare2 generates for the binary into a readable high-level version, using comments, types and other metadata associated with the function. This makes it architecture-independent and flexible enough to convert simple label-based listings into high-level control flow statements like switch/if/else/for/while.

Internally, radare2 translates assembly instructions from the binary into pseudocode expressions using the asm.parse plugins. For example:
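
As a toy illustration of this kind of rewriting (a sketch only, not radare2's asm.parse code), the following Python snippet maps a few x86-style instructions to pseudocode expressions with regular expressions. The rule table and the `to_pseudo` helper are hypothetical and cover only a handful of instruction forms.

```python
import re

# Toy rewrite rules for a few x86-style instructions. This is NOT how
# radare2's asm.parse plugins are implemented; it only shows the idea.
RULES = [
    (re.compile(r"^mov (\w+), (\w+)$"), r"\1 = \2"),
    (re.compile(r"^add (\w+), (\w+)$"), r"\1 += \2"),
    (re.compile(r"^sub (\w+), (\w+)$"), r"\1 -= \2"),
    (re.compile(r"^jmp (\w+)$"), r"goto \1"),
]


def to_pseudo(instruction: str) -> str:
    """Return a pseudocode expression for one instruction, if a rule matches."""
    text = instruction.strip()
    for pattern, template in RULES:
        if pattern.match(text):
            return pattern.sub(template, text)
    return text  # unknown instructions are passed through unchanged


if __name__ == "__main__":
    for line in ["mov eax, 1", "add eax, ebx", "jmp 0x1000", "nop"]:
        print(f"{line:<14} -> {to_pseudo(line)}")
```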

These expressions can be displayed by enabling the e asm.pseudo=true option. Going a step further, the pdc command pseudo-decompiles the entire function, walking through its basic blocks and emitting labels, goto statements and comments that reference strings and called functions.

By default, Decai uses `pdc`’s output as input for the language model. This ensures interoperability with all architectures supported by radare2. In addition, output from other supported decompilers such as jadx, retdec, r2ghidra, r2dec or even angr can be combined (using decai -e cmds=pdg,pdd,pdc).

When the prompt explains that all these outputs represent the same function, the inference engine gains a more complete understanding of what the code is doing.
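
One way to picture this combination step is a small helper that joins several listings under a shared explanation. This is a hedged sketch, not Decai's actual code: the `build_combined_input` function, the header wording and the separator format are assumptions, and the listings in the example are placeholders rather than real decompiler output.

```python
from typing import Dict


def build_combined_input(outputs: Dict[str, str]) -> str:
    """Join several decompilations of one function into a single model input."""
    parts = [
        "The following listings are different decompilations of the SAME function."
    ]
    for command, listing in outputs.items():
        if not listing.strip():
            continue  # skip decompilers that produced no output
        parts.append(f"--- output of {command} ---\n{listing.strip()}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Placeholder text standing in for real decompiler listings.
    example = {
        "pdc": "<pdc listing goes here>",
        "pdg": "<pdg listing goes here>",
        "pdd": "",
    }
    print(build_combined_input(example))
```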

Model Selection

There are many language models to choose from, and new ones appear almost every week with breakthrough improvements. Considering how fast the AI ecosystem evolves, it is important to make model selection as flexible and easy as possible.

From our tests, there’s no single perfect option. None solve all problems and tweaking the prompt changes the results. For mobile app decompilation purposes using the default prompt in Decai, we prefer:

Local: IBM Granite and Alibaba Qwen
Remote: Anthropic Claude, OpenAI GPT-4o

What’s key here is that these models are trained on vast amounts of source code taken from open-source projects, giving them a solid understanding of programming languages and how to transpile code constructs between them. Similar to how they handle natural languages, they can perform transformations to make the code more readable or rewrite it as you wish.

Typically, models categorized as “code” or “instruct” can rename variables, generate functions from descriptions, refactor code to eliminate dead code, explain a function’s logic, or rewrite goto statements into if/else blocks. These are exactly the steps we need to define in our prompt. Note that not all models behave the same way with the same prompt, so it requires trial and error depending on the target of choice.

When using a local Ollama or r2ai setup, we may need to download the model and select it with the -m flag. The -M flag lists the alternative models available to us.

The Prompt

There are different techniques to take advantage of models. The words we use to describe what we want, and even the dots, commas, enumerations and verbs, can make a big difference in the output, and their effect varies from model to model. To craft our prompt, let’s start by enumerating all the transformations we want to happen from the original pseudo-decompilation to the final form:

Identify local variables and find better names for them
Rename the function to describe its purpose
Inline function arguments in call statements
Remove non-relevant code
Refactor goto statements into if/else/switch
Identify loops and conditions and rewrite them as for/while
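
To make the idea concrete, here is a minimal sketch of how such a list could be assembled into a single instruction prompt. The wording of the opening and closing lines and the `build_prompt` helper are illustrative assumptions; this is not Decai's default prompt.

```python
from typing import List

# The transformations enumerated above, kept as data so they are easy to tweak.
TRANSFORMATIONS: List[str] = [
    "Identify local variables and find better names for them",
    "Rename the function to describe its purpose",
    "Inline function arguments in call statements",
    "Remove non-relevant code",
    "Refactor goto statements into if/else/switch",
    "Identify loops and conditions and rewrite them as for/while",
]


def build_prompt(language: str, steps: List[str] = TRANSFORMATIONS) -> str:
    """Build a numbered instruction prompt for the target output language."""
    lines = [f"Rewrite this function in {language}, applying these transformations:"]
    lines += [f"{number}. {step}" for number, step in enumerate(steps, start=1)]
    lines.append("Respond only with the resulting code.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("Swift"))
```

Keeping the steps in a list makes it cheap to reorder, drop or reword individual instructions when a given model reacts badly to one of them.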

The prompt is stored in a Decai configuration variable. Its default value can be changed with the decai -e prompt=… command:

Results and Real-World Examples

With all the tooling ready, we can start to check the results of decompiling the following code in Swift:

Taking the example from a NowSecure Tech Talk, we will build the binary in release mode and run the following script to remove long flags and include comments containing the strings constructed by emulating the code. Here are the source code and binaries to follow along with these examples and compare results.

Upon launch, r2 will prompt the user to execute a script if one exists with the same name as the executable and a “.r2” extension. We will need to start the r2ai web server in a separate terminal, tune Decai and wait for the output:

The code seems correct and is far better than what any other tool can produce, but it’s still far from perfect. To improve it, chain a query on top of the last output:
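
Conceptually, chaining means the previous result is sent back to the model together with the follow-up request. The sketch below shows that idea under stated assumptions: the `RefinementChain` class, the message layout and the canned `fake_model` are hypothetical and do not reflect how Decai stores or sends its context.

```python
from typing import Callable


class RefinementChain:
    """Keep the last model output and attach it to the next query."""

    def __init__(self, model: Callable[[str], str]) -> None:
        self.model = model
        self.last_output = ""

    def ask(self, query: str) -> str:
        if self.last_output:
            # Follow-up queries carry the previous result as context.
            message = f"{query}\n\nPrevious output:\n{self.last_output}"
        else:
            message = query
        self.last_output = self.model(message)
        return self.last_output


def fake_model(message: str) -> str:
    """Canned stand-in for a language model: reports how much context it saw."""
    return f"// rewritten using {len(message.splitlines())} lines of context"


if __name__ == "__main__":
    chain = RefinementChain(fake_model)
    print(chain.ask("Decompile this function"))
    print(chain.ask("Use better variable names"))
```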

Decai can be configured to use any programming language as output, so let’s see what happens when we choose Python:

Removing all the boilerplate from the internal Swift string and reference counting calls from the decompiler output gives us a fairly clean result, which may not be perfect or completely trustworthy, but is far more readable than what other decompilers produce.

Interestingly, other commercial solutions won’t perform much better when targeting high-level programming languages like Swift, Rust, C# or Kotlin. These tools often fail to adequately address higher-level logic, particularly in applications based on frameworks like SwiftUI. Their outputs frequently lack the depth needed to fully reconstruct the original logic, leaving gaps that require further manual intervention.

In contrast, language models excel in extracting and understanding high-level logic. For instance, they can handle SwiftUI-based applications effectively, reconstructing the original logic with a precision unmatched by other tools. Here’s an example of some SwiftUI decompiled results:

Future Enhancements

As AI continues to evolve, so does the r2ai project. Planned enhancements include:

Enhanced reasoning capabilities for more accurate outputs
Improved context extraction from Internet resources and code repositories
Iterative test suites to refine model selection and prompt optimization
Expanded auto-mode features for hands-free problem solving
Improved documentation and usability to simplify adoption by new users

Watch for a future post on this topic.

Conclusion

Language models are transforming the reverse engineering landscape, enabling analysts to recover and optimize code with unprecedented ease. While these tools are not perfect and you should check the results, they significantly boost workflow efficiency and reduce the learning curve for new users.

Want to learn more or contribute to the r2ai project? Join the Discord, check out the repository and submit your patches.