RE-Mind: a First Look Inside the Mind of a Reverse Engineer

Paper Info

  • Paper Name: RE-Mind: a First Look Inside the Mind of a Reverse Engineer
  • Conference: USENIX Security ’22
  • Author List: Alessandro Mantovani, Simone Aonzo, Yanick Fratantonio, Davide Balzarotti

  • Link to Paper: here
  • Food: Pizza

Introduction

“When a human activity requires a lot of expertise and very specialized cognitive skills that are poorly understood by the general population, it is often considered an art”. Humans remain central to binary reverse engineering (RE), yet how they approach RE tasks is still under-studied. Understanding this matters if the community is to develop tools that assist humans in RE tasks or, potentially, automate the task completely. This work lays a foundation for understanding the mental models of human reverse engineers by comparing the reversing strategies and the performance of novices and experts.

Design

More specifically, this work tries to answer the following research questions:

  1. What do experts do differently from novices?
  2. Do experts/novices share particular strategies to explore binary code?
  3. How are these strategies linked to the binary code elements (e.g., functions, basic blocks)?
  4. Is any particular strategy correlated with better RE performances?

The most important step in this research is recruiting participants, namely “novices” and “experts”. The authors recruited students with some RE training and experienced RE players from top CTF teams. Instead of grouping participants by reputation, the authors grouped them according to their self-evaluation and their performance during the experiment.

The participants were then asked to solve two RE tasks on an online platform tailored for this research, which employs the Restricted Focus Viewer technique: only one block is visible to the participant at a time, while all other blocks are blurred. This lets the researchers record where each participant’s attention goes and enables further analysis.
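
To make the data collection idea concrete, here is a minimal sketch of how focus events from such a viewer could be turned into per-block dwell times. The event format, the field names, and the rule that a block stays in focus until the next event are all assumptions made for illustration; the post does not describe the platform's actual logging format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FocusEvent:
    """One 'block became visible' event (hypothetical field names)."""
    timestamp: float  # seconds since the session started
    block_id: str     # identifier of the block that was un-blurred


def dwell_times(events, session_end):
    """Sum the seconds each block stayed in focus.

    Assumption: a block stays in focus until the next event,
    or until session_end for the last event.
    """
    ordered = sorted(events, key=lambda e: e.timestamp)
    totals = defaultdict(float)
    for current, nxt in zip(ordered, ordered[1:]):
        totals[current.block_id] += nxt.timestamp - current.timestamp
    if ordered:
        last = ordered[-1]
        totals[last.block_id] += session_end - last.timestamp
    return dict(totals)


# Made-up session: the participant returns to main:bb1 after a detour.
events = [
    FocusEvent(0.0, "main:bb0"),
    FocusEvent(4.0, "main:bb1"),
    FocusEvent(5.5, "is_number:bb0"),
    FocusEvent(9.5, "main:bb1"),
]
print(dwell_times(events, session_end=12.0))
```

Because only one block is readable at a time, the sequence of focus events doubles as an attention trace, which is what makes this kind of aggregation possible.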

Insights

Even though there were considerable differences in the amount of time spent looking at unrelated code between a novice and an expert, the authors found that there was considerable variance among the experts themselves in terms of approaches.

The authors analyzed the low-level event data and derived high-level features representing observable behaviors. Based on these high-level features, they identified strategies shared by experts and novices.

The authors say that a participant predominantly uses a given strategy if that strategy is employed at least 50% more frequently than the other. Otherwise, the participant is assigned to the hybrid category.
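
The labeling rule above can be sketched as follows. This is one possible reading, not the authors' code: it assumes that "50% more frequently" means the most used strategy's count is at least 1.5 times the count of the runner-up, and the function name, the `margin` parameter, and the example counts are hypothetical.

```python
def dominant_strategy(counts, margin=1.5):
    """Return the dominant strategy name, or "hybrid" if none dominates.

    counts maps a strategy name to how often the participant used it.
    Assumption: a strategy dominates when its count is at least
    `margin` times the count of the second most used strategy.
    """
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] == 0:
        return "hybrid"  # no usable evidence for any strategy
    if len(ranked) == 1:
        return ranked[0][0]
    (best, best_n), (_, runner_up_n) = ranked[0], ranked[1]
    return best if best_n >= margin * runner_up_n else "hybrid"


print(dominant_strategy({"forward": 9, "backward": 4}))  # forward
print(dominant_strategy({"forward": 5, "backward": 4}))  # hybrid
```

With a threshold like this, participants who switch between strategies about equally often fall into the hybrid category rather than being forced into one label.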

From Table 2, it is apparent that a small number of novices use a sequential strategy, whereas none of the experts do. Instead, more experts favor the forward approach than the backward approach.

Table 3 shows that novices spend more time than experts analyzing functions that perform simple tasks (such as is_number, which only verifies that all parameters are integers). Experts quickly understand the semantics of such functions and move on to other functions of the binary.

An important result here is that novices and experts spent almost the same share of their time reversing useless code (8.6% for novices vs 8.3% for experts). In absolute terms, however, novices still spent 4x as much time as experts analyzing useless code.
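
These two figures can be combined into a quick back-of-the-envelope check. If the shares are nearly equal but the absolute time differs by 4x, then novices must have spent roughly four times as long on the tasks overall. The calculation below is an inference from the numbers quoted above, not a figure reported in this post, and the function name is hypothetical.

```python
def implied_total_time_ratio(novice_share, expert_share, absolute_ratio):
    """Ratio of total novice time to total expert time.

    Since useless_time = share * total_time for each group:
    total_novice / total_expert = absolute_ratio * expert_share / novice_share
    """
    return absolute_ratio * expert_share / novice_share


ratio = implied_total_time_ratio(
    novice_share=0.086,   # 8.6% of novice time on useless code
    expert_share=0.083,   # 8.3% of expert time on useless code
    absolute_ratio=4.0,   # novices spent 4x the absolute time
)
print(round(ratio, 2))  # 3.86
```

In other words, the gap in wasted time mostly reflects how much longer novices take overall, not a larger fraction of their effort going to irrelevant code.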

Figure 7 shows that experts frequently skip basic blocks because they quickly identify the relevant branches and discard the irrelevant ones. Novices, however, spend more time going through basic blocks that might be irrelevant.

Another insight that the authors drew from the results is that experts visit 28% of the basic blocks only once, and in 80% of these cases the visit lasts less than two seconds. This means that experts dismiss roughly 22% of the basic blocks in a single glance. Inexperienced users, by contrast, make a single visit to only 10% of the basic blocks and dismiss only 7% in less than two seconds.
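
The "single glance" figure is the product of the two percentages (0.28 × 0.80 ≈ 0.22). A small sketch of how such statistics could be computed from a visit log follows; the `(block_id, duration)` log format, the choice of denominator (all basic blocks in the binary), and the two-second threshold parameter are assumptions made for illustration.

```python
from collections import defaultdict


def glance_stats(visits, total_blocks, threshold=2.0):
    """Return (share visited exactly once, share dismissed in one short visit).

    visits: iterable of (block_id, duration_seconds) pairs (hypothetical format).
    total_blocks: number of basic blocks used as the denominator.
    """
    durations = defaultdict(list)
    for block_id, seconds in visits:
        durations[block_id].append(seconds)
    single = [d[0] for d in durations.values() if len(d) == 1]
    dismissed = [s for s in single if s < threshold]
    return len(single) / total_blocks, len(dismissed) / total_blocks


# Made-up log: block "a" is revisited, "c" and "d" are glanced at once.
visits = [("a", 1.0), ("b", 5.0), ("a", 3.0), ("c", 0.5), ("d", 1.5)]
print(glance_stats(visits, total_blocks=5))  # (0.6, 0.4)

# Sanity check on the reported expert figures: 28% single visits, 80% of them short.
print(round(0.28 * 0.80, 3))  # 0.224
```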

Discussion

This work lays the foundation for understanding human mental models in RE tasks. By adopting the Restricted Focus Viewer to capture participants’ attention, the authors were able to approximate participants’ thought processes and analyze the strategies hidden in their subconscious. We believe this work is going to inspire more related work in the future and help the community better understand reverse engineers’ mental models.

Although this work is solid, it is not perfect. The research leaves some interesting aspects of RE tasks unexplored. For example, the Restricted Focus Viewer enables the researchers to collect attention information, but it also prevents participants from skimming through the whole code to grasp its gist. Hence, this strategy cannot be evaluated. Besides, the two RE tasks are relatively easy compared with real-world RE tasks, which prevents the research from answering the question “do reverse engineers adopt different strategies for binaries with different complexity?” We hope there will be extensive future work on this topic that addresses the limitations exhibited in this work.

This post is licensed under CC BY 4.0 by the author.