Posts WINNIE: Fuzzing Windows Applications with Harness Synthesis and Fast Cloning
Post
Cancel

WINNIE: Fuzzing Windows Applications with Harness Synthesis and Fast Cloning

Paper Info

  • Paper Name: WINNIE: Fuzzing Windows Applications withHarness Synthesis and Fast Cloning
  • Conference: NDSS ‘21
  • Author List: Jinho Jung, Stephen Tong, Hong Hu, Jungwon Lim, Yonghwi Jin, Taesoo Kim
  • Link to Paper: here
  • Food: Lemon Mousse

Prerequisites

Fuzzing

Fuzzing is the generalized process of feeding random inputs to an executable program in order to create a crash. This crash reveals the presence of a software bug that allows a developer to patch it or could possibly be used as part of an exploit. It is one of the efficient ways in finding issues in a program or software. The random inputs are made of generators and vulnerability issues are based on debugging tools. Generators utilize a combination of static fuzzing vectors or totally random inputs. The Fuzzers are dependent on file-formats and data-types. One disadvantage of simple fuzzing is that it uses large binary files for every corner test case, which take years to execute.

Here is an overview image from the paper:

Feedback Driven Coverage

also known as smart “fuzzing”, this is the concept of using a measurement such as increasing lines of code coverage, edge coverage, increasing buffer length or memory usage, as a heuristic to drive the direction of future random input choices. Instead of blindly trying every possible random input, the program detects that a previous choice of input has unlocked a new previously unseen section of program code and, then spends extra effort on this new code by maintaining the input choices that reached the new code static, and attempting randomized input that “exercises” new code beyond this point.The random input which created a new section of the program is recorded in a coverage map and helps the fuzzer to generate new random inputs based on the map. For example a line in the code isn’t traversed before and the inputs are aligned to mutate for executing the line of code.

The conceptual idea behind feedback driven coverage is that the more code that you “exercise”, the greater chance that you will succeed at creating a crash.

Forking for Fuzzing

The exec syscall and also the typical process startup procedure is very slow. To avoid having to reinitialize a process for every single fuzzing attempt, the conceptual ideas to run the program until you reach the point that you want to fuzz through randomized input, insert a fork server at this point in the program, and fork the program continually for each input trial. This increases fuzzing performance by one or two magnitudes. Each subprocess will be taking only a single input, so we can avoid the exact syscall overhead. If you take a bash file and will be executing the fuzzing at the 3rd call, then it doesn’t decrease the performance but decreases the overhead of fuzzing through the whole program.

Harnessing

For simple executable programs, you simply provide random input via stdin. This might be far less effective when trying to fuzz a library that performs a complicated function, such as parsing a graphics image. A harness is a sequence of inputs, a valid file, a series of instructions, as concise as possible that provides an entry point into the program, typically uses it in a very simple way, and provides access to a place where it is possible to execute a decent amount of code that you want to fuzz. A mutational-based fuzzer can then use this harness as a starting point to generate random input, making changes to the input sequence to reach different areas of the code

Generally, creating a harness automatically for an arbitrary program is non-trivial and has yes to be solved.

GUI Fuzzing

Programs such as Monkey can fuzz by performing random GUI actions. This is typically very slow. An alternative approach is to reverse engineer the GUI program and use it as a library, bypassing the graphical interface. This is done conceptually by bypassing the GUI bindings in a graphical-based program and providing an interface to fuzz of the GUI actions directly.

Problems

Issues in Fuzzing Windows

  1. GUI programs innately are difficult to fuzz due to slow graphics-based interaction
  2. Harness generation for Windows/graphical programs is challenging
  3. Lack of copy on write forking in the Windows API eliminates many of the fuzzing optimizations that exist in Linux, meaning that fuzzing operations operate at roughly one order of magnitude slower

To solve these problems, we need some form of conversion between GUI to CLI to avoid overhead in generating graphics. We need an automated method of creating usable harnesses, and we need to develop an alternative approach that creates a usable fork function call in Windows.

In addition, other issues exist that are specific to Windows and modern fuzzers:

  • AFL does not support Windows binaries
  • WinAFL exists, but is far more limited such as having no fork server mode. In practice, this means it will operate 10X-100X slower then native AFL
  • Honggfuzz on Windows exists but has no feedback-based fuzzing support
  • Another popular Windows fuzzer is Peach, but this also has no feedback based fuzzing

Issues in Harnessing

Harness generation is a very difficult task for many reasons:

  1. Need to decide precisely what the fuzz – avoiding the GUI code and only triggering actions.
  2. Since these actions are often not context free, you must know what call sequences are capable to activate this code properly. Blind function calls often will simply crash the program.
  3. Recovering proper input: for example, how to create a pointer to a structure that is needed to call a particular target function properly.
  4. Reconstructing control flow and dataflow dependencies

Persistent Fuzzing

Persistent mode is another method used to speed up the fuzzing process. It is nothing more complicated than calling a target function thousands of times at a certain point in program execution. Unfortunately it has additional complications and side effects:

  1. Crashes found in persistent code may not crash in single mode
  2. Unstable system/memory/file leaks will prevent you from using persistent mode properly
  3. Certain programs need to make state changes which simply cannot be tested with a fork server. If a target function is not truly independent, and each call changes the system state in some way, then multiple or thousands of calls to the function may cause the system to crash and/or become unstable

Solutions

Harnessing

To determine the optimum fuzzing targets within a program, you can perform dynamic program analysis. This is nothing more than executing a program multiple times while logging program traces and capturing promising functions. These are determined by two basic guidelines:

  1. Do they accept file paths as input?
  2. Does it actually perform a system call to open the file and then parse the contents of that file?

The program then analyzes the call graph to find the deepest or earliest functions that meet these requirements. By running the program multiple times on the same input, you can abuse ASLR to determine which variables are pointers since they always change in values, representing different memory addresses, while always pointing to the same data. We then build a harness skeleton by reconstructing all the function prototypes, combining static analysis plus traces from earlier dynamic analysis. Is also important to avoid irrelevant calls with multithreaded programs – often only one thread does things interesting while the other threads do nothing or perform irrelevant behavior. It is possible to do this by using logged thread IDs to make sure we only analyze one thread at a time.

The final challenge is to identify control flow & data flow dependencies. To resolve control flow dependencies, static analysis is used. If functions are called in sequence, then implement that same sequence in the harness. Complicated programs cannot be analyzed by Winnie and these issues must be resolved manually. To resolve dataflow dependencies, cases where return values are used as arguments for other functions, Winnie attempts to connect variable usage across multiple call locations.

Finally test performance of each harness and make sure code coverage increases.

Forking

The papers implementation of forking in windows is a direct follow up to previous work in forking in windows. An overview can be seen in the picture found in the paper:

The authors implement a version of CSRSS with Copy-on-Write support. Overall, the implementation follows a standard unix-like implementation of forking but with the engineering constraints of Windows.

Evaluation

The evaluation section of the paper attempts to answer 4 research questions about WINNIE:

  1. Can WINNIE test a large variety of Windows application?
  2. How efficient is fork on versus other modes of fuzzing like persistent mode?
  3. How effectively can WINNIE create fuzzing harnesses from binaries?
  4. Can WINNIE discover new program states and bugs from real world applications?

The majority of the evaluation is dedicated to comparing against the commonly used WinAFL, a windows implementation of Windows with less general support for forking. WinAFL was configured

Some interesting things from the evaluation:

  • Winnie works on all 59 programs tested; WinAFL crashes around half of the 59.
  • Winnie compared to stable WinAFL had a 31 times speed up

Discussion

  • Is this paper more engineering work then scientific contribution?
  • Good science: Automatic harness generation system– implemented for the first time on closed src programs, all other solutions require open source
  • Bad science sell: “Hey, we fuzz GUI programs in Windows”
  • Good science sell: Hey, we need forking / GUI fuzzing

(Take things from a problem solving perspective) Did not mention or deal with the other windows fuzzing paper: “Designing New Operating Primitives to Improve Fuzzing Performance” Danger of resubmitting papers: reviewers copy paste the reject reviews from earlier submissions

  • paper sometimes have been accepted after six re-submissions
  • Taking the recommended improvements that reviewers leave and implementing them often are simple enough to take rejected papers and make them accepted

Practical paper: future development and actual usage of Winnie will determine how successful or impactful this paper will be.

This post is licensed under CC BY 4.0 by the author.