Posts Everything Old is New Again
Post
Cancel

Everything Old is New Again

Paper Info

  • Paper Name: Everything Old is New Again: Binary Security of WebAssembly
  • Conference: USENIX Security ‘20
  • Author List: Daniel Lehmann, Johannes Kinder, Michael Pradel
  • Link to Paper: here

What is WebAssembly

WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable compilation target for programming languages, enabling deployment on the web for client and server applications. [1]

In other words, JavaScript just isn’t cutting it for some people. Compilers are cool, and so is C++. Let’s compile our C++ to wasm, and then our client side code is fast, and we didn’t have to write JavaScript: win, win.

Everything Old is New Again

Buffer overflows are as old as time. C++ has buffer overflows, but JavaScript not so much (or at least lets pretend it doesn’t, at worst its very rare). This is exactly the problem that this paper investigates. If we compile these memory unsafe languages to wasm, what are the security implications?

In terms of security, this paper describes two seperate security models. The first, host security, is “the effectiveness of the runtime environment in protecting the host system against malicious WebAssembly code”. The second, binary security, is “the effectiveness of the built-in fault isolation mechanisms in preventing exploitation of otherwise benign WebAssembly code”. This paper focuses on the latter. In other words, let’s assume the virtual machine running wasm is safe and isn’t going to allow us to suddenly start maliciously interacting with the host machine. With that being said, what sorts of chaos can occur within the wasm vm, and what are the security implications?

Know your enemy

If you know the enemy and know yourself, you need not fear the result of a hundred battles. If you know yourself but not the enemy, for every victory gained you will also suffer a defeat. If you know neither the enemy nor yourself, you will succumb in every battle. - Sun Tzu

This diagram gives a good overview of the modern standard security mitigations, and shows that a lot of security progress made over the last several decades have just been wiped away with wasm.

But don’t worry, because we don’t need stack canaries and ASLR. As the security document for wasm [2] states,

Since compiled code is immutable and not observable at runtime, WebAssembly programs are protected from control flow hijacking attacks.

Further, the wasm security document goes on to state,

A protected call stack that is invulnerable to buffer overflows in the module heap ensures safe function returns.

Incredible.

Types

As anyone that’s ever used Haskell will tell you: you’ve got to have a sweet type system. In wasm, there are four primitive types: 32 and 64 bit integers (i32, i64) and single and double precision floats (f32, f64). More complex types, such as arrays, records, or designated pointers do not exist.

Amazing.

Control-Flow

Unlike native code or Java bytecode, Web-Assembly has only structured control-flow. No jumping to arbitrary addresses, you better know where you are going.

Indirect Calls

But what if I don’t know where I’m going to jump? The call_indirect instruction pops a value from the stack, which it uses to index into the so called table section. Table entries map this index to a function, which is subsequently called. You better put your function in the table. Furthermore, you must specify the function signature in the instruction, and only call a function in the table with the same signature.

Linear, Unmanaged Memory

The stack lives here (not that sweet protected one discussed before). The heap lives here. The global data lives here. All right next to each other, all living in harmony.

Host Environment

Without a host environment, nobody is going to hear the tree fall. Browsers supply modules for functions such as XmlHttpRequest, eval, and document.write. Modules running in Node.js may invoke exec to execute shell commands. Non-primitive data, e.g. strings or objects, must be passed between host and WebAssembly module through linear memory, which can be accessed by both. Hopefully nobody rampages through that linear memory and overwrites the precious buffer that you are about to send to eval. Can you imagine?

Managed vs Unmanaged Data

Managed data is first class. It includes local variables, global variables, values on the evaluation stack, and return addresses. The wasm VM cares for and nurtures this data. It is truly a holy place to be.

The other scum is unmanaged data. If Hindley and Milner can’t figure you out because you’re not an integer or a float (say, you’re a string or an array), you get relegated to the linear memory. If someone cares about your address, you get relegated to the linear memory (managed data doesn’t care about addresses, why do you?). Linear memory is a hellfire dump. Don’t end up in linear memory.

This is exactly what you should think of when you see “linear memory”.

Memory Protections

All memory in the linear memory is readable and writable. Think your data needs to only be readable? Too bad.

In the civilized world, we have unmapped pages between the stack and the heap. This at least prevents your buffer overflow on the stack from infesting the heap. In linear memory, this is not the case. If you want to keep writing all the way from the stack to the heap, nobody is going to stop you. It’s the wild west out there.

But linear memory isn’t too wild. If it was really wild, it would live at randomized addresses with each program execution. It would utilize address space layout randomization. “No”, linear memory says, “we must have order”. And order it has: linear memory has no ASLR.

Attack their weaknesses

When strong, avoid them. If of high morale, depress them. Seem humble to fill them with conceit. If at ease, exhaust them. If united, separate them. Attack their weaknesses. Emerge to their surprise. - Sun Tzu

How can we possibly attack a wasm program, which keeps its return addresses protected in the holy managed data stack? Well, as the great John von Neumann once said, “code is data, and data is code”. And with that, we attack the data. We attack the linear memory.

In this example, we imagine a web application which seeks to allow you to convert and view an image in your browser entirely client side. In order to view the image, it builds up html and writes it to the dom. In doing so, it ultimately base64s your image data.

The string lives in the heap. The user supplied data is base64’d. Even with a buffer overflow XSS should be impossible. Wrong. Where are my stack canaries? Where are my unmapped pages between the stack and the heap? They don’t exist. Just overflow all the way to that image tag and make it a script tag. Hacked.

Node.js: But I Like JavaScript

Apparently some people are crazy enough to write their server side code in Node.js. And they too want to get in on this sweet wasm action.

Remember how data that cares about it’s address is relegated to the linear memory? Well, that includes all function pointers in vtables. With a quick little heap overflow into a function pointer overwrite, we can be well on our way to calling exec on user supplied data. With no ASLR and trivial heap overflows, converting your enemies web server into a bitcoin mining machine is as easy as 1 2 3. Just be sure the function pointer you overwrite has the same function “type” as exec, or in other words takes in one argument that is an integer, string, pointer or whatever else is ultimately an i32 in wasm’s extensive type system.

Real Talk: Discussing the Paper

The quality of decision is like the well-timed swoop of a falcon which enables it to strike and destroy its victim. - Sun Tzu

Appropriateness of evaluation

The evaluation was done on 9 “real world”, “in the wild” wasm programs and 17 programs from the SPEC CPU 2017 benchmark. The only justification given for the real-world programs is that they are diverse in application domain and source languages. The only justfication given in the paper for why the particular SPEC programs were chosen was that they were compute-heavy, thus matching the “intent” of webassembly. This feels flimsy; it would really be nice to see a larger data set. If there are only 9 real world wasm programs you could find in the wild, it kind of feels like the relevance of the security of webassembly applications is brought into question. Furthermore, the inclusion of SPEC feels artificial, and certainly like the programs chosen could have been cherry-picked to produce good numbers.

Are these mitigations being implemented?

The paper, in its discussion, cites multiple ongoing projects to stabilize new features into the wasm specification which would improve its security: multiple memories, reference-types, and a general memory-safety hardening project. It is good that these are being worked on, but it once again calls into question the relevance of this paper. Yes, the wasm compiler ecosystem is fragmented, but it is also very young, and the sooner these mitigations are implemented the more rapidly they will proliferate.

References

[1] https://webassembly.org/

[2] https://webassembly.org/docs/security/

This post is licensed under CC BY 4.0 by the author.