Brennan & Emin Talk About Computer Science

Our musings, thoughts, and more.

Docker & Rust: Statically Linking Binaries for Secure Execution in Untrusted Containers


Containers power Coursera’s 2nd generation programming assignments infrastructure. Instructors package dependencies and test cases into Docker container images, which are uploaded to Coursera. Then, for every submission, we map the uploaded code into a fresh container instantiated from the corresponding image and kick off the instructor’s grading script.

Because we use a shared pool of resources and schedule submissions on a unified cluster, this architecture is an order of magnitude more cost-effective and reliable than our first-generation infrastructure. Unfortunately, security and isolation are a huge challenge in this shared environment. Containers by themselves are not secure1, so to ensure reliable operation of our clusters we mix in some extra secret sauce. While much of our hardening is discussed elsewhere1 2 in greater detail, we also perform a few operations within containers instantiated from the instructor-provided images, and for these operations to run securely we have to take a few precautions. In particular, we must use statically linked native binaries so that we depend only on the kernel’s application binary interface (ABI). This post walks through a sample attack and the corresponding defense.

When compiling a C program on Linux, by default gcc/clang produce a dynamically linked ELF file. You can inspect the output file with the file and ldd commands:

ubuntu@ip-10-0-9-172:~/src/hello$ cat hello.c
#include <stdio.h>
#include <unistd.h>

int main() {
  write(1, "Hello\n", 6);
  printf("Hello world!\n");
}
ubuntu@ip-10-0-9-172:~/src/hello$ gcc hello.c -o hello
ubuntu@ip-10-0-9-172:~/src/hello$ file hello
hello: ELF 64-bit LSB  executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=06b3681087e96b5f53d3f33ee03853d061fdd887, not stripped
ubuntu@ip-10-0-9-172:~/src/hello$ ldd hello
  linux-vdso.so.1 =>  (0x00007ffe9aa53000)
  libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0619edd000)
  /lib64/ld-linux-x86-64.so.2 (0x00007f061a2a2000)
ubuntu@ip-10-0-9-172:~/src/hello$
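Whether the kernel has to start a dynamic loader at all is recorded in the binary's ELF program headers: a PT_INTERP entry names the interpreter (here /lib64/ld-linux-x86-64.so.2, as ldd shows). The following is a minimal sketch, not how file or ldd are implemented, assuming a 64-bit ELF file whose byte order matches the host; the function name elf_requests_interpreter is hypothetical.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: report whether a 64-bit ELF file asks the kernel to
   start a program interpreter (dynamic loader) via a PT_INTERP header.
   Assumes the file's byte order matches the host's.
   Returns 1 if PT_INTERP is present, 0 if absent, -1 on read/format error. */
int elf_requests_interpreter(const char *path) {
  FILE *f = fopen(path, "rb");
  if (f == NULL) return -1;

  Elf64_Ehdr ehdr;
  if (fread(&ehdr, sizeof ehdr, 1, f) != 1 ||
      memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 ||
      ehdr.e_ident[EI_CLASS] != ELFCLASS64 ||
      ehdr.e_phentsize < sizeof(Elf64_Phdr)) {
    fclose(f);
    return -1;
  }

  int found = 0;
  for (int i = 0; i < ehdr.e_phnum && !found; i++) {
    Elf64_Phdr phdr;
    /* The program header table starts at e_phoff; entries are e_phentsize apart. */
    long off = (long)(ehdr.e_phoff + (Elf64_Off)i * ehdr.e_phentsize);
    if (fseek(f, off, SEEK_SET) != 0 || fread(&phdr, sizeof phdr, 1, f) != 1) {
      fclose(f);
      return -1;
    }
    if (phdr.p_type == PT_INTERP) found = 1;
  }
  fclose(f);
  return found;
}
```

For the gcc-built hello above, elf_requests_interpreter("hello") should return 1. A binary without PT_INTERP is never handed to the dynamic loader, so neither LD_PRELOAD nor /etc/ld.so.preload has anything to act on.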

When hello is executed, the kernel maps the binary into the address space of the new process. Because the ELF file names a program interpreter (the dynamic loader, /lib64/ld-linux-x86-64.so.2 in the ldd output above), the kernel also maps the loader and starts execution there. The loader reads the ELF file’s dynamic section, loads the referenced shared libraries from the file system into memory, and performs relocations so that references between the loaded ELF files point to the addresses where their symbols actually ended up. It then transfers control to the program’s entry point, which eventually calls main. On Linux, the loader is configurable through a few mechanisms (Darwin’s dyld offers a similar hook, DYLD_INSERT_LIBRARIES). The first is the LD_PRELOAD environment variable; the second is the file /etc/ld.so.preload. The dynamic linker loads the libraries listed in either place before the dependencies recorded in the ELF file, so symbols they define take precedence over same-named symbols in, for example, libc. By overriding bindings to key library functions, a preloaded library can intercept execution of unmodified native binaries.
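The two preload mechanisms can be checked for directly. This is a minimal diagnostic sketch, assuming that a non-empty LD_PRELOAD and an existing preload file are what matter; the function names are hypothetical. It is only a diagnostic: inside an untrusted container a dynamically linked checker could itself be hooked, which is why detection is not the defense here.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>

/* Mechanism 1 (hypothetical helper): a non-empty LD_PRELOAD variable.
   An empty value has no effect on the loader. */
int env_preload_set(void) {
  const char *v = getenv("LD_PRELOAD");
  return v != NULL && v[0] != '\0';
}

/* Mechanism 2 (hypothetical helper): a system-wide preload list such as
   /etc/ld.so.preload. The path is a parameter so the check can be pointed
   at, e.g., an unpacked container filesystem instead of the host's. */
int preload_file_present(const char *path) {
  return access(path, F_OK) == 0;
}
```

For example, preload_file_present("/etc/ld.so.preload") inside the malicious container built later in this post would be expected to return 1.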

We use Rust in our programming assignments infrastructure because, in addition to being a safe, modern, and fast language, Rust compiles down to native binaries. Thanks to Cargo and Rustup, we can easily link these programs statically so that they depend only on the kernel’s ABI.
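To make "depend only on the kernel’s ABI" concrete: the kernel’s interface is the system call itself, not a libc symbol such as write. A minimal Linux-only sketch, where raw_write is a hypothetical helper, issues write(2) by syscall number through syscall(2). Note its limits: in a dynamically linked program syscall() is itself a libc symbol that a preload library could interpose, so this only illustrates the boundary; static linking is what takes the loader out of the picture.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: perform write(2) by system call number rather than
   calling libc's write() symbol. The kernel ABI is the call number
   (SYS_write) plus its arguments (fd, buf, count). */
long raw_write(int fd, const void *buf, size_t count) {
  return syscall(SYS_write, fd, buf, count);
}
```

For example, raw_write(1, "Hello\n", 6) writes to stdout without going through a write symbol that the earlier interception technique targets.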

Our example Rust program is the simple hello-world program generated by cargo new --bin hello. On Linux, Rust programs built for the default target are dynamically linked against the system libc, so they respect both the LD_PRELOAD environment variable and the /etc/ld.so.preload configuration file. If we compile with cargo build and then run the resulting binary under strace, we see the following:

ubuntu@ip-10-0-9-172:~/src/hello$ strace target/debug/hello
execve("target/debug/hello", ["target/debug/hello"], [/* 20 vars */]) = 0
[... some syscalls elided …]

access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)

[... lots of syscalls elided …]

write(1, "Hello, world!\n", 14Hello, world!
)         = 14
sigaltstack({ss_sp=0, ss_flags=SS_DISABLE, ss_size=8192}, NULL) = 0
munmap(0x7f0f141c5000, 8192)            = 0
exit_group(0)                           = ?
+++ exited with 0 +++
ubuntu@ip-10-0-9-172:~/src/hello$

The println! macro in Rust eventually makes its way to libc’s write function. We can construct a preload library that intercepts this call:

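#define _GNU_SOURCE
#include <dlfcn.h>
#include <unistd.h>

/* Interposed write(): announce the interception, then forward the call to
   the next definition of write in link order (libc's). */
ssize_t write(int fd, const void *buf, size_t count) {
  static ssize_t (*real_write)(int, const void *, size_t) = NULL;
  if (!real_write) {
    real_write = (ssize_t (*)(int, const void *, size_t))dlsym(RTLD_NEXT, "write");
  }
  /* Emit the message with real_write rather than printf, so it is neither
     held in a stdio buffer nor routed back through this hook. */
  static const char msg[] = "intercepted!!!\n";
  real_write(1, msg, sizeof msg - 1);
  return real_write(fd, buf, count);
}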

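After compiling it into a shared library (for example, gcc -shared -fPIC preload.c -o preload.so -ldl, assuming the source is saved as preload.c), we can run it as follows: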

ubuntu@ip-10-0-9-172:~/src/hello$ LD_PRELOAD=`pwd`/preload.so target/debug/hello
intercepted!!!
Hello, world!

A malicious grading container could hook into the execution of our tools by shipping an /etc/ld.so.preload file (or even by placing a malicious libc on the filesystem). We can build such a container as follows:

ubuntu@ip-10-0-9-172:~/src/hello$ cat Dockerfile.dynamic-ubuntu
FROM ubuntu:latest

ADD target/debug/hello /

ENTRYPOINT ["/hello"]
ubuntu@ip-10-0-9-172:~/src/hello$ cat Dockerfile.dynamic-intercepted
FROM ubuntu:latest

ADD target/debug/hello /
ADD preload.so /preload.so
ADD load.txt /etc/ld.so.preload

ENTRYPOINT ["/hello"]
ubuntu@ip-10-0-9-172:~/src/hello$ cat load.txt
/preload.so
ubuntu@ip-10-0-9-172:~/src/hello$ docker run hello-dynamic-intercepted
intercepted!!!
Hello, world!

To build static binaries in Rust, we instead use the musl target triple. Using rustup, we can install it with rustup target add x86_64-unknown-linux-musl. To build our binary, we run cargo build --target x86_64-unknown-linux-musl. When we run this statically linked binary with the LD_PRELOAD environment variable set, no interception occurs:

ubuntu@ip-10-0-9-172:~/src/hello$ LD_PRELOAD=`pwd`/preload.so target/x86_64-unknown-linux-musl/debug/hello
Hello, world!
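A process can also check from the inside whether a dynamic loader started it. The auxiliary vector entry AT_BASE holds the load address of the program interpreter, and the kernel sets it to 0 when the executable has no PT_INTERP header. A minimal sketch, assuming Linux with glibc 2.16+ or musl (both provide getauxval); started_by_dynamic_loader is a hypothetical name. It does not cover a program launched by invoking the loader explicitly.

```c
#include <sys/auxv.h>

/* Hypothetical helper: returns 1 if the kernel started a program
   interpreter (dynamic loader) for this process, 0 otherwise.
   AT_BASE is the interpreter's load address, or 0 when the executable
   has no PT_INTERP header (e.g. a fully static binary). */
int started_by_dynamic_loader(void) {
  return getauxval(AT_BASE) != 0;
}
```

Compiled normally with gcc, a program calling this should see 1; compiled with gcc -static, it should see 0, matching the lack of interception in the transcript above.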

We can also run it under strace to confirm that it makes no syscalls that read from the local filesystem2, and that it makes substantially fewer syscalls than the default glibc build.

ubuntu@ip-10-0-9-172:~/src/hello$ strace target/x86_64-unknown-linux-musl/debug/hello
execve("target/x86_64-unknown-linux-musl/debug/hello", ["target/x86_64-unknown-linux-musl"...], [/* 20 vars */]) = 0
mmap(NULL, 608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9cb84b9000
arch_prctl(ARCH_SET_FS, 0x7f9cb84b9110) = 0
set_tid_address(0x7f9cb84b9148)         = 2462
readlink("/etc/malloc.conf", 0x7ffed7a337e0, 4096) = -1 ENOENT (No such file or directory)
brk(0)                                  = 0x1c27000
mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9cb82b9000
munmap(0x7f9cb82b9000, 2097152)         = 0
mmap(NULL, 4190208, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9cb80ba000
munmap(0x7f9cb80ba000, 1335296)         = 0
munmap(0x7f9cb8400000, 757760)          = 0
sched_getaffinity(0, 128, {3, 0, 0, 0}) = 32
mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9cb8000000
rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER|SA_RESTART, 0x44889a}, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RT_1 RT_2], NULL, 8) = 0
rt_sigaction(SIGSEGV, {0x408a80, [], SA_RESTORER|SA_STACK|SA_SIGINFO, 0x44889a}, NULL, 8) = 0
rt_sigaction(SIGBUS, {0x408a80, [], SA_RESTORER|SA_STACK|SA_SIGINFO, 0x44889a}, NULL, 8) = 0
sigaltstack(NULL, {ss_sp=0, ss_flags=SS_DISABLE, ss_size=0}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9cb84b7000
sigaltstack({ss_sp=0x7f9cb84b7000, ss_flags=0, ss_size=8192}, NULL) = 0
write(1, "Hello, world!\n", 14Hello, world!
)         = 14
sigaltstack({ss_sp=0, ss_flags=SS_DISABLE, ss_size=8192}, NULL) = 0
munmap(0x7f9cb84b7000, 8192)            = 0
exit_group(0)                           = ?
+++ exited with 0 +++
ubuntu@ip-10-0-9-172:~/src/hello$

In this way, we can ensure consistent execution without making any assumptions about the filesystem of a container. Thank you to the following resources:


  1. Although they are getting much better!

  2. The sharp-eyed among you will notice a readlink("/etc/malloc.conf", ...). This only configures flags for the memory allocator and cannot intercept program execution. See the malloc.conf man page for further background.

Speaking at Play! Meetup


Yesterday I spoke at a Play! meetup hosted by LinkedIn. Feel free to check out the slides and the live stream video. (I am the first presenter.)

Special thanks to Nick Dellamaggiore for encouraging me to speak, and to Dan Chia for helping with the slides. Thank you to LinkedIn for hosting, and 42 for organizing!

A More Object-Oriented GLSL


Introduction

Computer graphics have come a long way from the days of 8-bit sprites. Spurred by developments in hardware and rendering techniques, computer-rendered images that border on photography are becoming more and more commonplace in both real-time and pre-rendered applications. The tools we use to write graphics software, however, have largely resisted this trend of large-scale improvement. The paradigms we use to communicate with our graphics hardware have not kept up with the evolving body of knowledge about software development practices.

Some would argue that it is the fixed-pipeline, state-machine nature of our current graphics frameworks that makes their abstractions appropriate, if not necessary. After all, you have no real control over how your graphics data flows through processing (outside of shaders); instead, you are tweaking knobs on the graphics machine that turns your vertices and related data into beautiful imagery. The way the data is processed, however, should have no bearing on how much boilerplate a programmer has to endure, nor on how well protected they are from themselves. The whole point of abstraction is to hide the underlying machinery under a much friendlier disguise.

Towards this end, I have been working on a tool that wraps around communication with GLSL shaders to make the process of using them much more object oriented and thus a little more robust to error and a lot more amenable to quick iteration.

Deploying Play! Apps


Coursera has standardized on Play! 2.0 for new server-side development. I’ve previously written about two of our systems, JITR and Wayland. We currently run 10+ Play! apps in production. Altogether, including internal-only tools, Coursera runs 20+ different Play! services. (Why we chose Play! is a topic for a separate post.)

Like Netflix and others, we have learned that any application in production on AWS must autoscale. Autoscaling lets us gracefully handle instance failures and traffic spikes, keeps Coursera running even when entire availability zones go down, and saves us money.

Engineers are impatient, and sometimes we need to make rapid changes to our running services in response to external and internal events. When I built our deployment infrastructure for Play! applications, I set an aggressive goal of 6 minutes from code commit to running in production.1

Wayland: Easy, Fast, Reliable Deployment for Legacy Infrastructure


The Beginning

We work hard at Coursera; we have new features and bug fixes we want to release every day. When I first joined Coursera, pushing out new code involved logging in to each server, removing traffic, updating to the latest code and finally re-adding traffic. We knew a manual process would quickly limit our ability to provide the best educational platform, and so I embarked on a process to completely automate it.

JITR: The Just-In-Time Renderer


As part of the infrastructure team at Coursera, I build some nifty systems and tools. One of them is called JITR. I hope to share more of these tools and experiences here on this blog.

A Modern Stack

Coursera is building a global learning platform. To provide a responsive and interactive learning environment on the web, we employ state-of-the-art technologies. We build rich client-side apps using backbone.js, underscore.js and AMD (require.js). Unfortunately, certain components that make the web the web have not caught up with this rapid pace of innovation. Link fetchers (such as those that power Open Graph systems on social networks) and web crawlers (programs that catalogue the web) generally do not understand JavaScript or rich AJAX applications. Because these systems are important to the social web, we have had to build tools to support them.