Deploying Play! Apps

Coursera has standardized on Play! 2.0 for new server-side development. I’ve previously written about two of our Play! apps: JITR and Wayland. We currently run 10+ Play! apps in production. Altogether, including internal-only tools, Coursera runs 20+ different Play! services. (Why we chose Play! is a topic for a separate post.)

Like Netflix and others, we have learned that any application in production on AWS must autoscale. Autoscaling lets us gracefully handle instance failures and traffic spikes, and helps keep Coursera running even when entire availability zones go down, not to mention saving us money.

Engineers are impatient, and sometimes we need to make rapid changes to our running services in response to external and internal events. When I built our deployment infrastructure for Play! applications, I set an aggressive goal of 6 minutes from code commit to running in production.1

The aggressive latency goal shaped the design of the system we use today. Due to network bandwidth constraints alone, we must compile our code inside EC2: it can take 2+ minutes just to upload our packaged Play! applications to EC2 or S3!2 To build them, we use Jenkins to compile, test, document, and package our code. By using Jenkins, we guarantee a standardized build environment; anyone in the company can build and package our Play! apps.3 Further, the aggressive time goal encourages us to automate as many steps as possible; computers are much faster and more reliable at executing scripts than humans.

The Deploy Process

  1. An engineer goes to Jenkins and clicks “Build Now” on a special project-deploy job. Jenkins executes play cPublish, a Coursera-specific SBT4 task. This task compiles the code, packages the project, and creates a .deb file with all the JARs included. Finally, the task pushes the .deb into our S3-backed DebStore. (A sketch of the packaging work involved follows this list.)
  2. The engineer goes to Quack, our Web-UI for deploying services (similar to Wayland), and creates a new cluster, filling in a few details.
  3. Quack launches a new autoscaling cluster based on a bare-bones template image. The instances boot up, download the .debs, and install them, starting the Play! servers automatically.
  4. The engineer can now add traffic to the new instances and remove traffic from the old instances using Quack, completing the deploy.
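
The cPublish task itself is Coursera-internal, so the sketch below is only a rough illustration of the kind of work it has to do once SBT has compiled and packaged the project: stage the JARs and start script, write a minimal Debian control file, and hand the result to dpkg-deb. The DebPackager object, its signature, and the maintainer field are placeholders invented for this example, and the push to the S3-backed DebStore is omitted entirely.

    import java.io.File
    import java.nio.file.{Files, StandardCopyOption}
    import scala.sys.process._

    // A minimal sketch (not the real Coursera tooling) of the packaging work a
    // cPublish-style task performs once SBT has produced the application JARs.
    object DebPackager {

      /** Stage the app into the layout dpkg-deb expects, then build the .deb. */
      def buildDeb(appName: String, version: String, jars: Seq[File],
                   startScript: File, workDir: File): File = {
        val root = new File(workDir, s"$appName-$version")

        // Application payload: the start script plus every JAR under lib/.
        val appDir = new File(root, s"coursera/$appName")
        val libDir = new File(appDir, "lib")
        libDir.mkdirs()
        jars.foreach { jar =>
          Files.copy(jar.toPath, new File(libDir, jar.getName).toPath,
            StandardCopyOption.REPLACE_EXISTING)
        }
        Files.copy(startScript.toPath, new File(appDir, "start").toPath,
          StandardCopyOption.REPLACE_EXISTING)

        // Minimal Debian control file; a real package carries more metadata.
        val debianDir = new File(root, "DEBIAN")
        debianDir.mkdirs()
        val control =
          s"""Package: $appName
             |Version: $version
             |Architecture: all
             |Maintainer: infra@example.com
             |Description: Play! application $appName
             |""".stripMargin
        Files.write(new File(debianDir, "control").toPath, control.getBytes("UTF-8"))

        // Let dpkg-deb do the actual archiving; pushing the resulting file to
        // the deb store would happen after this step.
        val deb = new File(workDir, s"${appName}_$version.deb")
        require(Seq("dpkg-deb", "--build", root.getAbsolutePath, deb.getAbsolutePath).! == 0,
          "dpkg-deb failed")
        deb
      }
    }

In practice, all of this runs on the Jenkins build machine as part of the cPublish invocation in step 1.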

For the remainder of this blog post, I will discuss the first part of the deploy process: packaging a Play! application. (I will cover the rest in later posts.)

Packaging Play! Projects

Play! 2.15 includes a dist task that packages up a Play! application into a zip file. The zip file includes a start script and all the libraries in a lib folder. (See: ProductionDist) While this gets us 70% of the way to having a deployable application, there are a few issues.

  • A zip file can’t be installed by itself. Further automation is required to put files in the right places.
  • Following good security hygiene, our Play! applications should run with as few privileges as possible: each in its own user account, with minimal permissions.6
  • All applications (Play! applications included) should start at boot automatically. As the old adage goes, “If I can’t reboot it, I can’t trust it.”

Deployment is not a new problem; it has been solved many times already. Instead of inventing our own in-house tool to install files, we use an existing one: the Debian package manager, dpkg.7

A PlayCour .deb

The “PlayCour”8 tooling we use packages all our Play! apps in a standard fashion. I find it best illustrated by example: say we have a Play! project called “wayland”. A user wayland is created on the system, with a home directory of /coursera/wayland. Inside the home directory are the start script and a lib folder with all of the JARs inside. Finally, the package includes an Upstart script that gets placed at /etc/init/wayland.conf.
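
The real PlayCour templates are internal, but to make this layout concrete, here is a rough sketch of what the generated Upstart job and the postinst maintainer script for “wayland” might contain. The PlayCourTemplates name, the particular Upstart stanzas, and the adduser flags are illustrative assumptions rather than the exact files our packages ship.

    // A sketch (assumed, not the real PlayCour template) of the Upstart job and
    // maintainer script a package for the "wayland" project might carry.
    object PlayCourTemplates {

      /** Contents of /etc/init/<app>.conf: start the Play! server at boot. */
      def upstartConf(appName: String): String =
        s"""description "$appName Play! service"
           |
           |start on runlevel [2345]
           |stop on runlevel [016]
           |respawn
           |
           |# Run as the unprivileged service user created by postinst.
           |setuid $appName
           |chdir /coursera/$appName
           |exec /coursera/$appName/start
           |""".stripMargin

      /** Contents of DEBIAN/postinst: create the service user on first install. */
      def postinst(appName: String): String =
        s"""#!/bin/sh
           |set -e
           |# Create a dedicated system user whose home is /coursera/<app>.
           |if ! id $appName >/dev/null 2>&1; then
           |  adduser --system --group --home /coursera/$appName $appName
           |fi
           |chown -R $appName:$appName /coursera/$appName
           |""".stripMargin
    }

Running each service under its own user and starting it on the standard runlevels is what gives us the least-privilege and start-at-boot properties called out in the previous section.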

.deb Advantages

Deb files themselves are little more than compressed tar archives (an ar archive wrapping a couple of compressed tarballs), so they add very little overhead on top of what the play dist command produces. This means they are network-efficient and can be copied quickly between machines or to Amazon S3.

Further, the Debian package manager dpkg installs packages extremely quickly: typical install times are around half a second, compared to 5, 50, or even 500 seconds for a configuration management tool to run.9 You can also easily and quickly upgrade a service to a new version, and even uninstall it cleanly.

Finally, Debian packages are very flexible. Although Play! services are the heaviest users of .deb packages, we are migrating most Coursera production services and dependencies into Debian packages. Everything from libraries to dependencies to services can be deployed using Debian packages.10

.deb Disadvantages

That said, using .deb packages has some disadvantages. The biggest is that we must use Linux/Ubuntu11. This is not too big of an issue: even though we primarily develop on Macs, we run our legacy stack inside a Vagrant-managed virtual machine that runs Ubuntu.

Conclusion

Play! is a fantastic framework to work with. One area that has been lacking for us, though, is a great deployment technology. We have seen great success using Debian packages to package and deploy our services. Subsequent posts will go over how we store the .debs and get them running in production quickly, as well as other common Play! tooling we use at Coursera.

Appendix

Play! Application Deployment Requirements

Below is an excerpt of the Coursera-internal design document I wrote regarding deployment tooling for Coursera Play! apps.

Goals

  1. Rapid Deployment: From code being merged on GitHub to running in production, we would like to see a total time of approximately 6 minutes.12 Configuration-only changes should be deployable in less than 3 minutes. Rollback to an already-running cluster should complete within 10 seconds.
    • Jenkins Built: Simply uploading a deb from Coursera to AWS takes about a minute. Things must be built in the cloud. This will also help Coursera pass the bus test.
    • Automated: As many of the steps as possible should be automated; computers can execute them much faster than humans. It should ideally take only a couple of clicks to make everything go.
  2. Same code everywhere: The same code should run in all of staging, test, and production. We shouldn’t have to re-package debs or re-bake AMIs. We want everything to be as identical as possible.
  3. Everything is a new deploy: Any change (code or configuration) requires a new deployment. Because we will be able to deploy quickly, we should take full advantage of the benefits deployments give us. The most important of these is fast rollback, but they also include the ability to ramp feature traffic and a complete revision history of a running service.

Subordinate Goals

  1. Reproducibility: Everything should be completely reproducible. We should have the ability to deploy any version of the code we wish at any time with any past configuration.
  2. No command line required: We would like to make it easy for anyone to deploy. It should not require SSHing in or having command-line access at all.
  3. Hide the AWS Keys: It shouldn’t be necessary to access the AWS Console to be able to diagnose and deal with any issues that arise.
  4. Slowly ramp traffic: We should be able to allocate 1%, 5%, 50%, 100% of traffic to different deployments.

It-goes-without-saying goals

  • Autoscale: It must autoscale. Period. Autoscaling helps save money, while increasing reliability.
  • Discovery: It must integrate nicely with discovery. There’s no other way to get the fast rollback properties we want.
  • Logging: Everything must be logged. This is important for auditing as well as diagnosing issues with Coursera.
  • Availability: Deployment must be engineered to be completely rock solid. Anything less than 100% availability is unacceptable. It must be designed to tolerate EBS failures, RDS failures, instance failures, AZ failures and more.

Non-Goals

  • Play! Only: This infrastructure and tooling will take advantage of the features and characteristics of Play! 2.x Applications. This will not work for Python, PHP, or even other JVM-based projects. The goal of this project is to do one thing and do it well.
  • Multi-region deployment: Although the tooling should be designed with the eventual goal of multi-region deployments, its initial implementation is not designed to handle this scope of complexity.

Code snippets

I have put together a “source snapshot” sample project that includes some of the SBT tooling we add to our Coursera Play! apps. Within Coursera, all of this tooling is packaged as an SBT plugin and is included in all of our Play! projects with one line inside the project/plugins.sbt file. In this source snapshot, the plugin has been copied into project/Build.scala directly.
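
For reference, the “one line” in question is just an ordinary addSbtPlugin entry in project/plugins.sbt; the organization, artifact name, and version shown here are placeholders, not our real internal coordinates.

    // project/plugins.sbt (hypothetical coordinates for the internal plugin)
    addSbtPlugin("org.coursera" % "sbt-playcour" % "0.1.0")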

Updates

2013-08-27

Code snippet section filled in.


  1. I set many other requirements as well. See the appendix to this post for the complete list.

  2. Although that’s more a function of our internet service at the office than anything else.

  3. We found that Play! sometimes encourages machine-dependent build practices. Specifically, Play!’s conf folders are sometimes home to un-checked-in changes.

  4. SBT stands for Simple Build Tool, and is Play!’s included build tool.

  5. The Play! team is in the process of upgrading this for Play 2.2: https://groups.google.com/forum/#!topic/play-framework/pD694aeWrrw

  6. Should someone attack our Play! servers, they should not be able to do too much on the file system.

  7. Because we run Ubuntu everywhere, this is a very natural choice.

  8. A horrible, mashed-up name.

  9. Note: this is not a fair comparison, as configuration management tools do a lot more. A configuration management tool is not the right tool for the job!

  10. For non-Play! .deb’s, we typically use FPM: https://github.com/jordansissel/fpm/wiki

  11. Any Debian-derived distro should work in theory. In practice, due to our use of Upstart we only use Ubuntu.

  12. For all but the largest projects, whose compile times may be longer.
