Java Cold Startup on AWS λ

TL;DR: λ is the future but Java is useless on it

May 31, 2020

This is my second newsletter. I am still writing about the things I’m currently obsessed with.

AWS λ

AWS Lambda is a serverless platform. Serverless platforms are built on three fundamental ideas:

Program isolation. Each customer’s program runs in a restricted environment in its own container.
Stateless. The running program is stateless. If you want to save state, you need to use another service alongside λ, such as a database.
Pay by execution time. You pay only for the time your program is running. With λ, it is billed in slices of 100 ms. In other words, not running the program is free, and running the program for 10 users is 10x more expensive than for 1.
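To make the billing rule concrete, here is a minimal sketch of how run time maps to billed time. It assumes each invocation is rounded up to the next full 100 ms slice; the class and method names are made up for illustration.

```java
// Sketch of pay-by-execution-time billing in 100 ms slices.
// Assumption: each invocation is rounded UP to the next full slice.
final class LambdaBilling {
    static final long SLICE_MS = 100;

    // Billed time for one invocation; not running at all bills nothing.
    static long billedMillis(double runMillis) {
        if (runMillis <= 0) {
            return 0;
        }
        return (long) Math.ceil(runMillis / SLICE_MS) * SLICE_MS;
    }

    // Billed time for several invocations of the same duration.
    static long totalBilledMillis(int invocations, double runMillisEach) {
        return invocations * billedMillis(runMillisEach);
    }

    public static void main(String[] args) {
        System.out.println(billedMillis(30));            // a 30 ms run is billed as 100 ms
        System.out.println(totalBilledMillis(1, 250));   // 300
        System.out.println(totalBilledMillis(10, 250));  // 3000, i.e. 10x the single-user cost
    }
}
```

Under this model the cost scales linearly with the number of invocations, and an idle function costs nothing.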

Why would you use serverless instead of your usual VM? Because those 3 breakthrough ideas have very interesting corollaries.

All *ities disappear. Scalability, reliability and availability become your cloud provider’s job.
Cheaper. Both in terms of ops (no more waking up at 3 am because of a server crash) and actual cost. In the case of Inboxbooster, this cut our server cost by a factor of 3 and removed the nightly crash pain.
Architecture. FaaS (function as a service) imposes a certain architecture (stateless, secure,…). These constraints are really well thought out and useful. We don’t shoot ourselves in the foot anymore.

(source )

Actual serverless usage statistics are very opaque (usually a sign that a market is growing fast). Nevertheless, I found two interesting sources, from Datadog and New Relic. The data below are excerpts from their white papers, which are based on their own customers running on serverless. Keep in mind that these data are biased towards large companies: both are APM companies with a minimum ARR above $5K, so nimble startups and small companies won’t use them.

(source)

(source New Relic White Paper)

Serverless growth is impressive. It is “crossing the chasm”. Azure and Google also have competing products, and the competition is heating up. For instance, AWS Fargate and Google Cloud Run are new serverless platforms built to deploy longer-running and bigger containers.

Java

Java is arguably an old programming language (it was launched in 1996). It has 8-10M developers, or about 15M when counting the other JVM languages (e.g. Scala, Clojure).

(source https://www.baeldung.com/java-in-2019)

To provide additional context:

there are about 40M developers worldwide
there are 11M JavaScript developers
there are 8M Python developers

N.B. All these figures are inherently imprecise and somewhat contradictory. But as Fermi estimates, they are consistent relative to each other.

The Java ecosystem is one of the top 3 in the world, if not the first. Considering that AWS Lambda attracts a lot of enterprise companies, it is very odd that Java is only the #3 language on AWS Lambda.

Cold Startup

The main limitation of AWS Lambda (and its competitors) is the cold startup time. “When a function is started in response to an event, there may be a small amount of latency between the event and when the function runs. If your function hasn’t been used in the last 15 minutes, the latency can be as high as 5-10 seconds”. This time is used to start the container and load the environment. A cold startup happens once per container, when the platform (AWS, GCP, Azure,…) sets up the serverless code.

This is a much bigger problem than it looks. Containers are created only during traffic spikes: the platform creates one container per user request (if none is available). What matters is the “spike variation”, i.e. the positive part of the first derivative of your load. E.g. if you move from 0 users to 1, you pay one cold startup latency; if you move from 1 to 10, that’s 9 more cold startups; and if you then scale back to 1 container and up to 10 again, that’s 9 cold startup latencies again. This is hard to predict, expensive, and it significantly increases your end-user latency (one of the metrics you should really care about, if you care about this sort of thing).
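The counting rule above can be sketched in a few lines. This is a simplified model, not how AWS actually schedules containers: it assumes every increase in concurrency creates that many new containers and that containers are released as soon as the load drops. The class name is hypothetical.

```java
// Counts cold starts as the sum of the positive increases in concurrency.
// Simplified model: containers are released immediately on scale-down.
final class ColdStartCounter {
    static int coldStarts(int[] concurrency) {
        int previous = 0; // we start with no container
        int total = 0;
        for (int current : concurrency) {
            total += Math.max(0, current - previous); // only increases cost a cold start
            previous = current;
        }
        return total;
    }

    public static void main(String[] args) {
        // 0 -> 1 -> 10 -> 1 -> 10 containers, as in the example above
        int[] load = {1, 10, 1, 10};
        System.out.println(coldStarts(load)); // 1 + 9 + 0 + 9 = 19
    }
}
```

A load that oscillates pays the cold startup penalty on every upswing, even though its peak never exceeds 10 containers.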

AWS has partially addressed this problem through provisioned concurrency. AWS keeps some of your workers in the background always ready to execute. But the cost is prohibitive.

(source)

The graph above shows that you need to use each worker at 60% capacity to break even with provisioned concurrency. This looks great until you realize you have no way to plan for the actual use of a worker, and even if you could, you pay for serverless precisely to avoid thinking about that. Also, the discrepancy between the two products is huge! You’re better off using a VM directly. It’s cheaper and easier. This is not a good solution.
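The break-even point can be reproduced with a little arithmetic. In the sketch below, the three per-GB-second prices are illustrative assumptions, not figures from this newsletter, and request charges are ignored. Check the current AWS pricing page before relying on them.

```java
// Break-even utilization between on-demand and provisioned concurrency.
// Per GB-second of capacity, at utilization u (0..1):
//   on-demand cost   = u * onDemand
//   provisioned cost = reserved + u * provisionedRun
// Both are equal when u = reserved / (onDemand - provisionedRun).
final class ProvisionedBreakEven {
    static double breakEvenUtilization(double onDemand, double reserved, double provisionedRun) {
        if (onDemand <= provisionedRun) {
            throw new IllegalArgumentException("on-demand price must exceed the provisioned run price");
        }
        return reserved / (onDemand - provisionedRun);
    }

    public static void main(String[] args) {
        // ASSUMED prices in USD per GB-second, for illustration only.
        double onDemand = 0.0000166667;       // regular duration price
        double reserved = 0.0000041667;       // price of keeping capacity warm
        double provisionedRun = 0.0000097222; // duration price on warm capacity
        double u = breakEvenUtilization(onDemand, reserved, provisionedRun);
        System.out.println("break-even utilization: " + u);
    }
}
```

With these assumed prices, the result comes out near 0.6, which matches the 60% figure read off the graph. Below that utilization, provisioned concurrency costs more than plain on-demand λ.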

Java Cold Startup Experiments

Java cold startup time is particularly bad on AWS Lambda. I experimented with 2 applications: a trivial app and a “real world app”. Results are below.

“Nothing App”

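import com.amazonaws.services.lambda.runtime.Context;

public final class App {
    // Handler that echoes its input and does nothing else.
    public static String handleRequest(String arg, Context context) {
        return arg;
    }
}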

AWS needs 366 ms for a cold start vs. 1.21 ms for a hot start. AWS λ needs about 300x more time to cold start an app that does nothing.
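One common way to tell cold invocations from hot ones inside the function itself is to rely on static state, which is initialized once per JVM and therefore once per container. The sketch below is not necessarily how the numbers above were obtained. The class is hypothetical and leaves out the Lambda `Context` parameter so it runs without any AWS dependency.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Tags each invocation as "cold" or "warm" using state that lives as long as the JVM.
final class ColdStartProbe {
    // True only until the first invocation in this JVM (i.e. this container).
    private static final AtomicBoolean FIRST_CALL = new AtomicBoolean(true);

    // Returns true exactly once per JVM, on the first invocation.
    static boolean isColdStart() {
        return FIRST_CALL.getAndSet(false);
    }

    // Same echo behavior as the "Nothing App", with the start kind prepended.
    static String handleRequest(String arg) {
        String kind = isColdStart() ? "cold" : "warm";
        return kind + ":" + arg;
    }
}
```

Logging this tag next to the request duration is enough to separate the two populations. The flag only tells you which invocation was first. It does not measure the container setup time, which happens before the handler runs.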

“Real World App”

It’s a demo app from Spring, a widely used web framework. It needs more than 12,000 ms to cold start but takes only 0.6 ms once prewarmed. The speedup is preternatural! >20,000x! I would like to point out the obvious: 12 seconds to send back a web page makes it impossible to use in production.

This explains why Java is not widely used with serverless. It’s simply too slow and too expensive to use because of cold startup.

Why? Stay tuned for our next episode!

Conclusions

Serverless is a bigger deal than it looks. It is here to stay.
Java is an important developer ecosystem with over 10 million developers, and it is among the top 3 most used environments.
Java is very cumbersome on serverless. It is expensive, impractical and slow.
This looks like a great opportunity for us!

Manycore