AWS Lambda has stamped a big DEPRECATED on containers
Welcome to “Serverless Superheroes”! In this space, I chat with the toolmakers, innovators, and developers who are navigating the brave new world of “serverless” cloud applications.
In this edition, I chatted with Steven Faulkner, a senior software engineer at LinkedIn and the former director of engineering at Bustle. The following interview has been edited and condensed for clarity.
Forrest Brazeal: At Bustle, your previous company, I heard you cut your hosting costs by about forty percent when you switched to serverless. Can you speak to where all that money was going before, and how you were able to make that type of cost improvement?
Steven Faulkner: I believe 40% is where it landed. The initial results were even better than that. We had one service that was costing about $2500 a month and it went down to about $500 a month on Lambda.
Bustle is a media company — it’s got a lot of content, it’s got a lot of viral, spiky traffic — and so keeping up with that was not always the easiest thing. We took advantage of EC2 auto-scaling, and that worked … except when it didn’t. But when we moved to Lambda — not only did we save a lot of money, just because Bustle’s traffic is basically half at nighttime what it is during the day — we saw that serverless solved all these scaling headaches automatically.
On the flip side, did you find any unexpected cost increases with serverless?
There are definitely things that cost more or could be done cheaper not on serverless. When I was at Bustle they were looking at some stuff around data pipelines and settled on not using serverless for that at all, because it would be way too expensive to go through Lambda.
Ultimately, although hosting cost was an interesting thing out of the gate for us, it quickly became a relative non-factor in our move to serverless. It was saving us money, and that was cool, but the draw of serverless really became more about the velocity with which our team could develop and deploy these applications.
At Bustle, we only have ever had one part-time “ops” person. With serverless, those responsibilities get diffused across our team, and that allowed us all to focus more on the application and less on how to get it deployed.
Any of us who’ve been doing serverless for a while know that the promise of “NoOps” may sound great, but the reality is that all systems need care and feeding, even ones you have little control over. How did your team keep your serverless applications running smoothly in production?
I am also not a fan of the term “NoOps”; it’s a misnomer and misleading for people. Definitely out of the gate with serverless, we spent time answering the question: “How do we know what’s going on inside this system?”
IOPipe was just getting off the ground at that time, and so we were one of their very first customers. We were using IOPipe to get some observability, then CloudWatch sort of got better, and X-Ray came into the picture which made things a little bit better still. Since then Bustle also built a bunch of tooling that takes all of the Lambda logs and data and does some transformations — scrubs it a little bit — and sends it to places like DataDog or to Scalyr for analysis, searching, metrics and reporting.
But I’m not gonna lie, I still don’t think it’s super great. It got to the point where it was workable and we could operate and not feel like we were always missing out on what was actually going on, but there’s a lot of room for improvement.
Another common serverless pain point is local development and debugging. How did you handle that?
I wrote a framework called Shep that Bustle still uses to deploy all of our production applications, and it handles the local development piece. It allows you to develop a NodeJS application locally and then deploy it to Lambda. It could do environment variables before Lambda had environment variables, and it brought some sanity to versioning and to using webpack for bundling. All the stuff that you don’t really want the everyday developer to have to worry about.
I built Shep in my first couple of months at Bustle, and since then, the Serverless Framework has gotten better. SAM has gotten better. The whole entire ecosystem has leveled up. If I was doing it today I probably wouldn’t need to write Shep. But at the time, that’s definitely what we had to do.
You’re putting your finger on an interesting reality with the serverless space, which is: it’s evolving so fast that it’s easy to create a lot of tooling and glue code that becomes obsolete very quickly. Did you find this to be true?
That’s extremely fair to say. I had a little Twitter thread around this a couple months ago, having a bit of a realization myself that Shep is not the way I would do deployments anymore. When AWS releases their own tooling, it always seems to start out pretty bad, so the temptation is to fill in those gaps with your own tool.
But AWS services change and get better at a very rapid rate. So I think the lesson I learned is lean on AWS as much as possible, or build on top of their foundation and make it pluggable in a way that you can just revert to the AWS tooling when it gets better.
Honestly, I don’t envy a lot of the people who sliced their piece of the serverless pie based on some tool they’ve built. I don’t think that’s necessarily a long term sustainable thing.
As I talk to developers and sysadmins, I feel like I encounter a lot of rage about serverless as a concept. People always want to tell me the three reasons why it would never work for them. Why do you think this concept inspires so much animosity and how do you try to change hearts and minds on this?
A big part of it is that we are deprecating so many things at one time. It does feel like a very big step to me compared to something like containers. Kelsey Hightower said something like this at one point: containers enable you to take the existing paradigm and move it forward, whereas serverless is an entirely new paradigm.
And so all these things that people have invented and invested time and money and resources in are just going away, and that’s traumatic, that’s painful. It won’t happen overnight, but anytime you make something that makes people feel like what they’ve maybe spent the last 10 years doing is obsolete, it’s hard. I don’t really know if I have a good way to fix that.
My goal with serverless was building things faster. I’m a product developer; that’s my background, that’s what I like to do. I want to make cool things happen in the world, and serverless allows me to do that better and faster than I can otherwise. So when somebody comes to me and says “I’m upset that this old way of doing things is going away”, it’s hard for me to sympathize.
It sounds like you’re making the point that serverless as a movement is more about business value than it is about technology.
Exactly! But the world is a big tent and there’s room for all kinds of stuff. I see this movement around OpenFaaS and the various Functions as a Service on Kubernetes and I don’t have a particular use for those things, but I can see businesses where they do, and if it helps get people transitioned over to serverless, that’s great.
So what is your definition of serverless, then?
I always joke that “cloud native” would have been a much better term for serverless, but unfortunately that was already taken. I think serverless is really about the managed services. Like, who is responsible for owning whether this thing that my application depends on stays up or not? And functions as a service is just a small piece of that.
The way I describe it is: functions as a service are cloud glue. So if I’m building a model airplane, well, the glue is a necessary part of that process, but it’s not the important part. Nobody looks at your model airplane and says: “Wow, that’s amazing glue you have there.” It’s all about how you craft something that works with all these parts together, and FaaS enables that.
And, as Joe Emison has pointed out, you’re not just limited to one cloud provider’s services, either. I’m a big user of Algolia with AWS. I love using Algolia with Firebase, or Netlify. Serverless is about taking these pieces and gluing them together. Then it’s up to the service provider to really just do their job well. And over time hopefully the providers are doing more and more.
We’re seeing that serverless mindset eat all of these different parts of the stack. Functions as a service was really a critical bit in order to accelerate the process. The next big piece is the database. We’re gonna see a lot of innovation there in the next year. FaunaDB is doing some cool stuff in that area, as is CosmosDB. I believe there is also a missing piece of the market for a Redis-style serverless offering, something that maybe even speaks Redis commands but under the hood is automatically distributed and scalable.
What is a legitimate barrier to companies that are looking to adopt serverless at this point?
Probably the biggest is: how do you deal with the migration of legacy things? At Bustle we ended up mostly re-architecting our entire platform around serverless, and so that’s one option, but certainly not available to everybody. But even then, the first time we launched a serverless service, we brought down all of our Redis instances — because Lambda spun up all these containers and we hit connection limits that you would never expect to hit in a normal app.
So if you’ve got something sitting on a mainframe somewhere that is used to only having 20 connections, and then you move some upstream service over to Lambda and suddenly it has 10,000 connections instead of 20, you’ve got a problem. If you’ve bought into service-oriented architecture as a whole over the last four or five years, then you might have a better time, because you can say “Well, all these things do is talk to each other via an API, so we can replace a single service with serverless functions.”
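To make the connection math above concrete, here is a minimal sketch. It assumes each Lambda container holds its own connections to the backend, which matches what Steven describes but is a simplification; the function names are hypothetical, and the 20 and 10,000 figures are the ones from the interview.

```python
def peak_connections(concurrent_containers: int, connections_per_container: int = 1) -> int:
    """Upper bound on simultaneous connections opened against one backend."""
    if concurrent_containers < 0 or connections_per_container < 1:
        raise ValueError("container count must be >= 0 and connections per container >= 1")
    return concurrent_containers * connections_per_container


def max_safe_concurrency(connection_limit: int, connections_per_container: int = 1) -> int:
    """Largest container count that keeps the backend at or under its connection limit."""
    if connection_limit < 0 or connections_per_container < 1:
        raise ValueError("limit must be >= 0 and connections per container >= 1")
    return connection_limit // connections_per_container


# Figures from the interview: a backend used to 20 connections, 10,000 containers.
print(peak_connections(10_000))   # 10000
print(max_safe_concurrency(20))   # 20
```

The second number is the one to compare against whatever concurrency cap you can apply to the function before pointing it at a legacy backend.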
Any other emerging serverless trends that interest you?
We’ve solved a lot of the easy, low-hanging fruit problems with serverless at this point. Like how you do environment variables, or how you’re gonna structure a repository and enable developers to quickly write these functions. We’re starting to establish some really good best practices.
What’ll happen next is we’ll get more iteration around architecture. How do I glue these four services together, and how do the Lambda functions look that connect them? We don’t yet have the Rails of serverless — something that doesn’t necessarily expose that it’s actually a Lambda function under the hood. Maybe it allows you to write a bunch of functions in one file that all talk to each other, and then use something like webpack that splits those functions and deploys them in a way that makes sense for your application.
We could even respond to that at runtime. You could have an application that’s actually looking at what’s happening in the code and saying: “Wow this one part of your code is taking a long time to run; we should make that its own Lambda function and we should automatically deploy that and set up this SNS trigger for you.” That’s all very pie in the sky, but I think we’re not that far off from having these tools.
Because really, at the end of the day, as a developer I don’t care about Lambda, right? I mean, I have to care right now because it’s the layer in which I work, but if I can move one layer up where I’m just writing business logic and the code gets split up appropriately, that’s real magic.
Forrest Brazeal is a cloud architect and serverless community advocate at Trek10. He writes the Serverless Superheroes series and draws the ‘FaaS and Furious’ cartoon series at A Cloud Guru. If you have a serverless story to tell, please don’t hesitate to let him know.
2018 is set to be a very exciting year for cloud computing. In the fourth financial quarter of 2017, Amazon, SAP, Microsoft, IBM, Salesforce, Oracle, and Google had over $22 billion in combined revenue from cloud services. Cloud services will only get bigger in 2018. It’s easy to understand why businesses love the cloud. It’s easier and more affordable to use third-party cloud services than for every enterprise to maintain its own datacenters on its own premises.
1. Data breaches
2017 was a huge year for data breaches. Even people outside the cybersecurity world heard about September’s Equifax breach because it affected at least 143 million ordinary people. Breaches frequently happen to cloud data, as well.
In May 2017, a major data breach that hit OneLogin was discovered. OneLogin provides identity management and single sign-on capabilities for the cloud services of over 2,000 companies worldwide.
“Today we detected unauthorized access to OneLogin data in our US data region. We have since blocked this unauthorized access, reported the matter to law enforcement, and are working with an independent security firm to determine how the unauthorized access happened and verify the extent of the impact of this incident. We want our customers to know that the trust they have placed in us is paramount,” said OneLogin CISO Alvaro Hoyos.
2. Data loss
Sometimes data lost from cloud servers is not due to cyber attack. Non-malicious causes of data loss include natural disasters like floods and earthquakes and simple human error, such as when a cloud administrator accidentally deletes files. Threats to your cloud data don’t always look like clever kids wearing hoodies. It’s easy to underestimate the risk of something bad happening to your data due to an innocent mistake.
One of the keys to mitigating the non-malicious data loss threat is to maintain lots of backups at physical sites at different geographic locations.
3. Insider threats
Insider threats to cloud security are also underestimated. Most employees are trustworthy, but a rogue cloud service employee has a lot of access that an outside cyber attacker would have to work much harder to acquire.
From a whitepaper by security researchers William R Claycomb and Alex Nicoll:
“Insider threats are a persistent and increasing problem. Cloud computing services provide a resource for organizations to improve business efficiency, but also expose new possibilities for insider attacks. Fortunately, it appears that few, if any, rogue administrator attacks have been successful within cloud service providers, but insiders continue to abuse organizational trust in other ways, such as using cloud services to carry out attacks. Organizations should be aware of vulnerabilities exposed by the use of cloud services and mindful of the availability of cloud services to employees within the organization. The good news is that existing data protection techniques can be effective, if diligently and carefully applied.”
4. Denial of Service attacks
Denial of service (DoS) attacks are pretty simple for cyber attackers to execute, especially if they have control of a botnet. Also, DDoS-as-a-service is growing in popularity on the Dark Web. Now attackers don’t need know-how and their own bots; all they have to do is transfer some of their cryptocurrency in order to buy a Dark Web service.
“Ordering a DDoS attack is usually done using a full-fledged web service, eliminating the need for direct contact between the organizer and the customer. The majority of offers that we came across left links to these resources rather than contact details. Customers can use them to make payments, get reports on work done or utilize additional services. In fact, the functionality of these web services looks similar to that offered by legal services.”
An effective DDoS attack on a cloud service gives a cyber attacker the time they need to execute other types of cyber attacks without getting caught.
5. Spectre and Meltdown
This is a new addition to the list of known cloud security threats for 2018. The Meltdown and Spectre speculative execution vulnerabilities also affect CPUs that are used by cloud services. Spectre is especially difficult to patch.
“Both Spectre and Meltdown permit side-channel attacks because they break down the isolation between applications. An attacker that is able to access a system through unprivileged log in can read information from the kernel, or attackers can read the host kernel if they are a root user on a guest virtual machine (VM).
This is a huge issue for cloud service providers. While patches are becoming available, they only make it harder to execute an attack. The patches might also degrade performance, so some businesses might choose to leave their systems unpatched. The CERT Advisory is recommending the replacement of all affected processors—tough to do when replacements don’t yet exist.”
6. Insecure APIs
Application Programming Interfaces are important software components for cloud services. In many cloud systems, APIs are the only facets outside of the trusted organizational boundary with a public IP address. Exploiting a cloud API gives cyber attackers considerable access to your cloud applications. This is a huge problem!
Cloud APIs represent a public front door to your applications. Secure them very carefully.
Gartner recently reported that by 2020, the “cloud shift” will affect more than $1 trillion in IT spending.
The shift comes from the confluence of IT spending on enterprise software, data center systems, and IT services all moving to the cloud.
With this enormous shift and change of practices comes a financial risk that is very real: your organization may be spending money on services you are not actually using. In other words, wasting money.
How big is the waste problem, exactly?
The 2016 Cloud Market
While Gartner’s $1 trillion number refers to the next 5 years, let’s take a step back and look just at the size of the market in 2016, where we can more easily predict spending habits.
The size of the 2016 cloud market, from that same Gartner study, is about $734 billion. Of that, $203.9 billion is spent on public cloud.
Public cloud spend is spread across a variety of application services, management and security services, and more (BPaaS, SaaS, PaaS, etc.) – all of which have their own sources of waste. In this post, let’s focus on the portion for which wasted spend is easiest to quantify: cloud infrastructure services (IaaS).
Breaking down IaaS Spending
Within the $22.4 billion spent on IaaS, about 2/3 of spending is on compute resources (rather than database or storage). From a recent survey we conducted – bolstered by our daily conversations with cloud users – we learned that about half of these compute resources are used for non-production purposes: that is, development, staging, testing, QA, and other behind-the-scenes work. The majority of servers used for these functions do not need to run 24 hours a day, 7 days a week. In fact, they’re generally only needed for a 40-hour workweek at most (even this assumes maximum efficiency with developers accessing these servers during their entire workdays).
Since most compute infrastructure is sold by the hour, that means that for the other 128 hours of the week, you’re paying for time you’re not using. Ouch.
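The arithmetic behind those figures can be sketched in a few lines. This is a rough upper bound, not a number from the Gartner study: it assumes every non-production compute resource sits idle outside a 40-hour week, whereas the text above only says the majority do. The function names are made up for illustration.

```python
HOURS_PER_WEEK = 24 * 7        # 168
WORK_HOURS_PER_WEEK = 40       # the 40-hour workweek from the text


def idle_hours(hours_needed: float = WORK_HOURS_PER_WEEK) -> float:
    """Hours per week a server is billed but not needed."""
    return HOURS_PER_WEEK - hours_needed


def idle_fraction(hours_needed: float = WORK_HOURS_PER_WEEK) -> float:
    """Share of the billed week that goes unused."""
    return idle_hours(hours_needed) / HOURS_PER_WEEK


def estimated_idle_spend(iaas_spend_billions: float,
                         compute_share: float = 2 / 3,
                         non_prod_share: float = 0.5,
                         hours_needed: float = WORK_HOURS_PER_WEEK) -> float:
    """Upper-bound estimate (in $ billions) of spend on idle non-production compute.

    Assumes ALL non-production compute is idle outside working hours.
    """
    return (iaas_spend_billions * compute_share * non_prod_share
            * idle_fraction(hours_needed))


print(idle_hours())                           # 128
print(round(idle_fraction(), 3))              # 0.762
print(round(estimated_idle_spend(22.4), 2))   # 5.69
```

Under those assumptions, roughly three quarters of every non-production server-week is paid for but unused, which works out to about $5.7 billion of the $22.4 billion IaaS figure.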
All You Need to Do is Turn Out the Lights
A huge portion of IT spending could be eliminated simply by “turning out the lights” – that is, by stopping hourly servers when they are not needed, so you only pay for the hours you’re actually using. Luckily, this does not have to be a manual process. You can automatically schedule off times for your servers, to ensure they’re always off when you don’t need them (and to save you time!).
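As a rough illustration of what such a schedule might look like, the sketch below decides whether a server should be running at a given moment. The 9-to-5, Monday-to-Friday window and the function names are assumptions chosen to match the 40-hour workweek above; the actual start and stop calls to a cloud provider are deliberately left out.

```python
from datetime import datetime, time, timedelta


def should_be_running(now: datetime,
                      start: time = time(9, 0),
                      stop: time = time(17, 0),
                      workdays: tuple = (0, 1, 2, 3, 4)) -> bool:
    """True if `now` falls inside the on-hours window (Monday is weekday 0)."""
    return now.weekday() in workdays and start <= now.time() < stop


def scheduled_hours_per_week() -> int:
    """Count the hours in one week that the default schedule keeps a server on."""
    monday = datetime(2018, 1, 1)  # 1 January 2018 was a Monday
    return sum(should_be_running(monday + timedelta(hours=h)) for h in range(24 * 7))


print(should_be_running(datetime(2018, 1, 8, 10, 0)))   # True  (Monday morning)
print(should_be_running(datetime(2018, 1, 6, 10, 0)))   # False (Saturday)
print(scheduled_hours_per_week())                       # 40
```

A scheduler would evaluate a check like this periodically and stop or start the server whenever the answer changes.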