Happy to contribute to this excellent series from NPI. You won’t find a better overview of cloud service procurement anywhere, with detailed comparisons of services from Amazon, Microsoft, Google, IBM and Oracle: Read on!
Treat cloud service measurement and evaluation as a center of excellence. If you don’t, it could cost you a lot of money down the road. Read more from InfoWorld.
I remember the first time I encountered the term “Software as a Service,” and it was longer ago than you might think. I was running a red-hot ASP start-up in Houston called ebaseOne. We’d taken a tiny local software VAR, changed the name, secured investment capital, hired some infrastructure experts, built a state-of-the-art operations center, signed a co-location deal for a brand-new, leading-edge data center and leveraged the existing staff’s application expertise to start offering customers access to applications for a single monthly per-user subscription fee, with almost no cost up front. We didn’t develop software – we just hosted a variety of products, on the assumption that customers wouldn’t want to deal with hundreds of vendors for their software subscription needs – they’d want a one-stop shop for all their applications. For products that weren’t web-based, we used technologies from Citrix and Marimba to distribute updates and manage the customer desktop. Cisco, among other partners, thought we were visionaries. A year after we started we’d gone through a few million bucks in funding, but our market capitalization was over half a billion. Then the internet bubble burst at exactly the wrong moment, and the next round of funding never came. Ouch.
We weren’t the only ones. Another ASP getting attention at that time was US Internetworking, or USi for short. If memory serves, at some point while we were ramping up, USi came up with a brilliant tag-line: Software as a Service. It wasn’t a trademark or anything – just a term they started using when speaking to investors. We started using it too. So did another little company that called itself an ASP at the time but developed its own software: Salesforce.com. This was in 1999, long before cloud computing (it wasn’t until the end of 2004 that AWS was even available for public use, and it didn’t really catch fire until much later in the decade).
When cloud did first begin getting everyone’s attention there was a lot of head-scratching about the actual definition of it. Many people just thought it was a way to rent virtualized servers. The consulting firm I went to after my start-up looked to me for a definition in 2010. Among other things, I told them this about SaaS: “SaaS can be delivered via cloud computing, but it does not have to be.” And that was true. All you needed to do for the SaaS label to apply to your service was to host applications and charge for access per user per month instead of selling licenses. To a great extent, and to my complete surprise, that is still true today. Now I’m going to talk about why.
Shortly after that, much of what I had said about SaaS became irrelevant. NIST published its official definition, and the industry adopted it pervasively. In that definition, SaaS was defined as one of the three cloud service models (IaaS, PaaS and SaaS). The uncomfortable fact that SaaS had already existed for a long time and that many offerings were not very cloud-like was simply ignored. SaaS was now cloud, by definition, and all SaaS vendors instantly became Cloud Service Providers, without so much as a “hey, wait a minute.” Customers and pundits could have asked whether SaaS services even sat on top of elastic cloud infrastructure, but they didn’t. A lot of SaaS applications still don’t. I believe it is this de facto SaaS = cloud equivalence that has distracted customers from what could really be achieved if the cloud model were fully applied to application services.
A big thing that SaaS products don’t offer today, but could, is real usage-based pricing. Every SaaS provider will tell you that “you pay only for what you use,” because you only pay for the users who are allowed to use the software. Time to take the red pill, Neo. That’s really all you were paying for before cloud came along, and it’s called user-based pricing. What happens when a user is out sick and not using the software? Oh yeah, you still pay. SaaS providers offer the same thing today that software companies have been offering as long as I can remember: an up-front fee followed by a periodic fee for updates and support. Sure, your IT folks don’t have to do the upgrades – yay – but financially the only difference is that the bill has changed to monthly instead of annual, if you’re lucky. Many SaaS customers find that signing a deal will still lock them into a multi-year commitment for a certain minimum number of users. That’s nothing new, and it’s in complete opposition to one of the basic philosophies of cloud, which is to pay only for use.
Now, if SaaS services were to measure the minutes (or seconds) that someone actually has the app active on their desktop and bill you based on that, we would really have something different. The number of users wouldn’t be directly relevant, and you really would be paying only for what you used. If your users started using the software less and less, your bill would automatically decrease in step with that. No more need for closely managing the software portfolio to make sure you aren’t over-licensed. Wouldn’t that be great? I think it would. Over-licensing due to shelf-ware and functional duplication between products can represent millions of dollars in annual software costs for a large enterprise. Start-ups could start out with enterprise level apps and not worry about their solutions hitting a scalability wall. Enterprises heading into a downturn wouldn’t be stuck paying for software they don’t need since charges would go down in lock-step with usage. This is what I call “pay as you grow / save as you shrink.” Want to migrate away from a product and replace it with a different one? Just stop using it whenever you’re ready.
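To put rough numbers on “pay as you grow / save as you shrink,” here is a minimal Python sketch comparing a per-seat bill with a metered bill as usage falls. The rates (50 dollars per seat per month, one cent per active minute) and the usage figures are hypothetical, chosen only to show the shape of the two pricing models; amounts are kept in integer cents to avoid floating-point rounding.

```python
from dataclasses import dataclass

# Hypothetical rates for illustration only; amounts are in cents.
SEAT_CENTS_PER_MONTH = 5_000        # $50 per licensed user per month
CENTS_PER_ACTIVE_MINUTE = 1         # $0.01 per metered minute of active use

@dataclass
class BillingMonth:
    licensed_users: int     # seats the contract commits you to
    active_minutes: int     # total metered minutes the app was actually active

def per_seat_cents(m: BillingMonth) -> int:
    """User-based pricing: every licensed seat is billed, used or not."""
    return m.licensed_users * SEAT_CENTS_PER_MONTH

def metered_cents(m: BillingMonth) -> int:
    """Usage-based pricing: only metered active minutes are billed."""
    return m.active_minutes * CENTS_PER_ACTIVE_MINUTE

# A 100-seat contract while usage winds down (a downturn, or a migration away).
history = [BillingMonth(100, 400_000), BillingMonth(100, 250_000), BillingMonth(100, 60_000)]

for m in history:
    print(f"{m.active_minutes:>7,} active min | per-seat ${per_seat_cents(m) / 100:,.2f}"
          f" | metered ${metered_cents(m) / 100:,.2f}")
```

In this made-up scenario the per-seat bill stays at $5,000 every month, while the metered bill falls from $4,000 to $600 as usage declines.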
And the data from metering could eventually be used in all kinds of constructive ways. You could see how a change in features, functions, processes, staff, work environment, etc. affects time spent in the application, for example. You could see changes in usage patterns in near real-time and investigate root causes. Of course, you’d need to keep pressure on your provider to make sure that changes to the application didn’t lower productivity, since slower work means more metered time and therefore more revenue for them, but that’s all part of the evaluation.
We didn’t implement true usage-based pricing at ebaseOne, but we were working on it. We invested in the high-end monitoring and billing systems we would need to bill our customers for minutes of usage, got them up and running and did the training. This is similar to what long-distance phone companies once did in order to bill for phone calls. Many SaaS providers today don’t have all the required systems in place yet, but only because they don’t need to. They don’t relish the idea of an unpredictable revenue stream, where there are no financial barriers keeping their customers from switching to the competition. If they’re public they have to set expectations for the street every quarter and then meet those expectations, so predictability is good. And yet, Amazon is in exactly the same situation with the on-demand services in AWS, and they’re doing just fine.
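The rating step behind that kind of billing is simple to sketch. Below, hypothetical session records (user, start, end) are rounded up to whole minutes per session, the way per-minute phone calls were rated, and totaled per user. The record format and the round-up-per-session rule are assumptions for illustration, not a description of ebaseOne’s actual systems.

```python
import math
from collections import defaultdict
from datetime import datetime

# Hypothetical session log: (user, app became active, app went inactive).
sessions = [
    ("alice", "2000-03-01T09:00:00", "2000-03-01T09:42:10"),
    ("alice", "2000-03-01T13:05:00", "2000-03-01T13:05:30"),
    ("bob",   "2000-03-01T10:00:00", "2000-03-01T11:00:00"),
]

def billable_minutes(start: str, end: str) -> int:
    """Round one session up to whole minutes, as per-minute phone calls were rated."""
    elapsed = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return math.ceil(elapsed.total_seconds() / 60)

def rate_sessions(records) -> dict:
    """Total billable minutes per user for one billing period."""
    totals = defaultdict(int)
    for user, start, end in records:
        totals[user] += billable_minutes(start, end)
    return dict(totals)

print(rate_sessions(sessions))
```

With these records, alice is billed 44 minutes (43 for the first session, 1 for the 30-second one) and bob is billed 60.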
The real reason that SaaS providers haven’t offered this is that the market has not demanded it. Yet. Software companies don’t like change and will have to be forced into it by competition. Start-ups – this is your opportunity.
Looking back in time, I mentally divide articles on cloud security into two general “waves.” The first, from the early days of cloud, are those that looked at the shared, public nature of what Amazon was selling and said “the cloud is not yet secure!” In addition to the very idea of shared infrastructure going against everything the security industry was trying to achieve, there were also some quite valid concerns that potential customers needed to know about. Server-based network switching could theoretically be exploited to gain access to VMs that weren’t assigned to you, for example. Data stored in the cloud was not always securely wiped off before the storage space was assigned to new customers, and that made headlines when customers found they could recover information from previous users of the space. CSOs were legitimately concerned about jumping into cloud services too quickly, often citing the need to comply with government or industry security standards like PCI DSS, HIPAA and FedRAMP. The takeaway? Cloud was less secure than your own data center.
The second wave of cloud security articles came after the leading cloud providers began publishing long lists of standards that their customers had, in fact, complied with, along with a commitment to support that compliance going forward. The early technical and procedural difficulties were now in the past and relegated to a bin labeled “growing pains.” At that point, the narrative morphed into something new. “All one has to do is look at what the public cloud behemoths spend on security to know that they can achieve far more than your own, sadly underfunded security operation could ever hope to do,” everyone seemed to say. That narrative is still with us today, showing up in security articles and as casual mentions in more general guidance on cloud services. The takeaway? Cloud is more secure than your own data center.
Both of these narratives are false.
They are false because they are dramatic oversimplifications. Worse, they are actually damaging the quality of critical business decisions being made right now. Here’s what’s wrong with them:
- They don’t specify what is meant by “cloud.” Amazon, Microsoft, Google, IBM and all the rest do not secure their cloud infrastructure in exactly the same way. Securing technology assets is way too complex, and security designs are way too proprietary, for that to happen. Data centers are not located in the same places, which means vulnerabilities to disasters or physical incursions vary, as do the political, economic and legal risks associated with the various jurisdictions that cover them. Is that public cloud or private cloud? It matters, because things like auditability and penetration testing are often far more limited when you’re using infrastructure that might be shared with other customers. On-site or off-site? That matters too, because if the assets are off-site you’ve almost certainly increased your attack surface by creating an interface with an external entity that has to be trusted, and that entity also has interfaces between your service infrastructure and its corporate network. SaaS providers often use third parties to host their infrastructure, which means you may have added multiple interfaces to your attack surface. Major cloud providers do have world-class security implementations. They have the money to invest, and they know the risk to their business represented by the threat of a well-publicized breach. They had better, because they’re also more enticing targets than you would be on your own. Data thieves would love to find a way to get into all the data stored on a large provider’s infrastructure. When thinking about the potential for a criminal act, you should never fail to consider motive.
- They don’t specify what’s meant by “your own data center.” Who are you? If you’re a small business that has your data center in someone’s converted office and an IT staff that you can count on your fingers, then yeah, large cloud providers probably have better security than you do… as far as it goes. Just remember that OS patches and any interfaces outside the providers’ facilities are usually your responsibility, including access to the management console that you use to manage cloud resources. What happens when the administrators who can access that console leave the company… in a bad mood… because they quit or were fired? Can all of your backups be deleted from that console? I hope not – don’t forget what happened to Code Spaces. Now, if you’re a large multinational that prioritizes security and has an IT department as big as a medium-sized company (I’m looking at you, banks and defense contractors), the answer could be different. You need to look very hard at the provider’s security to determine if it’s a step up, a step down or a lateral move with balancing pros and cons. Do you have unique security needs that can’t be met by a service that’s designed for a horizontal market? You might. Do you have vulnerabilities in your own security architecture that haven’t been closed as the exploits continually multiply? You might. The comparison you’ll need to make will depend in part on who the provider is and on which service you’re considering.
- Just asking the question can be misleading. It gives customers the impression that when they move to the cloud they are replacing all of their own security with the provider’s security, and for basic IaaS that just isn’t the case. It’s true that the provider has taken on physical data center security, network boundary protection and monitoring, the hypervisor and other security concerns specific to the provider’s infrastructure. They’ve also put in place all of the encryption and configurations necessary for a cloud service to actually operate. But you are still doing your own patching. You are still managing the guest OS and all the utilities and applications. You are still doing your own firewall configuration too, even though that firewall belongs to the provider. Many IaaS customers are still surprised when they find out just how little of their security responsibility has actually been taken on by the provider. Now, if it’s SaaS or a “managed” service like database, the picture changes. The provider is likely to take responsibility for much more infrastructure security, and some of your attention can shift upwards to identity management and authentication. It’s critical to pay attention to what the provider is actually taking responsibility for.
We’ve only scratched the surface of a very complex topic here, but the next time you see an article that says “cloud security is worse” or “cloud security is better,” I hope that you will give it all the attention it deserves, which is none. The whole reason that The Cloud Service Evaluation Handbook has an entire chapter on security is that the security that is best for you depends on your needs, your capabilities and on the characteristics of each particular service. You’ll need to do an actual evaluation, with participation from your security team, to find out what you’re gaining or losing when you move to cloud.
In case you missed the first two laws of cloud cost optimization, here they are:
- Know Your Application, and
- Do a Thorough Service Evaluation
But let’s say you’re following those already. You’ve got a good grasp of the application’s workload profile, its needs for disaster recovery, security, response time, elasticity, interoperability and more (see The Cloud Service Evaluation Handbook™), or at least you have people on the team who do, and all of those people contributed to the evaluation of the service that you finally chose. You’ve definitely gotten off to the right start, but costs can still spiral out of control even if they were optimal on day one. The key to making sure that doesn’t happen is taking a variety of steps that fall under the heading of good Demand Management. Oh sure, you might think you’ve got that covered if you’ve been doing IT Demand Management for your private, non-cloud assets. Let’s say you already require strong business cases for new acquisitions, you monitor and improve the accuracy of those business cases, and you make sure that old equipment and software licenses are either reused to meet new requirements or properly decommissioned, so they don’t keep generating unnecessary costs in the form of management fees, software maintenance, etc. If you’re doing those things, pat yourself on the back, because many organizations weren’t doing them very well even before cloud came along.
Now for the bad news: there’s a lot more to good demand management with cloud, particularly public cloud. First of all, it’s more important now than it ever was. An unused server sitting around in your own data center may not have been a big deal if you didn’t need the space for anything else. In the cloud, however, you’re paying for that unused resource every month, just throwing money straight out the window. The “usage” that cloud providers measure when they create your invoice is determined by the number and type of resources that you, the customer, have allocated to yourself. It’s not what you are actually using to do useful work at the moment, and financial people are sometimes surprised by that. Since the providers make more money when you over-allocate or forget to turn something off, you can’t expect them to do much to help you. Even when they provide tools that let you see where your issues are, it’s still up to you to use them diligently to keep your charges down. Here are a few leading practices that you can use to optimize costs. If you aren’t already using them, you could easily be overpaying by 20% to 50% or even more.
- Reassess your service decisions continually. Application workload characteristics change, and the service you chose initially was greatly influenced by what you expected those characteristics to be. You may have selected on-demand instances thinking that your application could release them most of the time. Is that what you’re seeing? Does your application even have the ability to release resources automatically, or would you benefit from a simple tool like ParkMyCloud that lets you deactivate them on a fixed schedule? Or you may have selected dedicated or reserved instances thinking that workloads would be fairly steady around the clock, and that may be true for some of them, but now that the workload has grown, perhaps some of the resources aren’t needed during off-peak hours. Would going to an on-demand scheme for some of them save money? If usage is very light and it’s the right kind of application, perhaps a FaaS solution like AWS Lambda, Azure Functions or Google Cloud Functions would now be a better choice, or even a Google “preemptible” VM.
- Reassess your configuration choices continually. For the same reasons that your service choice may no longer be optimal, your configurations are even more likely to need re-optimization as time goes on. Memory requirements, in particular, can change quickly. Performance bottlenecks will push you toward instances with more memory when you need it, but there can also be opportunities to move to smaller memory configurations to save money. Your developers can and should be looking for ways to make your applications more efficient, and money saved on your cloud bill is the measurable payoff for that, not only in charges for VMs and storage but in data transfer charges as well.
- Look for “zombie” resources continually. Zombies are resources that aren’t being used because they’ve become “unattached” or simply aren’t needed for running your applications. Storage volumes not attached to a compute instance are common examples, and you may be paying for the entire capacity even though you’re storing nothing on it. For IaaS, IP addresses, databases, load balancers and even VMs can all be zombies. One of the great things about cloud is the ability to try out ideas and then abandon them without making a large investment in infrastructure first, but zombies are a natural consequence of that. Also, if you aren’t currently “tagging” your resources, start now. You’ll want more tags for reporting purposes, but an Owner tag is critical for determining which resources are really zombies before you deprovision them. User accounts can also become zombies when employees leave or change jobs or when trading partners change, so this is important for SaaS as well as IaaS. Inactive accounts represent security risks as well as unneeded costs. Use your tools to detect resources with low utilization and investigate them.
- Keep doing Hierarchical Storage Management and Information Lifecycle Management, continually. They are still relevant in the cloud, perhaps even more so when the equipment is off-site than when you owned it. Cloud automation makes using lots of storage very easy, and those monthly charges are just going to keep coming until you do something to turn them off. Old snapshots may be okay to delete when you have enough newer ones. Back in the days of mainframes we used to periodically look at the last access date on every file and archive anything that went too long without being touched. In the cloud, you have numerous options on where to put your data, and some are orders of magnitude less expensive than others. Migrating data to options with lower redundancy, lower IOPS, magnetic vs. solid state, “infrequent usage” tiers, long-retrieval-time services (e.g. AWS Glacier), and, of course, deletion are all potential ways to save money.
- Look at opportunities to refresh your infrastructure. You may be thinking “Wait a minute! I thought cloud got rid of the need for us to do that!” Well, sort of. You may not have to rack and stack new boxes anymore, but you also don’t want to keep paying for your provider’s fully depreciated old equipment unless it has some unique feature that you still need. There’s a common misconception out there that the rapid price drops in the public cloud market are automatically passed on to customers, and, as we’ve discussed previously, that often isn’t true. First of all, they aren’t coming as quickly as they used to, and when they do they may only be implemented on new resource types, so if you don’t refresh the resources you’re using with the newer technology, you get zero benefit. Since how price changes take effect is entirely up to the provider, it’s up to you to analyze pricing on every new configuration they release to see how it’s relevant to you.
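Here is a minimal sketch of what a recurring cleanup sweep might check, written against a made-up inventory format rather than any provider’s API. It flags unattached resources, low-utilization resources and anything missing an Owner tag for human review, and it lists stored objects whose last access date is older than a cutoff, mainframe style. The field names, the 5% CPU floor and the 90-day idle window are all assumptions; tune them to your own environment.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional

# Hypothetical inventory records; in practice you would export these from your
# provider's APIs or a cost management tool. Field names are illustrative only.
@dataclass
class Resource:
    rid: str
    attached: bool = True                 # False for volumes/IPs bound to nothing
    avg_cpu_pct: Optional[float] = None   # None when the metric doesn't apply
    tags: dict = field(default_factory=dict)

@dataclass
class StoredObject:
    key: str
    last_access: date
    tier: str = "standard"

def zombie_candidates(resources, cpu_floor=5.0):
    """Flag likely zombies for human review; nothing here deletes anything."""
    findings = []
    for r in resources:
        reasons = []
        if not r.attached:
            reasons.append("unattached")
        if r.avg_cpu_pct is not None and r.avg_cpu_pct < cpu_floor:
            reasons.append("low utilization")
        if "Owner" not in r.tags:
            reasons.append("no Owner tag")   # nobody to ask before deprovisioning
        if reasons:
            findings.append((r.rid, reasons))
    return findings

def archive_candidates(objects, today, idle_days=90):
    """Anything in the standard tier untouched for idle_days goes to a colder tier."""
    cutoff = today - timedelta(days=idle_days)
    return [o.key for o in objects if o.tier == "standard" and o.last_access < cutoff]

inventory = [
    Resource("vol-a", attached=False, tags={"Owner": "dana"}),
    Resource("vm-b", avg_cpu_pct=1.2),
    Resource("vm-c", avg_cpu_pct=55.0, tags={"Owner": "lee"}),
]
objects = [
    StoredObject("logs/jan.gz", date(2016, 1, 31)),
    StoredObject("reports/q3.pdf", date(2016, 9, 30)),
]
print(zombie_candidates(inventory))
print(archive_candidates(objects, today=date(2016, 10, 15)))
```

Here vol-a is flagged as unattached, vm-b for low utilization and a missing Owner tag, and only logs/jan.gz is old enough to archive. In practice the output would feed a review with resource owners rather than an automatic delete.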
Now, the good news is that there are tools to help with all this. Cost optimization tools are often incorporated in Cloud Management Platforms and help with all the things I’ve mentioned above. If you have a large cloud investment I’d expect the business case for one of them to be compelling (e.g., CloudHealth, Cloud Cruiser, Cloudyn, Cloudability, Cloudamize, RightScale and more). CloudHealth, in particular, has been good about sharing their wisdom on this topic.
So, what’s the Third Law of Cloud Cost Optimization?
“Clean Up Your Mess, …Continually!”
Last time we covered some of the basics of AWS Lambda, so you should know what the service is, what it promises to do, and a little about how well it meets those promises. If not, check out the last post. Now let’s talk about pricing!
Lambda pricing is based on two fairly simple charges: $.20 per million executions of your function plus $0.00001667 per GB-second of processing. That means that if you specify 2 GB of RAM, the second part of the price will be twice as high as it would be if you specified only 1 GB of RAM. The good news is that the first 1 million executions of your function and 400,000 GB-sec of processing time per month are free. This is great for development, since before you’ve put your functions in production you generally need to do fewer executions than you do after the application goes live, and those test executions won’t cost you anything. This doesn’t include data transfer and storage charges that your application might incur when it uses other Amazon services, however.
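Here is how the two charges and the free tier combine, as a minimal Python sketch using the prices quoted above. It assumes you already know the billed duration per execution in seconds, and it leaves out the data transfer and storage charges from other services that the paragraph mentions.

```python
# Prices as quoted in this post; check AWS's current price list before relying on them.
PRICE_PER_MILLION_REQUESTS = 0.20
PRICE_PER_GB_SECOND = 0.00001667
FREE_REQUESTS_PER_MONTH = 1_000_000
FREE_GB_SECONDS_PER_MONTH = 400_000

def lambda_monthly_cost(executions: int, billed_seconds: float, memory_gb: float) -> float:
    """Monthly Lambda charge in dollars, after the free tier, excluding other AWS services."""
    gb_seconds = executions * billed_seconds * memory_gb
    request_charge = max(0, executions - FREE_REQUESTS_PER_MONTH) / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    compute_charge = max(0.0, gb_seconds - FREE_GB_SECONDS_PER_MONTH) * PRICE_PER_GB_SECOND
    return request_charge + compute_charge

print(f"${lambda_monthly_cost(3_000_000, 0.2, 1.0):.2f}")   # 1 GB
print(f"${lambda_monthly_cost(3_000_000, 0.2, 2.0):.2f}")   # 2 GB
```

For example, 3 million executions a month billed at 0.2 seconds each at 1 GB comes to about $3.73 ($0.40 in requests plus $3.33 of compute beyond the free 400,000 GB-seconds). Because the free allowance is subtracted before the GB-second price applies, doubling memory to 2 GB raises that example to about $13.74, more than double.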
So, let’s take a look at what Lambda will cost you if your executions are 50 ms, 100 ms or 200 ms in length. This isn’t completely realistic since your function probably won’t take the same time to execute every time, but it should help you to visualize approximately how your charges will behave. By the way, Microsoft’s pricing for Azure Functions is virtually identical to Lambda’s and has the same 100 ms minimum charge, so this analysis applies there as well. As I write this, Google’s Cloud Functions service is still in alpha test, and I haven’t seen any published pricing yet.
Note that the red line for 50 ms functions is completely hidden underneath the purple line for 100 ms functions due to the 100 ms minimum charge. 200 ms functions cost twice as much as 100 ms functions, as would anything between 100 ms and 200 ms. If I added 150 ms functions to this chart, the line would be hidden under the orange line for 200 ms. That’s because execution times are always rounded up to the next multiple of 100 ms for billing purposes. That’s very important to understand if you want to avoid surprises on your bill, and, personally, I’m not a big fan of it. Come on, Amazon – just offer simple per-ms pricing with no minimum duration, and charges will be a lot more predictable.
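The rounding rule is easy to model. This Python sketch rounds each execution up to the next 100 ms increment and prices one million executions at the rates quoted above, ignoring the free tier. The 1 GB memory size is an assumption, since the chart’s settings aren’t stated here.

```python
import math

PRICE_PER_GB_SECOND = 0.00001667        # as quoted in this post
PRICE_PER_REQUEST = 0.20 / 1_000_000    # $0.20 per million executions
INCREMENT_MS = 100

def billed_ms(actual_ms: float) -> int:
    """Round an execution up to the next 100 ms increment (minimum one increment)."""
    return max(1, math.ceil(actual_ms / INCREMENT_MS)) * INCREMENT_MS

def cost_per_million(actual_ms: float, memory_gb: float = 1.0) -> float:
    """Cost of one million executions, ignoring the monthly free tier."""
    per_exec = billed_ms(actual_ms) / 1000 * memory_gb * PRICE_PER_GB_SECOND + PRICE_PER_REQUEST
    return per_exec * 1_000_000

for ms in (50, 100, 150, 200):
    print(f"{ms:>3} ms -> billed {billed_ms(ms)} ms, ${cost_per_million(ms):.2f} per million")
```

At 1 GB, 50 ms and 100 ms both come out to about $1.87 per million executions, and 150 ms and 200 ms both come out to about $3.53, which is the overlap described above.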
Now, what about EC2? Since Amazon assigns resources to your function in the same proportion as a general-purpose compute instance, let’s use one of those as a hypothetical comparison. An on-demand m4.large with 8 GB RAM currently costs $.108 per hour, which is $.0000037 per GB-second. If you reserve the instance for 3 years and pay up-front, it only costs $.04 per hour, or $.0000013 per GB-second. Let’s just assume that your functions execute sequentially as quickly as possible for an entire month (not real-world, I know, but stay with me – this is just a hypothetical), and that once you’ve filled up a month of execution time you would need an additional instance to process any more. That tells us how many executions you can process per month on an EC2 instance and gives us a comparison that looks like this:
At very high loads on your EC2 instances, which corresponds to very high numbers of executions, Lambda is 4.5 times the cost of the on-demand instance and 11.4 times the cost of the reserved instance. So it’s a terrible deal, right? Well no, not really, and here’s why:
- On-demand instances won’t work. Lambda is designed for functions that must execute rapidly after being triggered by events. To get the benefit of on-demand EC2 pricing you would have to spin up that instance each time the function was called, and that’s the assumption that underlies the orange line in the chart. Unfortunately, that would add a tremendous amount of processing overhead, and your function would be dog slow. To use EC2 at all you would need your instance to be already available and waiting for a triggering event, and that means 100% instance usage per month. Reserved instances are always cheaper than on-demand at 100% usage.
- Your functions are very likely not going to run 100% of the time, and that’s the assumption that appears to be built into Lambda’s pricing. In this hypothetical scenario, Lambda’s sweet spot is on the left side of the chart, below 3 million function executions per month. Your mileage will vary a bit because your functions won’t take exactly this much time to execute (the further they fall below the next multiple of 100 ms, the less attractive Lambda will be), and you’ll want to be operating in the range where the free executions that come with Lambda still have a noticeable impact on your bill. That’s also where a reserved instance would still be very underutilized. If your traffic starts pushing you to the right side of the chart, consider using reserved instances to save money. Just make sure you take the next point into account as well.
- Lambda and EC2 can’t be directly compared without any adjustments, which is why I added the note at the bottom of the chart. We have an apples and oranges issue. Lambda is far closer to being a fully managed service than EC2 is. It includes things like patching, disaster recovery, backup and pretty much all of the standard IT management tasks that come with running a server, except security, as I mentioned in my last post. I estimate that to be worth about twice what you get from an EC2 instance in terms of reducing your internal IT costs. So, in our example, instead of switching to a reserved instance at just under 3 million executions per month, you should probably wait until about 6 million if you care about optimizing TCO for your infrastructure long-term. And that doesn’t include savings in the development cycle. Lambda is designed to trigger your functions in response to events that occur in social media streams, stored data, monitored states and other event sources. That should mean less coding for you. It’s also designed to auto-scale by default, without making you build that intelligence into your application. I’m not sure how much time this saves, but I’d love to hear from any developers out there with experience that can provide some input.
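The comparison above can be turned into a small calculator. This Python sketch converts an hourly instance price into a per-GB-second price, then finds the monthly execution count at which Lambda (free tier included) costs as much as an always-on reserved instance. The optional `mgmt_multiplier` is a hypothetical knob for the management-cost adjustment described above; setting it to 2 reflects the author’s rough 2x estimate. The chart’s exact inputs aren’t given, so the 8 GB memory figure (mirroring the m4.large) and the 730-hour month are assumptions, and the results won’t necessarily match the numbers quoted in the post.

```python
# Prices as quoted in this post; check current AWS pricing before relying on them.
SECONDS_PER_HOUR = 3600
HOURS_PER_MONTH = 730                 # assumption: a common monthly-hours convention
LAMBDA_PER_GB_SECOND = 0.00001667
LAMBDA_PER_REQUEST = 0.20 / 1_000_000
FREE_GB_SECONDS = 400_000
FREE_REQUESTS = 1_000_000

def ec2_per_gb_second(hourly_price: float, memory_gb: float) -> float:
    """Spread an hourly instance price across each GB of RAM for each second."""
    return hourly_price / (memory_gb * SECONDS_PER_HOUR)

def lambda_month(executions: int, billed_seconds: float, memory_gb: float) -> float:
    """Monthly Lambda charge after the free tier."""
    gb_seconds = executions * billed_seconds * memory_gb
    return (max(0, executions - FREE_REQUESTS) * LAMBDA_PER_REQUEST
            + max(0.0, gb_seconds - FREE_GB_SECONDS) * LAMBDA_PER_GB_SECOND)

def reserved_month(hourly_price: float, mgmt_multiplier: float = 1.0) -> float:
    """Always-on reserved instance; mgmt_multiplier > 1 credits Lambda for IT work it absorbs."""
    return hourly_price * HOURS_PER_MONTH * mgmt_multiplier

def break_even_executions(billed_seconds: float, memory_gb: float, hourly_price: float,
                          mgmt_multiplier: float = 1.0, step: int = 10_000) -> int:
    """First execution count (in multiples of `step`) where Lambda costs at least as much."""
    target = reserved_month(hourly_price, mgmt_multiplier)
    n = 0
    while lambda_month(n, billed_seconds, memory_gb) < target:
        n += step
    return n

plain = break_even_executions(0.1, 8, 0.04)
adjusted = break_even_executions(0.1, 8, 0.04, mgmt_multiplier=2.0)
print(f"{plain:,} vs {adjusted:,} executions per month")
```

For reference, `ec2_per_gb_second(0.108, 8)` is 0.00000375 and `ec2_per_gb_second(0.04, 8)` is about 0.00000139; the per-GB-second figures quoted earlier appear to be these values truncated. Note also that in this sketch, doubling the instance-side cost raises the crossover by less than 2x, because the free tier is a fixed monthly credit.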
Hopefully that will help with decision making around when to use Lambda and other FaaS offerings, and when not to. Use them for event-triggered processes that need fast response times but that aren’t going to run in such high volumes that they would justify reserving an entire instance. Test your functions so that you can generate appropriate charts like the ones above and predict your costs. If you try out FaaS knowing what to expect, on applications for which it is well suited, I believe you will be very pleased with the results.
It’s right there in Amazon’s tag line for Lambda, what I would argue is one of the most interesting cloud services they’ve released to date: “Run code without thinking about servers – pay only for the compute time you consume.” Let’s have a look at those claims, and especially let’s look at them in relation to EC2, their older and more traditional IaaS cloud offering, because, hey, hasn’t Amazon made those same claims before? If Lambda just offers the same thing as EC2, why is it a separate service? Or did EC2 not really offer those things, but Lambda really does?
Let’s start with the idea that you don’t need to think about servers. In fact, Amazon (and Microsoft and Google with similar offerings) uses the new word “serverless” to describe Lambda. That language has been rightly criticized because Lambda does still run on servers, of course. You just don’t need to specify them to use the service the way you did with EC2, since Lambda is just about running functions. No more choosing instance types from the dizzying array in the AWS catalog. You simply tell the service how much memory your function needs to run, as well as a “timeout period” after which Lambda will terminate an execution that is taking too long. That’s important, since it protects you from getting a huge bill if your function doesn’t work as expected. Once you select your memory requirement, Lambda allocates “proportional” CPU power, network bandwidth, and disk I/O to a Linux container for your function, and the CPU power allocated is in the same proportion as a general-purpose EC2 instance. Your function also receives 500 MB of non-persistent disk space in a temporary directory. If you need persistent storage, then you go outside Lambda and use another service like S3.
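For a sense of what “just running functions” looks like in code, here is a minimal Python handler sketch. The `handler(event, context)` shape is the standard Python handler signature; the `records` field in the event is hypothetical. Memory size and timeout are set in the function’s configuration rather than in code, and the scratch file lives in the temporary directory, so anything you need to keep should be written to a service like S3 instead.

```python
import json
import os
import tempfile

# Hypothetical event shape: {"records": [...]}. `context` is unused in this sketch.
SCRATCH_DIR = tempfile.gettempdir()   # typically /tmp inside Lambda's Linux container

def handler(event, context):
    """Stage the incoming records on local scratch disk, then report how many were processed."""
    records = event.get("records", [])
    scratch_path = os.path.join(SCRATCH_DIR, "batch.json")
    with open(scratch_path, "w") as f:
        json.dump(records, f)
    # ... processing that needs local disk space would go here ...
    with open(scratch_path) as f:
        processed = len(json.load(f))
    # Scratch space is not persistent: write anything you need to keep to S3 or another service.
    return {"processed": processed}

if __name__ == "__main__":
    # Local smoke test; on Lambda the service calls handler() for each event.
    print(handler({"records": [1, 2, 3]}, None))
```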
In short, Lambda is simply about running code, and so it removes as many of the infrastructure planning and management tasks as it possibly can, to a much greater degree than EC2. That includes handling redundancy across multiple availability zones, so you don’t need to manage disaster recovery, as well as automated backups of your code. It also includes continuous horizontal scaling, so that whatever server instances are needed to run your code as your functions are called more and more frequently are simply added to the service automatically. All of this is great for developers, because running code is all you really want to do while you’re developing – the fewer infrastructure hassles you have, the better. Some have begun referring to Lambda and its competitors as “Function as a Service,” and I certainly find that to be much more accurate and descriptive than “serverless.”
I should mention, and this is very important, that while it’s true that Lambda removes the need to “worry” about servers, there is still one IT management task that you will need to worry about: security. Security is handled via AWS Identity and Access Management (IAM). You need to specify the IAM role each Lambda function runs under, and of course you still need to make sure your users (or other event sources) are properly authorized to invoke your functions, and that your Lambda functions are authorized to access any required resources outside of the Lambda containers. This isn’t all that different from the security requirements of any application you design, but you may need to think about them a little earlier in the process than you would if your development platform were entirely on your own premises.
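As a rough illustration of the security work that remains, the sketch below builds the two JSON policy documents typically attached to a Lambda execution role in IAM: a trust policy that lets the Lambda service assume the role, and a permissions policy that grants access to one outside resource. The bucket name and the choice of read-only S3 access are hypothetical; a real role usually needs more (for example, permission to write logs), and attaching the documents through the IAM console, CLI, or an SDK is not shown here.

```python
import json

# Trust policy: allows the Lambda service to assume the execution role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def s3_read_policy(bucket_name):
    """Permissions policy granting read-only access to the objects in one
    bucket, and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


if __name__ == "__main__":
    # "example-reports-bucket" is a hypothetical bucket name.
    print(json.dumps(TRUST_POLICY, indent=2))
    print(json.dumps(s3_read_policy("example-reports-bucket"), indent=2))
```

Keeping each function’s permissions this narrow is the Lambda equivalent of the least-privilege design you would apply to any other application.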
Pay Only for What You Use?
EC2 charges for on-demand server instances on a per-hour basis. As we’ve discussed in previous posts, that’s not true usage-based pricing, because you may not need every instance for a full hour. If your application runs for 1 minute each hour of the day, you’ll pay for 24 hours, just as if it ran continuously. Lambda pricing, in contrast, is based on sub-second metering. Amazon says “you don’t pay anything when your code isn’t running.” Unfortunately, that’s not quite true. As with EC2, Lambda has a minimum interval that you will be charged for, and it’s 100 milliseconds (ms), or a tenth of a second. I’m sure Amazon’s marketing department prefers saying 100 ms because, well, it sounds smaller, right? “But 100 ms is nothing!” you say. In fact, it’s the most granular pricing we’ve seen in the cloud thus far… but it’s not nothing.
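The gap between hourly billing and actual use in the 1-minute-per-hour example is easy to quantify. This is a minimal sketch assuming each run is billed as whole hours, as with the per-hour on-demand EC2 pricing described above; the function names are hypothetical.

```python
import math


def billed_hours_per_day(minutes_per_run, runs_per_day=24):
    """Instance-hours billed per day when each run is rounded up to whole hours."""
    return runs_per_day * math.ceil(minutes_per_run / 60)


def used_hours_per_day(minutes_per_run, runs_per_day=24):
    """Hours of compute the application actually used per day."""
    return runs_per_day * minutes_per_run / 60


def utilization(minutes_per_run, runs_per_day=24):
    """Share of billed time actually used (1.0 = no waste).
    minutes_per_run must be positive."""
    return used_hours_per_day(minutes_per_run, runs_per_day) / billed_hours_per_day(
        minutes_per_run, runs_per_day
    )


if __name__ == "__main__":
    # One minute of work at the top of every hour.
    print(billed_hours_per_day(1))
    print(used_hours_per_day(1))
    print(f"{utilization(1):.1%}")
```

For the example above, that works out to 24 billed hours for 0.4 hours (24 minutes) of actual use, a utilization under 2%.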
Remember that each Lambda charge is for a single execution of your function. Some functions take less than 100 ms to execute; some take more. If your function takes 101 ms to execute, you will be charged for 200 ms. It’s just a fraction of a penny, you say? True, but when your function executes millions of times, the charges add up. This situation, where the unused remainder of each billing interval generates profit for the provider, exists for both Lambda and EC2, and Amazon is quite aware of it. Add to this the fact that Lambda can take several times longer to execute your function the first time it’s called than it does on subsequent executions. This is called a “cold start,” and it takes longer because Lambda is getting all your resources “ready” to execute your function quickly. It’s not completely clear how long Lambda will wait after an execution before you need another cold start, but while one execution per minute may be enough to avoid cold starts completely, one every 10 minutes may not be (see this post for an example).
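Here is a minimal sketch of how that round-up adds up at scale, assuming duration is billed in GB-seconds (memory allocated × billed time) and rounded up to the 100 ms increment described above. The per-GB-second rate is an assumed placeholder rather than a quoted price, and the function names are hypothetical; the sketch covers only the duration charge, not per-request fees or cold-start effects.

```python
import math

BILLING_INCREMENT_MS = 100        # rounding interval described above
PRICE_PER_GB_SECOND = 0.00001667  # assumed placeholder rate; check current pricing


def billed_ms(duration_ms, increment_ms=BILLING_INCREMENT_MS):
    """Round one execution's duration up to the next billing increment."""
    return math.ceil(duration_ms / increment_ms) * increment_ms


def rounding_overhead(duration_ms, executions, memory_mb):
    """Return (billed GB-seconds, used GB-seconds, cost of the difference)
    for many executions of the same duration and memory size."""
    gb = memory_mb / 1024
    billed_gb_s = billed_ms(duration_ms) / 1000 * gb * executions
    used_gb_s = duration_ms / 1000 * gb * executions
    extra_cost = (billed_gb_s - used_gb_s) * PRICE_PER_GB_SECOND
    return billed_gb_s, used_gb_s, extra_cost


if __name__ == "__main__":
    # A 101 ms function with 128 MB of memory, run ten million times.
    billed, used, extra = rounding_overhead(101, 10_000_000, 128)
    print(f"billed {billed:,.0f} GB-s, used {used:,.0f} GB-s, extra ${extra:.2f}")
```

With those inputs the function is billed for 250,000 GB-seconds but uses only 126,250, so roughly half of what you pay for is rounding.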
So, my first law of cloud cost optimization holds true for Lambda every bit as much as it did for EC2:
“Know Thy Application!”
In this case that means knowing your function. Because Lambda allocates CPU power in proportion to memory, if your function’s performance is CPU- or memory-bound, increasing the memory might decrease the execution time enough to give you much better performance for the same or even a lower price than you got with less memory. If the performance bottleneck is somewhere else, you may want to decrease the memory allocated to the function to save money, as long as you can do so without losing performance. Optimizing your Lambda pricing is always going to be a game of choosing just enough memory for your function, but not too much.
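One way to play that game is to measure your function’s average duration at a few memory sizes and compare the cost of each. The sketch below assumes the GB-second pricing model and 100 ms round-up described earlier, uses an assumed placeholder rate, and ignores the per-request charge, which doesn’t change with memory. The function names are hypothetical, and the measurements in the usage example are made-up numbers for a CPU-bound function, not benchmarks.

```python
import math

PRICE_PER_GB_SECOND = 0.00001667  # assumed placeholder rate; check current pricing
BILLING_INCREMENT_MS = 100


def cost_per_million(memory_mb, duration_ms):
    """Duration cost of one million executions at a given memory size."""
    billed = math.ceil(duration_ms / BILLING_INCREMENT_MS) * BILLING_INCREMENT_MS
    gb_seconds = (memory_mb / 1024) * (billed / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND * 1_000_000


def cheapest_setting(measurements, max_duration_ms=None):
    """Pick the lowest-cost memory size among measured settings.

    measurements maps memory_mb -> measured average duration in ms.
    Settings slower than max_duration_ms (if given) are ignored."""
    candidates = {
        mem: dur
        for mem, dur in measurements.items()
        if max_duration_ms is None or dur <= max_duration_ms
    }
    if not candidates:
        raise ValueError("no memory setting meets the duration target")
    return min(candidates, key=lambda mem: cost_per_million(mem, candidates[mem]))


if __name__ == "__main__":
    # Hypothetical measurements: memory (MB) -> average duration (ms).
    measured = {128: 820, 256: 410, 512: 190, 1024: 160}
    for mem, dur in measured.items():
        print(mem, f"${cost_per_million(mem, dur):.2f} per million calls")
    print("cheapest:", cheapest_setting(measured))
```

With these made-up numbers, 512 MB wins: it is billed for 0.1 GB-seconds per call against 0.1125 at 128 MB, while also running more than four times faster. 1024 MB is faster still but costs twice as much per call, because the extra memory no longer buys a shorter billed duration.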
My conclusion is that Lambda, along with similar FaaS offerings from other providers, does require you to think less about servers than EC2 does. It also has more granular pricing, but we’re still not quite paying only for “what we use.” Now that we’ve covered the basics, next time we’ll look at Lambda pricing in detail (complete with pretty charts), how it compares to EC2, and what that means for when you should use Lambda… and when you shouldn’t!
In response to a few nominal price drops from public IaaS providers of late, and to a few pundits noticing that those drops have become less frequent than they once were, I’ve seen more than one piece questioning whether the providers can keep dropping their prices. They say things like “margins are already thin” and “the cost of the labor to support public cloud isn’t going down.” There appears to be a narrative under construction here that the cost to provide you with, say, an hour of access to a compute instance or a GB of disk space is fixed for the provider unless they can find some innovative way to reduce it.
That narrative is false. Here’s why.
Remember the sample configuration we used to compare IaaS pricing in my earlier post? The internal IT costs that would be replaced by public cloud service for that configuration break down as follows:
Notice that almost half of the cost is the infrastructure hardware itself: servers and storage. This uses large internal IT shops as a model, and since the leading public cloud providers tend to use their own infrastructure software, cheap hydroelectric power, and labor that gets more efficient with scale, it’s reasonable to assume that the hardware actually accounts for more than half of what you are getting with public cloud.
Now remember that unit costs for IT hardware tend to go down significantly over time. By “unit” costs I mean the cost of each unit of capacity, such as a GB of disk or an EC2 Compute Unit of processor capacity. Moore’s Law has decelerated a bit since its inception, but we’re still seeing processor power double roughly every two and a half years at an equivalent price point. Disk storage gets cheaper for equivalent capacity even faster than that. The end result is that the 100 AWS compute instances and 1000 TB of disk you signed up for a couple of years back now cost Amazon significantly less than they did when you started using them. And that means they have a choice: they can either let their service get more and more profitable, or they can drop their prices.
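To put rough numbers on that, here is a minimal sketch assuming unit hardware cost halves every two and a half years (the doubling of processor power per dollar mentioned above) and that hardware is about half of the provider’s cost, in line with the breakdown discussed earlier. Holding every non-hardware cost flat is a deliberately conservative simplification, the function names are hypothetical, and a shorter halving period could be passed in to model disk storage.

```python
def relative_unit_cost(years, halving_period_years=2.5):
    """Fraction of the original unit hardware cost remaining after `years`,
    assuming capacity per dollar doubles every `halving_period_years`."""
    return 0.5 ** (years / halving_period_years)


def provider_cost_index(years, hardware_share=0.5, halving_period_years=2.5):
    """Provider's cost to deliver the same capacity, relative to year 0
    (1.0 = unchanged). Only the hardware share gets cheaper; all other
    costs are held flat, which is a conservative assumption."""
    hardware = hardware_share * relative_unit_cost(years, halving_period_years)
    return hardware + (1 - hardware_share)


if __name__ == "__main__":
    for years in (1, 2, 2.5, 5):
        print(
            years,
            round(relative_unit_cost(years), 3),   # hardware unit cost
            round(provider_cost_index(years), 3),  # provider's total cost
        )
```

Under those assumptions, after two and a half years the hardware costs half as much and the provider’s total cost to deliver the same capacity falls by about a quarter; if prices don’t move, that difference becomes margin.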
The providers generally don’t like to talk about things like Moore’s Law, for the same reason that IBM Global Services didn’t like to talk about it back in the early days of traditional IT outsourcing. Too many customers looked only at cost savings on day one of the deal and didn’t understand that they’d be overpaying a few years later if they were still being charged the same amount. It wasn’t until the advent of benchmarking clauses in outsourcing contracts that this began to change. Nobody is benchmarking cloud deals, since barriers to switching providers are so low, but low switching barriers mean little if the dominant providers price their services in lockstep with one another.
The one exception among the cloud leaders right now is Google, sort of: they do mention Moore’s Law on their site as part of their pricing philosophy, but it seems to get little attention, and I’m not convinced they are still walking that talk. If they are, they’ll drop prices very soon. In any case, all of the providers are very, very aware of this situation, and they track it very closely. Many of the customers, not so much. Customers are historically awful at preparing business cases, and the pundits who make a living writing about what the providers do tend to focus on easy things like the next great provider innovation rather than hard things like cloud service financials. It’s now been over a year and a half since I priced the configurations we’ve been discussing in this blog, and the pricing from Amazon, Google and Microsoft still hasn’t changed in all that time. But business is good, and the money is just rolling in, so why would they change a thing?
I’ve seen the question of whether public cloud services are commodities asked more often lately. I’ve even seen a few pundits, peddlers and consultants referring to public cloud services as commodities, as if that’s something that everyone knows and simply accepts, so perhaps it’s time to give the question a close examination. There are many definitions of the word “commodity,” but in the context I’m writing about here, a commodity tends to have the following attributes:
1. It’s usually an economic good that has some value.
2. One unit of that good has about the same value regardless of who sells it to you.
3. It’s usually mass-produced.
4. Markets for commodities are usually driven primarily by price, with brand names being of far less value than in markets for non-commodity goods.
5. Because of these factors, commodities can usually be easily bought and sold on market exchanges.
Common examples of commodity goods are things like gold, copper, crude oil, soybeans and wheat. An ounce of gold, a barrel of oil or a bushel of ordinary wheat has a more or less universal value no matter where it comes from. These goods are mass-produced and have a single unit price that you can look up on a commodity exchange at any point in time.
Certainly, public cloud services are not very similar to this. Right off the bat we see that they are not goods; they are services. How can a service be a commodity, when the quality of a service almost certainly varies from one provider to the next? Strictly speaking, it can’t, but to be fair there are some services that are at least “commodity-like.” Two that come to mind are the notary services and vehicle inspections that we all need from time to time. Those aren’t traded on exchanges like true commodity goods, but they are at least similarly priced, and you can get them from pretty much any provider and receive more or less the same value. So, for the sake of argument, let’s give cloud services a pass on our first criterion above and stretch the meaning of “commodity” to include services.
On to points 2 and 3. One could argue that public cloud services are sort of mass-produced, since they are built on huge “farms” of equipment, large groups of which are identically configured, and the same menu of services is offered to every customer. Unfortunately, that’s only true if you focus on one provider. If you look across providers, you find very different services provided on top of equipment configurations that differ greatly from one another. The prices, and what’s included in them, vary dramatically from one provider to the next, as we’ve discussed in depth in my previous posts. Support, SLAs, security, functionality, performance and other factors all differ from one provider to the next, and anyone who’s read The Cloud Service Evaluation Handbook appreciates the critical importance of these differences.
If that’s not enough to end the debate, it’s on point 4 that cloud services spectacularly fail the commodity test. I would argue that the market for public cloud is in no way driven by price. Some may be shocked to hear me say that, but a brief review of my posts on cloud pricing below shows it to be true. A truly price-driven market would have its major competitors much closer together on what they charge, just as you often see with two gas stations across the street from each other. Brand names are, in fact, almost everything in the cloud service market, at least so far. Amazon is seen as the inventor of this space, and they are clearly reaping the benefits of that. Microsoft has large mind-share among developers, and Google is perceived as a pioneer of almost any type of new, automated technology. This is also why IBM isn’t crushing the rest of the market: their brand is more associated with their leadership of the “old,” not-much-loved traditional outsourcing space. Add the relative strengths of these important brands to the actual features and functionality of the services they sell, which all differ from each other, and you can fully explain their relative market success.
Finally, to our last criterion: there is no commodity market exchange for cloud services. “Hold it!” you say. “What about Amazon’s spot instances? Those are traded on an exchange!” Well, sure, but not the kind of exchange that the word “commodity” alludes to, and I actually think this might be where a lot of the misuse of the term comes from. Remember that Amazon’s “market” is only for AWS EC2 instances, traded only with Amazon customers on an exchange run by Amazon. Microsoft can’t sell one of their instances on that exchange. You could argue that the spot instances themselves are commodities because anyone can buy and sell them, but that’s only one type of instance from one provider. There is no reasonable way to generalize from that and infer that all public cloud services are commodities. Nor will that happen in the future unless Amazon becomes the only public CSP and all AWS infrastructure becomes “spot” infrastructure. Today what you have is more analogous to a stock market like NASDAQ, if NASDAQ sold only multiple types of stock from a single company and were owned by that company. NASDAQ doesn’t trade commodities, and there’s a reason for that: commodities and equities really are quite different things.
So, hopefully that puts the question to rest. Amazon did not look at IT infrastructure and decide “we can turn this into a commodity.” If they had, they would have made all of their automation open source and their APIs non-proprietary. What they did do was look at IT infrastructure and decide “we can sell this like shrink-wrapped software or like books, and there’s nobody in the world better at doing that than us; all we need is the right automation.” Software packages are not commodities, books are not commodities, and neither is IaaS.