In case you missed the first two laws of cloud cost optimization, here they are:
- Know Your Application, and
- Do a Thorough Service Evaluation
But let’s say you’re following those already. You’ve got a good grasp of the application’s workload profile and its needs for disaster recovery, security, response time, elasticity, interoperability and more (see The Cloud Service Evaluation Handbook™), or at least you have people on the team who do, and all of those people contributed to the evaluation of the service that you finally chose. You’ve definitely gotten off to the right start, but costs can still spiral out of control even if they were optimal on day one. The key to making sure that doesn’t happen is taking a variety of steps that fall under the heading of good Demand Management.
Oh sure, you might think you’ve got that covered if you’ve been doing IT Demand Management for your private, non-cloud assets. Let’s say you already require strong business cases for new acquisitions, you monitor and improve the accuracy of those business cases, and you make sure that old equipment and software licenses get reused to meet new requirements or get properly decommissioned so they don’t continue to generate unnecessary costs in the form of management fees, software maintenance, etc. If you’re doing those things, pat yourself on the back, because many organizations weren’t doing them very well even before cloud came along.
Now for the bad news: there’s a lot more to good demand management with cloud, particularly public cloud. First of all, it’s more important now than it ever was. An unused server sitting around in your own data center may not have been a big deal if you didn’t need the space for anything else. In the cloud, however, you’re paying for that unused resource every month, just throwing money straight out the window. The “usage” that cloud providers measure when they create your invoice is determined by the number and type of resources that you, the customer, have allocated to yourself. It’s not what you are actually using to do useful work at the moment, and financial people are sometimes surprised by that. Since the providers make more money when you over-allocate or forget to turn something off, you can’t expect them to do much to help you. Even when they provide tools that let you see where your issues are, it’s still up to you to use them diligently to keep your charges down.
Here are a few leading practices that you can use to optimize costs. If you aren’t already using them, you could easily be overpaying by 20% to 50% or even more.
- Reassess your service decisions continually. Application workload characteristics change, and the service you chose initially was greatly influenced by what you expected those characteristics to be. You may have selected on-demand instances thinking that your application could release them most of the time. Is that what you’re seeing? Does your application even have the ability to release resources automatically, or would you benefit from a simple tool like ParkMyCloud that lets you deactivate them on a fixed schedule? Or you may have selected dedicated or reserved instances thinking that workloads would be fairly steady around the clock, and that may be true for some of them, but now that the workload has grown, perhaps some of the resources aren’t needed during off-peak hours. Would going to an on-demand scheme for some of them save money? If usage is very light and it’s the right kind of application, perhaps a FaaS solution like AWS Lambda, Azure Functions or Google Cloud Functions would now be a better choice, or even a Google “preemptible” VM.
- Reassess your configuration choices continually. For the same reasons that your service choice may no longer be optimal, your configurations are even more likely to need re-optimization as time goes on. Memory requirements, in particular, can change quickly. Performance bottlenecks will force you to go to instances with more memory if you need to, but there can also be opportunities to go to smaller memory configurations to save money. Your developers can and should be looking for ways to make your applications more efficient, and money saved on your cloud bill is the measurable payout for that, not only from charges for VMs and storage but for data transfer as well.
- Look for “zombie” resources continually. Zombies are resources that aren’t being used because they’ve become “unattached” or simply aren’t needed for running your applications. Storage volumes not attached to a compute instance are common examples, and you may be paying for their entire capacity even though you’re storing nothing on them. For IaaS, IP addresses, databases, load balancers and even VMs can all be zombies. One of the great things about cloud is the ability to try out ideas and then abandon them without making a large investment in infrastructure first, but zombies are a natural consequence of that. Also, if you aren’t currently “tagging” your resources, start now. You’ll want more tags for reporting purposes, but an Owner tag is critical for determining which resources are really zombies before you deprovision them. User accounts can also become zombies when employees leave or change jobs or when trading partners change, so this is important for SaaS as well as IaaS. Inactive accounts represent security risks as well as unneeded costs. Use your tools to detect resources with low utilization and investigate them.
- Keep doing Hierarchical Storage Management and Information Lifecycle Management, continually. They are still relevant in the cloud, perhaps even more so when the equipment is off-site than when you owned it. Cloud automation makes using lots of storage very easy, and those monthly charges are just going to keep coming until you do something to turn them off. Old snapshots may be okay to delete when you have enough newer ones. Back in the days of mainframes we used to periodically look at the last access date on every file and archive anything that went too long without being touched. In the cloud, you have numerous options on where to put your data, and some are orders of magnitude less expensive than others. Migrating data to options with lower redundancy, lower IOPS, magnetic vs. solid state, “infrequent usage” tiers, long-retrieval-time services (e.g. AWS Glacier), and, of course, deletion are all potential ways to save money.
- Look at opportunities to refresh your infrastructure. You may be thinking “Wait a minute! I thought cloud got rid of the need for us to do that!” Well, sort of. You may not have to rack and stack new boxes anymore, but you also don’t want to keep paying for your provider’s fully depreciated old equipment unless it has some unique feature that you still need. There’s a common misconception out there that the rapid price drops in the public cloud market are automatically passed on to customers, and, as we’ve discussed previously, that often isn’t true. First of all, they aren’t coming as quickly as they used to, and when they do they may only be implemented on new resource types, so if you don’t refresh the resources you’re using with the newer technology, you get zero benefit. How price changes take effect is entirely up to the provider, however, so it’s up to you to analyze pricing on every new configuration they release to see how it’s relevant to you.
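To make the zombie hunt a little more concrete, here is a minimal Python sketch of the kind of review list you might build from a resource inventory. Everything in it is an assumption for illustration: the `Resource` fields, the 5% utilization threshold and the sample inventory are made up, and a real export from your provider or management tool will look different. It only flags candidates for a human to investigate. It doesn’t deprovision anything.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """One row from a hypothetical inventory export (not a real provider schema)."""
    resource_id: str
    kind: str                      # e.g. "volume", "vm", "ip", "load_balancer"
    monthly_cost: float            # what the provider bills for it, in dollars
    attached: bool = True          # False for an unattached volume or IP address
    avg_utilization: float = 1.0   # 0.0 to 1.0, averaged over the review period
    tags: dict = field(default_factory=dict)


def zombie_candidates(resources, low_utilization=0.05):
    """Return (resource, reason) pairs that deserve a manual look.

    The threshold is an assumed starting point, not a recommendation.
    """
    candidates = []
    for r in resources:
        if not r.attached:
            reason = "unattached"
        elif r.avg_utilization < low_utilization:
            reason = "low utilization"
        else:
            continue  # attached and busy enough: not a candidate
        if "Owner" not in r.tags:
            # Without an Owner tag there is nobody to ask before deprovisioning.
            reason += ", no Owner tag"
        candidates.append((r, reason))
    return candidates


# Made-up sample data for illustration only.
inventory = [
    Resource("vol-001", "volume", 40.0, attached=False, tags={"Owner": "data-team"}),
    Resource("vm-002", "vm", 120.0, avg_utilization=0.02),
    Resource("vm-003", "vm", 120.0, avg_utilization=0.60, tags={"Owner": "web-team"}),
]

flagged = zombie_candidates(inventory)
for resource, reason in flagged:
    owner = resource.tags.get("Owner", "unknown")
    print(f"{resource.resource_id}: {reason} "
          f"(owner: {owner}, ${resource.monthly_cost:.2f}/month)")

# Upper bound on savings if every flagged resource turns out to be a real zombie.
potential_monthly_savings = sum(r.monthly_cost for r, _ in flagged)
```

Notice that the untagged VM is the hard one: it is flagged, but nobody can confirm it is safe to remove, which is exactly why the Owner tag matters.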
Now, the good news is that there are tools to help with all this. Cost optimization tools are often incorporated in Cloud Management Platforms and help with all the things I’ve mentioned above. If you have a large cloud investment, I’d expect the business case for one of them (e.g., CloudHealth, Cloud Cruiser, Cloudyn, Cloudability, Cloudamize, RightScale and more) to be compelling. CloudHealth, in particular, has been good about sharing their wisdom on this topic.
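If you want a back-of-the-envelope number before you build that business case, the arithmetic behind turning instances off on a fixed schedule is simple enough to sketch in a few lines of Python. The hourly rate, instance count and schedule below are hypothetical, and the sketch assumes billing is purely per allocated hour. It ignores storage and other charges that keep accruing while an instance is parked.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours in a week


def parked_savings(hourly_rate, instance_count, running_hours_per_week, weeks=52):
    """Estimate yearly cost with and without a fixed on/off schedule.

    Assumes simple per-hour billing for allocated instances (an assumption,
    not any particular provider's pricing model).
    """
    always_on = hourly_rate * instance_count * HOURS_PER_WEEK * weeks
    scheduled = hourly_rate * instance_count * running_hours_per_week * weeks
    return always_on, scheduled, always_on - scheduled


# Hypothetical: 10 development VMs at $0.10/hour, needed 12 hours a day on weekdays.
always_on, scheduled, saved = parked_savings(0.10, 10, running_hours_per_week=60)
saved_fraction = saved / always_on

print(f"always on: ${always_on:,.2f}  scheduled: ${scheduled:,.2f}  "
      f"saved: ${saved:,.2f} ({saved_fraction:.0%})")
```

With these made-up numbers the instances sit idle for 108 of the 168 hours in a week, so roughly 64% of the always-on charge is money going out the window. Your own figures will differ, but the shape of the calculation is the same.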
So, what’s the Third Law of Cloud Cost Optimization?
“Clean Up Your Mess, …Continually!”