Getting to Cloud DevOps: A Basic Checklist

July 31, 2019

This blog post covers basic DevOps practices. If you already feel confident with DevOps, you may want to read Thriving with DevOps: An Advanced Checklist.

With cloud platforms such as Cloud Foundry, more and more teams are moving from a development-only team to a DevOps-enabled team. The aim is to continuously improve the application development lifecycle by moving quickly from idea to production. However, with both development and operations work to get done, teams with less operations experience tend to neglect some important operational aspects. This blog post provides a basic checklist that teams can work through to decide for themselves whether they want to invest more effort in a specific topic. Personally, I believe that any real DevOps team needs to have the following aspects in place. In an upcoming blog post I will provide a checklist for advanced DevOps topics.

1.) Continuous Integration & Delivery (CI/CD)

Here is a bold statement:

Taking a commit to production must run through a fully automated pipeline

That means that any change to software or infrastructure should move through a maintained (and understood!) path of CI/CD pipelines. To make this work in practice, developers need a short feedback cycle, so they know as soon as possible whether a new change has any flaws. Failing unit tests need to produce a build failure in less than 5 minutes, and the overall process to deploy to production should take no more than 60 minutes. Pipelines that run fast actually get used! There is nothing worse than a slow pipeline, because everyone will start to bypass it for the important production hotfix.
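To make the two time budgets concrete, here is a minimal Python sketch that checks recorded stage durations against them. Only the 5-minute and 60-minute limits come from the text above; the `Stage` type, the stage names, and the idea of feeding in durations by hand are assumptions for illustration, not the API of any particular CI system.

```python
from dataclasses import dataclass

# Budgets from the text above. Everything else in this sketch is hypothetical.
UNIT_TEST_FEEDBACK_BUDGET_MIN = 5      # failing unit tests must fail the build in less than this
COMMIT_TO_PRODUCTION_BUDGET_MIN = 60   # the whole path to production may take at most this


@dataclass
class Stage:
    name: str
    minutes: float


def check_pipeline_budgets(stages, unit_test_stage="unit-tests"):
    """Return a list of human-readable budget violations (empty if none)."""
    violations = []
    elapsed = 0.0
    feedback_time = None
    for stage in stages:
        elapsed += stage.minutes
        if stage.name == unit_test_stage:
            # Time from commit until a failing unit test would break the build.
            feedback_time = elapsed

    if feedback_time is None:
        violations.append(f"no stage named {unit_test_stage!r} found")
    elif feedback_time >= UNIT_TEST_FEEDBACK_BUDGET_MIN:
        violations.append(
            f"unit test feedback takes {feedback_time:.1f} min "
            f"(budget: under {UNIT_TEST_FEEDBACK_BUDGET_MIN} min)"
        )

    if elapsed > COMMIT_TO_PRODUCTION_BUDGET_MIN:
        violations.append(
            f"commit to production takes {elapsed:.1f} min "
            f"(budget: {COMMIT_TO_PRODUCTION_BUDGET_MIN} min)"
        )
    return violations


# Hypothetical stage durations, in pipeline order.
example_pipeline = [
    Stage("checkout", 0.5),
    Stage("unit-tests", 3.0),
    Stage("integration-tests", 20.0),
    Stage("deploy", 10.0),
]

for problem in check_pipeline_budgets(example_pipeline):
    print(problem)
```

A check like this could run as a scheduled job and flag a pipeline that is drifting past its budgets before people start bypassing it.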

Despite automated checks, you can still embrace a Git workflow that supports your team and includes code reviews. Once the review passes, however, at most one manual trigger should be needed to finally deploy the code.

If you have a product to operate, go the extra mile and automate the deployment itself as well. Even if you have a manual trigger somewhere, a deployment should be no more than one or two clicks for a human. Nobody should need to hop on a shell or run things on their local machine.

Here is a checklist to assess your team's progress for continuous engineering practices:

2.) Logging

Application logs are invaluable when it comes to understanding what went wrong. To have them ready when you need them, make sure to emit them in your application and collect them in your infrastructure. Modern cloud log aggregation systems usually collect logs from the standard output stream, as recommended by the Twelve-Factor App methodology. Writing logs to the local file system will likely cause logs to be lost when you need them the most, so make sure to follow cloud best practices.
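As a minimal sketch of emitting logs to standard out, the following Python snippet uses only the standard `logging` module. Writing one JSON object per line is an assumption on my side (many log aggregation tools can parse it, but check what yours expects), and the `JsonFormatter` and `build_logger` names are made up for this example.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line (an assumed format)."""

    def format(self, record):
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def build_logger(name="app", stream=None):
    """Create a logger that writes to standard out instead of a log file."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers.clear()    # avoid duplicate output if called twice
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False   # keep records away from the root logger's handlers
    return logger


logger = build_logger()
logger.info("order %s placed", "A-1001")  # hypothetical order id
```

The application never opens a log file here. The platform captures the process output and forwards it to the log aggregation system.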

While having access to the logs is the first step, the real value lies in analyzing the logs to extract specific information. To achieve this, all team members need to be trained to actually use the log aggregation tool to filter and query log messages for specific questions.

Here is a checklist to assess your team's logging setup:

3.) Set Up Monitoring

It is crucial for a team to observe application behavior in production. There are lots of monitoring solutions available that integrate well with whatever platform you are using. If you are unaware of what monitoring solutions exist, chances are you already have one in your corporation. Just ask around and get it for your team as well.

Basic metrics you should be monitoring include:

Once you have the metrics in a monitoring solution, the next step is to set up alerts that inform you of critical situations. Some monitoring systems allow you to define hard thresholds, which make sense, for example, at 80% disk utilization. For the number of sales in an online shop, it makes more sense to have dynamic, adaptive alerting that considers the usual trend over time and detects anomalies.
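The difference between the two alerting styles can be sketched in a few lines of Python. The 80% disk threshold is taken from the text above. The anomaly check is deliberately simple: it compares the current value against the mean and standard deviation of recent values, which is only one of many possible approaches and ignores trend and seasonality that a real monitoring product would consider. The function names and the limit of 3 standard deviations are assumptions for illustration.

```python
from statistics import mean, stdev

DISK_ALERT_THRESHOLD = 0.80  # hard threshold from the text: 80% disk utilization


def disk_alert(used_fraction, threshold=DISK_ALERT_THRESHOLD):
    """Hard threshold: alert as soon as utilization reaches the limit."""
    return used_fraction >= threshold


def is_anomaly(history, current, max_deviations=3.0):
    """Adaptive check: alert when `current` is far from the recent baseline.

    `history` holds recent values of the same metric, for example sales per
    hour. `max_deviations` is an assumed tuning knob, not a recommended value.
    """
    if len(history) < 2:
        return False  # not enough data to estimate a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return current != baseline
    return abs(current - baseline) > max_deviations * spread


# Hypothetical sales per hour over the last six hours.
recent_sales = [100, 104, 98, 101, 97, 102]

print(disk_alert(0.85))              # disk is above the hard threshold
print(is_anomaly(recent_sales, 103)) # within the usual range
print(is_anomaly(recent_sales, 20))  # sudden drop in sales
```

A fixed threshold on sales would either fire all the time or never. Comparing against recent behavior lets the alert follow what is normal for the metric.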

Here is a checklist:

4.) Documentation

Documentation is important, but often neglected and outdated. Good documentation is about documenting the right things, not about documenting everything. When writing documentation, there are two questions you can use to challenge its usefulness:

Here are some topics that should be part of your documentation:

5.) Culture

Culture in a DevOps team is at least as important as the technical challenges. The team should adopt an agile culture that strongly aligns with business objectives and visions, and strives to deliver business value. There is just no point in shipping little to no value at a high velocity!

In order to develop a healthy DevOps culture, see if your team needs to improve in any of the following:

Conclusion

DevOps is not just about having the right tools in place. It is far more important that every team member knows how the tools work and that everyone has a shared understanding of what DevOps means to the team. DevOps is a journey that can start by implementing some of the topics mentioned in this blog post. Not every team needs to implement and consider every aspect, but please do discuss these topics in your team and find alignment among all team members about what to do with them.

Do you think an important topic is missing from the list? Then please do write a comment, so we can collect more topics that are valuable for DevOps teams.

About the author: Fabian Keller

Loves cloud technologies, high code quality and craftsmanship. Passionate about woodworking when not facing a screen.
