WhoDoesDeployments

So Who Is In Charge of Deployments, Anyway?

Mark Imbriaco said something provocative about software deployments the other day:

"Deployment and ops are orthogonal. Once people understand this, the world will be a much better place."

I honestly could not decide whether I agreed with that sentiment or not. After mulling it over for a few days I realized I would have to write a blog post to sort this all out.

Why Do I Care?

I'm the manager of the Release Management group at a very large internet company. Our mission is all about deployments. We help developers write well-formed packages. We help QA set up their test environment. We design and run tests on software releases ourselves to verify correctness. Finally, we push the code to production. We push releases to more than 10,000 servers every week and we never stop.

Thus, I think a lot about how software deployments should be done. Here I want to describe our current model a little bit, and then tell you about where we are going and what it means.

The Current Process

First, I should tell you that my release management group is functionally part of the operations organization (Service Engineering). We come from a system administration background, and we're a bunch of Perl scripters.

Our existing process reflects that heritage. We use a push-based deployment mechanism (sorry @jtimberman, that's a topic for another post). The release assembly process is complicated and time-consuming. When you're supporting stuff that's been running in production for ten years or more, that's just the way things go. I don't want to leave the impression that we are sloppy or that our process doesn't work. We are a very dedicated group and we do a hell of a job minimizing outages and making things go smoothly. A lot of people say that software rollbacks are impossible. Baloney. On average, we roll a major release back in production once every 3-6 months. That means we are reverting the software and settings on many thousands of hosts - trust me, it works. I'm immensely proud of the smart, talented, and dedicated people I work with.

The downside is that this process is slow. Right now we are on a three-week development cycle for our major component (about 10,000 servers worldwide). The actual software push to all those servers is done over two of those three weeks. Basically we are continuously pushing code, but not in the way you mean when you say continuous deployment. Making this schedule work takes a dozen people (including folks from QA, development, and service engineering), mainly working on building the software, testing it, and assembling the releases. That's a lot of man-hours. Also, there's not really any way we could make the process much faster. I think realistically we could get it down to a two-week release cycle, but that leaves very little room for error.

The Future

Like I said, our existing process works - it's just slow and resource-intensive. Everyone knows that is the case, and we're constantly trying to improve the system. On the development side there's been a very heavy push towards continuous integration. A Build Engineering group in development is responsible for CI and they have been able to convert large chunks of our software to that process. Now the code gets built at least once a day and run through various tests via Jenkins.

The obvious next step after continuous integration is continuous deployment, and we are beginning to do that as well. We now have pipelines which take some of our software from source code through build and test and then on to deployment to test servers. That's not continuous deployment to production, but it's a big step towards that goal. Instead of waiting for developers to build packages and Release Management to install the test release before QA tests it, we have a much more fluid environment. One command in Jenkins gets us to the point where lots of automated tests have been run and the test servers are ready for the full QA treatment.
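To make the shape of such a pipeline concrete, here is a minimal sketch in Python. It is not our actual Jenkins configuration. The stage names and the run_pipeline helper are hypothetical, invented purely for illustration. The only idea it shows is that stages run in order and the pipeline stops at the first failure, so nothing reaches the test servers unless build and tests pass.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a callable that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that completed successfully.
    """
    passed: List[str] = []
    for name, step in stages:
        if not step():
            break  # a failed stage blocks everything after it
        passed.append(name)
    return passed


# Hypothetical stages; real ones would call out to build and test tooling.
stages: List[Stage] = [
    ("checkout", lambda: True),
    ("build", lambda: True),
    ("automated_tests", lambda: True),
    ("deploy_to_test_servers", lambda: True),
]

completed = run_pipeline(stages)
```

If the build stage returned False, only the checkout stage would be reported as completed, and the deployment to the test servers would never be attempted.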

It's unlikely we will ever get much further than this with continuous deployment on the existing historical software components. The biggest issue is test coverage - there's just not enough of it and it's not complete enough. Unless you are starting from the ground up, it's extremely hard to fully automate build and release.

Recognizing this limitation, we've chosen to focus most of our efforts on making sure that new components are fully integrated into a CI/CD pipeline. The theory is that the old components will eventually be retired, so attrition will take care of the problem of modernizing them.

So Who Owns What?

My group owns the release and deployment process, and we're in Operations. Note, however, that we aren't really what you would think of as a traditional operations team. We are rarely involved with on-the-ground firefighting. Of course a lot of outages tend to get blamed on us because we're the ones who push the code to production, so we spend a lot of time on education and communication. We end up being a very horizontal team - our time is split pretty evenly between working with Ops and working with Devs. Ultimately I think we end up living in both worlds.

It's important to note that you need a lot of operational knowledge to do deployments well. That's why I think it makes sense for our team to be located in service engineering. When software deployments have problems, those problems are largely traditional ops issues such as unexpected memory exhaustion or network load. You have to be able to understand the whole system, and the only way to do that is to have a solid system administration background.

However, the future is clearly all about system administrators acting more and more like developers. We spend all our time thinking about continuous integration and deployment. That means we have to know how to plug our existing scripts into Jenkins. We have to be able to do much more than just 'push buttons'. When I expand my team I'm not going to hire someone unless they have solid scripting experience and are interested in thinking like a developer.

Our biggest challenge is how we connect with the Build Engineering team that's over in the development organization. I think it's not unreasonable to expect that in the future that team will somehow be merged with my Release Management team. We work together very closely already. An integrated pipeline all the way from check-in to deployment probably means we need an integrated team as well. However, right now we're doing just fine with two separate teams - there's so much work to do on both sides on various projects that we all stay extremely busy.

Conclusion

I see large-scale software deployments as the perfect place to apply the principles of DevOps. As I said, my group is already a very horizontal team. Communication between ops and devs is absolutely critical and that's how I spend a lot of my time right now. The expectation is that we are in the initial planning meetings when developers are just starting to think about new projects. We are right there helping to make decisions about how software is designed. That makes a huge difference when it comes to overall operability and reducing production 'surprises'.

Ultimately, deployments and operations are very different things. To deploy software successfully at scale you've got to look at the big picture, and that means thinking about the abstract world like developers do. My Release Management team doesn't worry too much about one-off problems and firefighting, and I'm very happy about that. Of course we want to do everything we can operationally to reduce outages and maintain quality. We need to work very closely with Operations to do that.

So Mark, I guess at the end of the day I halfway agree with you. Deployments and Operations are two very distinct things. You need very different perspectives for each. However, deployments are about making production changes and I think that's a really important reason that my group needs to be aligned with Operations. We need to directly interact with the people keeping things running. We also need a very strong system administration background so that we can effectively communicate with the rest of the operations team.

I think the best way to deal with this is to embrace DevOps. My team worries about things like metrics and agility. We worry about communicating between all the different groups. That's the only effective way to do quality software deployments.

CategoryGeekStuff
CategoryBlog
CategoryDevops


