I've been spending a lot of time lately thinking about this whole DevOps thing. Briefly, DevOps is the pursuit of merging the disparate cultures of Development and Operations groups in the computer software (particularly web-based) world. DevOps is the big thing in computer system administration these days.
For many good reasons, Development and Operations groups have historically operated relatively independently. The biggest reason for this is that Operations is concerned primarily with stability and Development with creating new features. This leads to a natural conflict, particularly since Operations gets blamed for outages, and outages are associated with new features.
The result of this is the 'throw it over the wall' mentality - Development creates a new thing and tosses it to Operations to deploy and manage. Operations resists because of perceived threats to stability and the risk of the dreaded site downtime. Tempers flare and resentment grows.
DevOps recognizes that this is not an optimal way to get things done and tries to tear down the wall between Ops and Dev. I think that's a noble and necessary goal. The DevOps movement also acknowledges that many of the techniques and tools of the software development world can be applied in Operations. I'm particularly interested in the idea of sysadmins using development practices like formal code review. Software developers have created many excellent tools and methodologies. Why shouldn't we all use them?
This leads directly to the idea of defining our Operations environment by applying programmatic rules or, as it's more generally known today, configuration management. In the Unix/open source world I inhabit, this revolves around three tools: Cfengine, Puppet and Chef. I think this is the purest expression of the idea of leveraging Development in Operations - define your environment with formal rules, and apply that configuration consistently. This is analogous to defining a specification and writing code to turn that specification into reality.
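To make the specification analogy concrete, here is a minimal sketch in Python. It is not how Cfengine, Puppet or Chef actually work, and every name in it (DESIRED, plan, apply, converge) is made up for illustration. It assumes a host can be modeled as a simple in-memory dictionary, and it only shows the core idea: declare the state you want as data, compute the difference from the current state, and apply just that difference, so that running it a second time changes nothing.

```python
# Hypothetical sketch: none of these names come from Cfengine, Puppet, or Chef.

# The "specification": the state we want, declared as data rather than as steps.
DESIRED = {
    "packages": {"ntp", "openssh-server"},
    "services": {"ntp": "running", "sshd": "running"},
}


def plan(desired, actual):
    """Return the list of actions needed to move `actual` towards `desired`."""
    actions = []
    # Install only the packages that are missing.
    for pkg in sorted(desired["packages"] - actual["packages"]):
        actions.append(("install", pkg))
    # Change only the services whose state differs from the specification.
    for svc, state in sorted(desired["services"].items()):
        if actual["services"].get(svc) != state:
            actions.append(("set_service", svc, state))
    return actions


def apply(actions, actual):
    """Apply actions to an in-memory model of a host (no real system calls)."""
    for action in actions:
        if action[0] == "install":
            actual["packages"].add(action[1])
        elif action[0] == "set_service":
            actual["services"][action[1]] = action[2]
    return actual


def converge(desired, actual):
    """Plan and apply in one step; returns the actions that were taken."""
    actions = plan(desired, actual)
    apply(actions, actual)
    return actions


if __name__ == "__main__":
    # An imaginary host that has drifted from the specification.
    host = {"packages": {"openssh-server"}, "services": {"sshd": "stopped"}}
    print(converge(DESIRED, host))  # first run: reports the changes it made
    print(converge(DESIRED, host))  # second run: nothing left to do
```

The property worth noticing is that the second run is a no-op. That is what lets you apply the same rules to every server, over and over, without worrying about what state each one started in.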
Thus, I find myself in agreement with many of the ideas and philosophies of DevOps. To be clear, I do not agree with all of DevOps. In particular, it smells too much like a 'movement', with all the negative connotations of intolerance and inflexibility that can lead to. However, I want to set that aside for the purpose of this post. Whether or not I agree with the label DevOps, I agree with the basic ideas (to be fair though, I do have a Bachelor's in Computer Science so I'm probably biased towards the development angle). Here's the question that I'm stuck on though: how do you implement DevOps in a large, established organization? Is it even possible?
Some background: I currently work in one of those large, established organizations. My company owns some of the most prominent sites on the web. Just about everyone on the internet interacts with us in one way or another every day. My department is the Operations team responsible for one of the biggest internet properties at the company, and it's a property that has been around for most of the life of the world wide web. This means we've got a lot of people working on a lot of mature systems.
The passage of time brings both maturity and complacency. Our team does a great job of managing the tools and procedures we currently have. In particular, we keep servers all over the world running with amazingly minimal downtime. We know how to analyze our failures and remediate them. We track metrics; we strive to reduce outages. Site Up is our mantra above all else.
The downside of all this is that gigantic systems take an extremely long time to change. Several years ago I was part of a team that worked on a major software conversion on 10,000 servers. That project consumed the lives of 6 people for a year. We got it done and I'm proud of the work we did, but a year is forever in internet time. That project dealt with just one part of our infrastructure. How, then, do you make all the day-to-day changes you need, move forward on these long-term projects, and still move towards something like a DevOps methodology?
Part of the issue here is that changing culture requires conscious effort from everyone in your organization. DevOps is a methodology and a movement. That means you have to win the hearts and minds of your peers. Large organizations actively resist change. This makes sense, since large organizations are safe and comfortable. Your job is generally well-defined - do what your boss tells you, keep in line with your peers, and you will do all right. I am in no way belittling the people in large operations organizations - I'm one of them myself, and I'm proud of the work we do. I just want to acknowledge the fact that the people in the group are somewhat self-selected. If you want excitement and risk, you go work in a startup. If you need a regular salary (particularly if you have a family to support) you migrate towards the larger company that can provide these guarantees.
This conflict, then, is my DevOps blocker. The culture at small companies comes from the software developers. Big companies are already established and have procedures for dealing with their world. Those procedures can involve some DevOps ideas and tools (such as code review and agile programming) but in general the large company world is not a DevOps world. Now maybe there are large companies out there which have fully integrated DevOps. I don't think there are, but maybe I'm wrong. What are those companies doing differently? I imagine it has to revolve around the people that work at those companies. All I can draw on is my experience, and I'm stuck - how do you put a large organization on the path to DevOps? Is that a worthwhile goal, or is DevOps only appropriate for small groups?