DevOps Blocker

Created Jan 29, 2011 · Modified Mar 1, 2012

I’ve been spending a lot of time lately thinking about this whole DevOps thing. Briefly, DevOps is the pursuit of merging the disparate cultures of Development and Operations groups in the computer software (particularly web-based) world. DevOps is the big thing in computer system administration these days.

For many good reasons, Development and Operations groups have historically operated relatively independently. The biggest reason for this is that Operations is concerned primarily with stability and Development with creating new features. This leads to a natural conflict, particularly since Operations gets blamed for outages, and outages are associated with new features.

The result of this is the “throw it over the wall” mentality - Development creates a new thing, and tosses it to Operations to deploy and manage. Operations resists because of perceived threats to stability and the risk of the dreaded site downtime. Tempers flare and resentment grows.

DevOps recognizes that this is not an optimal way to get things done and tries to tear down the wall between Ops and Dev. I think that’s a noble and necessary goal. The DevOps movement also acknowledges that many of the techniques and tools of the software development world can be applied in Operations. I’m particularly interested in the idea of sysadmins using development tools like formal code review. Software developers have created many excellent tools and methodologies; why shouldn’t we all use them?

This leads directly to the idea of defining our Operations environment by applying programmatic rules or, as it’s more generally known today, configuration management. In the Unix/open source world I inhabit, this revolves around three tools: Cfengine, Puppet and Chef. I think this is the purest expression of the idea of leveraging Development in Operations - define your environment with formal rules, and apply that configuration consistently. This is analogous to defining a specification and writing code to turn that specification into reality.
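To make "define your environment with formal rules, and apply that configuration consistently" a little more concrete, here is a minimal sketch in Python. It is not how Cfengine, Puppet or Chef actually work internally; the `Host` class, the `DESIRED` specification and the `converge` function are all hypothetical names made up for illustration, and the "server" is just an in-memory object. The point is only the shape of the idea: a declared desired state, plus a routine that makes whatever changes are needed and does nothing when the host already matches.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """In-memory stand-in for a server's current state (hypothetical)."""
    packages: set = field(default_factory=set)
    files: dict = field(default_factory=dict)  # path -> contents


# The "specification": what every web server should look like (assumed values).
DESIRED = {
    "packages": {"ntp", "nginx"},
    "files": {"/etc/motd": "Managed by configuration management\n"},
}


def converge(host, desired):
    """Bring host in line with desired; return the list of changes made."""
    changes = []
    # Install only the packages that are missing.
    for pkg in sorted(desired["packages"] - host.packages):
        host.packages.add(pkg)
        changes.append(f"install {pkg}")
    # Rewrite only the files whose contents differ from the specification.
    for path, contents in sorted(desired["files"].items()):
        if host.files.get(path) != contents:
            host.files[path] = contents
            changes.append(f"write {path}")
    return changes


if __name__ == "__main__":
    web1 = Host(packages={"ntp"})
    print(converge(web1, DESIRED))  # first run reports the changes it made
    print(converge(web1, DESIRED))  # second run finds nothing left to do
```

Running the same rules a second time produces no changes, which is the property that lets you apply one specification to thousands of servers without caring what state each one started in.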

Thus, I find myself in agreement with many of the ideas and philosophies of DevOps. To be clear, I do not agree with all of DevOps. In particular, it smells too much like a ‘movement’, with all the negative connotations of intolerance and inflexibility that can lead to. However, I want to set that aside for the purpose of this post. Whether or not I agree with the label DevOps, I agree with the basic ideas (to be fair though, I do have a Bachelor’s in Computer Science so I’m probably biased towards the development angle). Here’s the question that I’m stuck on though: how do you implement DevOps in a large, established organization? Is it even possible?

Some background: I currently work in one of those large, established organizations. My company owns some of the most prominent sites on the web. Just about everyone on the internet interacts with us in one way or another every day. My department is the Operations team responsible for one of the biggest internet properties at the company, and it’s a property that has been around for most of the life of the world wide web. This means we’ve got a lot of people working on a lot of mature systems.

The passage of time brings both maturity and complacency. Our team does a great job of managing the tools and procedures we currently have. In particular, we keep servers all over the world running with amazingly minimal downtime. We know how to analyze our failures and remediate them. We track metrics; we strive to reduce outages. Site Up is our mantra above all else.

The downside of all this is that gigantic systems take an extremely long time to change. Several years ago I was part of a team that worked on a major software conversion on 10,000 servers. That project consumed the lives of 6 people for a year. We got it done and I’m proud of the work we did, but a year is forever in internet time. That project dealt with just one part of our infrastructure. How, then, do you make all the day-to-day changes you need, keep these long-term projects moving, and still move towards something like a DevOps methodology?

Part of the issue here is that changing culture requires conscious effort from everyone in your organization. DevOps is a methodology and a movement. That means you have to win the hearts and minds of your peers. Large organizations actively resist change. This makes sense, since large organizations are safe and comfortable. Your job is generally well-defined - do what your boss tells you, keep in line with your peers, and you will do all right. I am in no way belittling the people in large operations organizations - I’m one of them myself, and I’m proud of the work we do. I just want to acknowledge the fact that the people in the group are somewhat self-selected. If you want excitement and risk, you go work in a startup. If you need a regular salary (particularly if you have a family to support) you migrate towards the larger company that can provide these guarantees.

From what I can tell, DevOps comes from the other direction. You might call it bottom up vs. top down. DevOps comes from the startup world (or at least that’s where I hear the most about it these days). Startups are all about wearing a lot of hats. Most importantly startups don’t start with system administrators. They start with developers (or salespeople, but let’s ignore that case). A few people come up with an idea and write some software. That software has to run on servers, so they start buying equipment (or space in the cloud) and configuring it. The good startups begin with configuration management immediately. Since the developers are running the show, everything is naturally developer-driven. Sysadmins and Ops folks tend to come in later, after much of the initial design of the infrastructure has been laid down. Compare this with existing large companies - the software developers who started things are either long gone, or the company has evolved so much that virtually none of their original designs remain.

This conflict, then, is my DevOps blocker. The culture at small companies comes from the software developers. Big companies are already established and have procedures for dealing with their world. Those procedures can involve some DevOps ideas and tools (such as code review and agile programming) but in general the large company world is not a DevOps world. Now maybe there are large companies out there which have fully integrated DevOps. I don’t think there are, but maybe I’m wrong. What are those companies doing differently? I imagine it has to revolve around the people that work at those companies. All I can draw on is my experience, and I’m stuck - how do you put a large organization on the path to DevOps? Is that a worthwhile goal, or is DevOps only appropriate for small groups?

Update, Next Day: I was reminded by @wastedcarbon about the wonderful DevOps presentation given by Adam Jacob at Velocity 2010. Go watch that short talk for a great take on what DevOps means.


Comments

@martinbarry

Part of the problem at large companies is that you can no longer stay across everything that is going on. They develop procedures and bureaucracy to cope with the disconnect between different parts of the company, even different members of a large team. If you don't fight against it, eventually you spend more time coordinating and keeping in sync, and less time "doing stuff".

I think what devops has to offer a large company is a level of automation and integration so that less time and energy is spent on the "working together" part of the job and more time is spent "getting things done". The hardest part is to turn the mothership onto this new course and get to a point where the costs of the transition start to be outweighed by the benefits.

I work for a medium-sized company that could benefit a lot from what devops has to offer. But I'm not even over the first hurdle of management buy-in and it'll be a while before I get that. We have more burning issues than fiddling with the fundamentals.

John Allspaw

Having some familiarity with your environment and organization, I recognize some of the challenges that you're talking about. You're right in the basic idea that changing culture is hard. I'll go further and say that changing culture *is* the nature of what people call DevOps.

Deployment, metrics, ownership, automation, any type of technology... none of those things IMHO are the problems to solve. I am unconvinced that large organizations can't have the development and operations cooperation and collaboration that you're likely to want.

I don't think that there's a valuable answer to your question, in the abstract. It's going to take specifics.
Also: I used to work where you work, so I have opinions from afar on these topics.

@debuggist

With bigger companies, it involves getting more people to adapt to that mindset, and that takes longer to achieve. Also, you're trying to convince people to try something that not just works but works better. They're wondering why change since what they have already works.

You're never at 100% DevOps. Some days you are but other days it could be at 90% or 60%.

Keep on chopping...

Fred Woodbridge

For starters, I'm glad you declared your bias towards development. That's always heartening.
Also, be aware that it's not just large companies that resist change, an all-too-human response; by way of example, Nature itself actively resists change. I think large companies systematically resist change necessarily, i.e. they are big and big things have to overcome inertia, which is why (I think) some large companies break up into business units in an attempt to keep flexibility.
I studied EE (Electrical Engineering) and I naturally operate with an "ops" mentality, and I don't think it is necessarily a bad thing to separate the two, if only to attempt to keep a bit of that aforementioned flexibility. That's not a particularly strong reason, I admit. Another reason is efficiency--if you have two different silos, each somewhat critical of the other, your product is overall better, whether that product be a piece of software or a process. Combining these worlds' different philosophies can also bring more discord than normal--imagine for a while the idea of combining the Executive and the Judiciary and you'll see how troublesome things can get. Humans aren't comfortable with self-criticism and the product of a DevOps may be worse than otherwise.
I believe tech companies should emulate UNIX, which consists of a lot of small tools, each designed to do one thing well, strung together in an assembly line, one tool's output the other's input. Anyway, my quick two cents' worth.

KrisBuytaert

I think it's a common misconception that devops grew out of a Startup Culture ... yes, lots of USA-based folks make it seem that way ... but if I look around at the devops-minded people around me in Europe it's anything but startups.

It's telco, it's banking, it's a variety of organisations ... so I'm not sure if your blocker is really a blocker ... or just a different view on the world.

@harniman

So I was recently working in a large organisation that had that kind of inertia. Releases often failed and there was developer-ops blame all over the place along with lengthy time to value situations for new features. To break the organisation out of the mould, the Director bought in to Agile in a big way and set up empowered teams to deliver the totality of a given product. The team were fully accountable for both feature delivery and product operations - and thus was made up of devs and admins working in a devops culture - a bit like the start up scenario mentioned above. What did this achieve? Well initially, the teams went extreme and said "we cannot trust anything from corporate IT - they will let us down" and set about buying their own development kit and production hosting. They built their own OS and commissioned their own CDN. The result was highly effective. The team delivered an amazing product in a very short timescale. However, questions were then asked as to why the company's own expensive data centre etc. wasn't being used and how they were going to deliver effective support.

The next project scaled back the freedom a bit, and I took the approach of hey, what's broken with the corporate model? Actually the biggest headache was at the operating system level - relying on a locked down IT process to make changes to server builds did not support an agile team that says "hey I need a change to my Tomcat java args and I need it today". What tends to happen is the ops guys haven't got time so they give root permissions on the dev servers so devs can make the change, and then devs hack it and forget to tell ops what they have done, and then hey presto, the change never gets propagated to production. This is where all hell breaks out and the lack of trust starts to build up - i.e. you devs have broken production!

So what was the fix? We formed a team of sys admins working very closely with the devs - i.e. in the same office space and working for the same delivery manager. We acted as the interface for all platform aspects - OS, servers, virtual containers, networks, firewalls, load balancers etc. We owned the high level platform design. We interfaced downstream to the networks, hardware and storage guys - so we leveraged corporate standards and investment. We owned the OS build and the monitoring (the stuff that changed frequently). We automated and version controlled the hell out of it - so if dev and prod differed it was our fault! We monitored everything about the OS and the app - we didn't want any surprises - especially as we were on call as well. We attended the dev team standups and picked up on issues that would affect production operations - i.e. scalability, reliability etc. We ran a prioritised backlog that the dev teams influenced. We automated the deployments - the same process ran for every dev deploy as into production - we failed fast and early. The result was a very strong delivery capability, very minimal process and documentation overhead, and 75 production releases in 9 months with no roll backs. Oh, and did I mention that was a team of 4 people supporting 5 product lines?

So the upshot is it can be done in a large corporate, but to do so requires having smaller teams of devs and ops working together. Somewhere you need to draw the line on what the devops team are empowered to do themselves and are fully accountable for (i.e. with SLAs etc.), and what they need to hand down to more traditional IT teams with clear SLAs and operation processes such that economies of scale can still be obtained. We chose to use the standard networks, hardware, storage and DC facilities, but to take ownership of the OS upwards ourselves as this was where the greatest change happened. Clearly the more you take on, the more you have to worry about being a jack of all trades and handling 24x7 support etc.
