Send in the consultants

I’ve recently moved into a new job at Novell, working on our strategy for worldwide services, and planning for our next fiscal year is keeping me busy.  But I still, fortunately, deal with real clients and real problems too.  This one is classic: the client has several hundred old Unix and RHEL servers that they want to move to SLES.  Great!  We want to help.  So they negotiate the server deal and then want to know the cost to migrate.  How much is it going to cost, in total, to go from what they have today to what they want tomorrow?  They ask for estimates on a per-server basis: how many hours would it take to migrate a Solaris server to SLES?  Ten hours?  A thousand hours?  So they bring in the consultants, the dreaded consultants.  They’ve tried to avoid slowing down the deal but there’s no avoiding it now.

Well, you’ve done this before, they say, you’re grizzled veterans of the data center; is it two or ten hours for a server?  And the consultant — and I’ve been in this situation, believe me, it sucks — has to say, “Well, it depends.  It could be a thousand hours.”  Which is what everyone is expecting him to say because you can’t get a straight answer out of a consultant.  They’re always going to tell you “it depends.”  Right.

And even if there were all the time in the world, this particular answer needs to be in writing on the buyer’s desk by EOD today or the sales guy isn’t going to make his number for the quarter, which means that he’s not going to make ‘club’ (his incentive travel event), which his wife is really looking forward to.  So this damned consultant is not only not answering a simple, reasonable question from the beloved customer but is also, very directly, making the sales guy’s wife mad at him, with the attendant consequences.

Perhaps you think I joke?  Or exaggerate?

Making matters worse, some nerd named Chad has downloaded openSUSE onto a machine in their testing lab and moved a couple of apps without incident (some directory changes, a few lines of code) and based on that experience has estimated that moving the three hundred servers will take approximately an hour each.  Seriously: we have clients who want us to tell them that moving unknown production workloads from one operating system to another will take less than two hours per server.

So the consultant sighs and starts to ask questions: What do the workloads on these servers actually do?  Online banking is different from warehouse management.  What platforms are they running?  (What version of J2EE?  What version of RHEL?  What version of Manugistics?)  Are they going to change anything else besides the operating system when they do this move?  Is the software custom or off-the-shelf?  What’s it written in?  If they say something like current Java apps running on a 2.6 kernel going to the same JVM on another distribution, that would be one thing.  If you are looking at non-ANSI C custom code on RHEL 3 on a complex multi-tiered app, that’s something else.  (Moving from the 2.4 kernel to the 2.6 kernel on any distribution is much harder than moving from one current distribution to another.)  What about storage, and backup, and disaster recovery?  Systems management?  There are a thousand more architectural details that you need to understand (one data center or many?  resource utilization?) but everyone is getting impatient with you and your endless questions.

Then you start getting into the enterprise-y aspects, which is where the real time and cost come in.  There’s a difference between Chad moving an app from one platform to another as a technical exercise and the actual time that it takes production applications to go from one to another.  What’s the testing regime?  I would expect that production code moving from one distribution to another would require real testing (stress/performance, UAT, etc.).  Would you include that in the estimate?  What about security?  Does the new OS have to go through a security audit at the company?  (Answer: yes, and it’s going to take a long time for the online banking app, believe me.)  Documentation?

This is all super-boring and bureaucratic and definitely not technical so the nerds aren’t interested and think it’s worthless and the sales guy is hearing his wife screaming at him and the buyer is saying, “Why is this so complicated?”

So, should we skip the backup part?

Really, the way to do this kind of thing is to do a quick assessment and figure out some kind of prioritization and rough sequencing, but that would require the client to spend time and money helping you to figure out how much to charge them and they are naturally leery of such a thing.  You desperately want to avoid getting locked into a fixed figure because you still have no real idea how complex the problem you’re being asked to solve is, but that is what the client and the others are asking for.
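For what it’s worth, the factors the consultant keeps asking about can be sketched as a crude triage model.  This is a hypothetical illustration only — every multiplier and hour figure below is an invented placeholder, not a real estimating rule; a proper assessment exists precisely to replace guesses like these:

```python
# Back-of-the-envelope migration triage sketch.  All numbers are
# illustrative assumptions, not real estimating guidance.

def estimate_hours(server):
    """Rough per-server effort guess from a few risk factors."""
    hours = 2.0  # Chad's optimistic baseline: swap the OS, nothing breaks
    if server.get("kernel", "2.6").startswith("2.4"):
        hours *= 4  # the 2.4 -> 2.6 jump is far harder than distro-to-distro
    if server.get("custom_code"):
        hours *= 3  # non-ANSI custom C vs. a portable current Java app
    if server.get("tiers", 1) > 1:
        hours *= server["tiers"]  # multi-tier apps multiply test effort
    if server.get("needs_uat"):
        hours += 8   # stress/performance testing plus UAT
    if server.get("security_audit"):
        hours += 40  # the online banking app is not skipping this
    return hours

fleet = [
    {"name": "edge-web", "kernel": "2.6"},
    {"name": "banking", "kernel": "2.4", "custom_code": True,
     "tiers": 3, "needs_uat": True, "security_audit": True},
]
for s in sorted(fleet, key=estimate_hours):
    print(f"{s['name']}: ~{estimate_hours(s):.0f}h")
```

The point isn’t the arithmetic; it’s that the same fleet spans two hours to well over a hundred per server, which is why “it depends” is the honest answer.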

So you end up with a fudge; you commit to moving some edge servers and a cluster of supposedly simple apps and you sign up to do a security-approved core build and an assessment for the rest so that the project can get started and the customer can show progress to their boss and the sales guy can make his number.

Now you’re faced with months in the lab at the client site with Chad explaining to you how completely screwed up their environment is and how there’s no way that he’s going to give up his Solaris servers and anyway they’ve tried to do this themselves a bunch of times already and it never works because it’s not really a current release of Manugistics and they did some customization that they probably shouldn’t have…

Head in the clouds

Michael Nygard has his head in the computing clouds, suggesting that not only is cloud computing in our future, but that there’ll be many of them. He’s right.

Everyone who runs a large data center is today faced with the same set of interconnected environmental problems: space, power, and heating/cooling. And these are environmental not just in the sense of tree-hugging but also in a straightforward practical sense: there is no more space, there is no more power, there is too much heat and not enough cooling. These problems were the domain of junior people a few years ago, worrying about where, physically, to locate all the new Windows boxes. Then it was middle managers trying to sort out power and HVAC issues: “If we deploy a new phone system in our building we won’t have enough power to do any upgrades in the data center,” that sort of thing. Now environmental issues are front-and-center for senior IT management and, if you’re a “red-shift” kind of company, for senior corporate leadership too.

You can cloak it if you want to in green terms but businesses are faced with real operational issues that they need to address regardless of their perspective on global warming or riverine dolphins.

Alongside these environmental issues, data centers are also facing a crisis of manageability. A large enterprise data center is a staggeringly complex thing, too complicated. Also, if the truth be told, most of them are not that well run; would you expect, for example, that an auto parts distributor would have great technology management skills? No, of course not, and the fact is that they probably wouldn’t want to spend the money to acquire that talent and technology even if they could; their differentiation, the competitive advantage of their business, lies elsewhere. So they have a complicated, and sub-optimized, technology infrastructure.

The answer to all of these problems — Monday edition — supposedly lies in virtualization. Novell gets brought into these conversations because inevitably data center managers have a roadmap that looks something like this:


Conservative IT

Enterprise IT, and the people who run it, are risk-averse. Things that work are valued highly over new things. The kids might all be learning Ruby and Scheme, but COBOL and C/C++ still rule in the enterprise, where Java is seen as an up-and-comer. Think mainframes are old news? Then you haven’t spent a lot of time in an enterprise data center.

I spoke recently with a guy who worked at a VMS help desk twenty-five years ago; he said that he’d recently run into some old colleagues from that time and asked them what they were doing. They said they were doing the same thing, VMS support, and that the team had pretty much stayed the same size, a couple of dozen people. IBM supposedly still has its own VMS help desk for its internal users. (You will recall that VMS is an old DEC operating system, an ancient enemy of IBM’s, so this is an admission not only that they use a competitor’s operating system but also, more to the point, that they can’t get off of it.)


The Fossa Project

At Brainshare, Novell’s annual user conference in Salt Lake City, our CTO, Jeff Jaffe, announced a new technology vision, code-named “Project Fossa,” [pdf] intended to enable computing and collaborating with agility. The fossa is a cat-like mammal from Madagascar, sort of related to raccoons, weasels, and palm civets. (Fossas may be viverrids like civets or the falanouc, another Madagascar endemic; the taxonomy seems to be contested.) Fossas are supposed to be very agile, and if you have little kids you know them as the villains in the animated movie Madagascar. The project’s name is also a play on Free and Open Source Software (FOSS).

Here’s some press coverage, including the priceless hed “Novell focuses future strategy around endangered mongoose” from the UK edition of ZDNet.

Consulting to the rescue: Red Hat edition

Red Hat just announced that they’re acquiring Amentra, a 140-person systems integrator, as part of their “Enterprise Acceleration” initiative.  Word on the street is that they paid $40m, or about $300,000 per employee, or two times 2007 earnings.  I imagine that they’re hoping that this will help to right the JBoss integration, which has been a mess, according to what I’ve heard.