Recently I have encountered yet another example of a good website hampered by very bad architecture. The once-standard policy of de-coupling the application layer from the data was completely ignored, and over time this has resulted in a spider's web of dependencies that the operations group can barely manage. Although the site performs adequately, it consumes several orders of magnitude more infrastructure than would normally be required for its operation.
I have seen it time after time. An initial concept site (in this case e-commerce) is well designed and deployed. But over time, additional applications are added by multiple application developers, resulting in hundreds of inter-dependencies that make maintenance a nightmare. Now add the complexity of moving it to a new data center, and the task becomes nearly impossible.
The most effective way to avoid this is to isolate the knowledge of the location of the data from the applications and move it to a middle tier, or middleware. This enables the migration of applications and data to a new location in distinct pieces, and also makes it easier to identify the interrelationships between the various applications that act on the data (a minimal sketch of this kind of isolation follows below). If each application has direct access to the data, then moving the application and data must occur in one large move, which is risky, expensive and unlikely to succeed on the first try.
Therefore a second environment must be built to shadow the first, perpetuating all the bad design and complex relationships. The applications must be duplicated to the new location, and the data must be replicated. Depending on the volatility of the data and the applications' tolerance for downtime, this can range from backup and restore from tape to synchronous replication. The coupling of the data has increased the cost of a move by at least an order of magnitude, and perhaps more. Not to mention the added complexity of maintaining a spider's web of applications and infrastructure.
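To make the middleware point more concrete, here is a minimal sketch of the kind of isolation I'm describing. It's illustrative only; the class and registry names are mine, not any particular product's:

```python
# Minimal sketch: applications ask a data-access layer for a logical data
# source by name; only this layer knows where the data physically lives.

class DataAccessLayer:
    def __init__(self, registry):
        # registry maps logical names to physical locations,
        # e.g. {"orders": "db-host-a.example.com:5432/orders"}
        self._registry = registry

    def locate(self, logical_name):
        # A real middleware tier would hand back a pooled connection;
        # here we simply resolve the logical name to a location.
        return self._registry[logical_name]

# The application only ever refers to "orders". When the data moves to a
# new data center, only the registry entry changes -- not the application.
dal = DataAccessLayer({"orders": "db-host-a.example.com:5432/orders"})
print(dal.locate("orders"))
```

When the data moves, the registry (or the middleware configuration) changes in one place, instead of hunting through hundreds of applications for hard-coded connection strings.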
Tuesday, July 22, 2008
Wednesday, July 2, 2008
SOA, SaaS, Cloud Computing – all point to utility computing architecture
Utility computing has been a goal of the IT industry for as long as I can remember. It is almost the elusive “holy grail”. But now it’s almost a necessity. To process in the cloud, you must deliver software as a service. To effectively deliver software as a service, you need a service oriented architecture. And a service oriented architecture must provide capacity on demand, or utility computing. Virtualization is the essential underpinning to this approach.
But there seems to be slow adoption of virtualization beyond server consolidation. I believe this is specifically due to the lack of mature systems management tools. VMWare is aggressively acquiring and developing additional capabilities, but M$ is purporting to have intrinsic management advantages built into its Hyper-V. Yet Hyper-V lacks network migration for workload balancing, VM shadowing, and remote replication for disaster recovery. While Hyper-V is lacking some of the more sophisticated capabilities of VMWare, it certainly has the advantage of price, and with better management capability, it may compel many IT decision makers to choose it from a manageability standpoint. Regardless of the hypervisor of choice, it is only one component of the overall architecture required to reach commodity infrastructure. Storage and network virtualization, and mature tools to manage all three components seamlessly, will also be needed.
Of course, I speak as though the data center is nothing but x86 machines (I've been reading too much VMWare glossware lately), and we all know that while x86 is a large presence, there is still quite a bit of its bigger brothers, UNIX and mainframes. Each of the leading vendors in the UNIX arena has its own virtualization strategy, quite mature although bound to its respective operating system. Before you will be able to manage the entire computing environment seamlessly as a cloud, there will need to be a way to manage them all. I don't think it would be possible with the array of tools that would be required today to manage all these disparate environments. Unfortunately I believe it's up to the systems management vendors to develop more robust tools that can adapt to dynamic environments, manage the various vendors' products, and do so in an efficient manner before cloud computing can become a reality. Until then, those seeking to deploy cloud computing will have to rely on homogeneous infrastructure, very highly skilled systems engineers, and a collection of tools from many sources (possibly some home grown) to do so.
Labels:
Cloud computing,
data center automation,
DCA,
SaaS,
SOA,
systems management,
virtualization
Tuesday, July 1, 2008
Virtualization beyond Hypervisors - enabling reliable DR, disrupting best practices
I apologize for the long post... This was a white paper / presentation and I have had several requests for copies so I thought I'd just put the text here. (If you are interested in the WP with the graphics, or the presentation, just send me an e-mail or post a comment) - VB
“Virtualization” has been THE buzz word for the last few years, and yet it seems to mean something different to everyone I discuss it with. The funny thing is, this isn’t new, it’s been around longer than I’ve been in IT (almost 30 years).
To understand the impact this “new” approach will have on capacity, change and configuration management we need to understand the impact it has had in the past. I’ll go through the significant evolutionary phases that got us to our current state, and then explain the impact to capacity, change and configuration management.
The Mainframe is King
As I was growing up in the ‘60s and ‘70s, mainframes were the only computers. They were incredibly expensive, consumed vast quantities of power and real estate, and were enigmatic behemoths which required the care and feeding of teams of scientists who spoke in terms that kept the end user at bay. End users most likely never touched or even saw the computer itself. It was kept behind cipher lock and key in what appeared to be a “clean room”, with special controls for heat and humidity. The users interacted with it through paper – punched cards and green-and-white lined, 132-characters-per-line printed reports.
By the time I joined the ranks of those tending to these great behemoths things were changing – users were interacting directly with the computer (or so it seemed). Time-sharing, dividing the computer’s use among many end users in very small time slices, was becoming the norm rather than the exception. Because computers in interactive use spend most of their time idly waiting for user input, many users could share a machine by using one user's idle time to service other users. Each user had his own “virtual” machine.
Time-sharing was invented in 1957 by Bob Bemer, and the first project to implement it was initiated later that year by John McCarthy. It first became commercially available through service bureaus in the 1960s mostly provided by IBM (sounds like SOA to me). These were prohibitively expensive and short lived in the market. The first commercially successful time-sharing system was the Dartmouth Time-Sharing System (DTSS), implemented at Dartmouth College in 1964. But it didn’t really begin to take hold as a standard in most businesses until the 1970s. There were many reasons for this, most were economic.
However innovative these time-sharing systems were, they were all monolithic. It was one big central computer, hard wired over proprietary physical connections to static end points (terminals, printers and the combination thereof). In this environment configurations rarely changed, not only because of the great expense, but also because change required an effort that would bankrupt a modern day company. And due to cost, capacity utilization was scrutinized under a microscope as resources were more expensive than gold.
As an operating systems programmer working for the government, I had the unique opportunity to be involved in a project that would forever change the landscape of computing – Arpanet. What was unique about Arpanet was that it linked many different kinds of computers in different locations through communications links. There had been many homogeneous networks prior to this (all of one manufacturer and type of machine), but none that allowed heterogeneous access. In the beginning it comprised only government agencies, government contractors, and universities doing government research. Since each manufacturer used a unique and proprietary operating system, a standard was required for communications between them, which resulted in TCP/IP. This allowed users to access resources provided by others, and to communicate in real time via e-mail. Seemingly overnight, the most important technology related to computing became the Network.
The Network is King
UNIX workstations and PCs enabled “smart terminals” (or “Fat Clients”), which had the ability to process locally, not just interact with the mainframe as the terminals did. Once these workstations were connected together, they formed a local area network (LAN). LANs made resources available in a shared (virtual) manner, and enabled collaboration among end users - productivity soared. While this was realized for the most part in the 1980s, LANs didn’t reach their potential until a bit later.
LAN proliferation was hampered by incompatible physical and network layer implementations, and confusion over how best to share resources. Each vendor had its own network cards, cabling, protocol, and network operating system. There was SNA, Arcnet, DECnet, Ethernet, IPX, AppleTalk, Token Ring, TCP/IP, NBF and others. TCP/IP and Ethernet have now almost completely replaced them all. The agent of change was Novell Netware, which made networking PCs productive. Netware provided consistent operation for the 40 or so competing card/cable types, and a more stable operating system than most of its competitors. It dominated the PC LAN business from early 1983 until the mid 1990s, when Microsoft finally entered the market with a (semi) stable product that was network capable. During this same era, Unix workstations from Sun, HP, SGI, Intergraph, and others were developed for high end requirements (engineering, the space program, etc) which used TCP/IP, and many believe this was the driving factor which led Microsoft to choose TCP/IP for its implementation.
Still, the primary technology focus was on the Network well into the early ‘90s, and many large development projects were run from that point of view. In hindsight this seems ill founded as the early LAN infrastructure and the PCs being used as workstations were unstable at best, and the only server with enough horsepower for large datasets was the mainframe (which couldn’t connect to most LANs). Most of the time, if something was broken, it was network related. Systems management now became a different science, as the tight controls available within the mainframe were not possible on widely distributed resources. LANs were loosely controlled and were propagating at an incredible pace, becoming more and more complex by the day. Capacity planning was practically synonymous with bandwidth utilization, and much education was devoted to the detailed understanding of bridging, routing, and switching.
This would all change abruptly with the advent of UNIX based Symmetric Multi-processors (SMP) and RISC. UNIX would now surpass the mainframe in raw compute capacity at a much lower cost, without the rigid constraints of the mainframe world.
The Server is King
With the onset of SMP UNIX servers, the mainframe was prematurely declared dead (due mostly to the hubris of the UNIX systems engineers who had displaced many OS/390s and assumed this would continue until all would be UNIX). Capacity, change and configuration management processes, largely developed in the tightly controlled environment of the mainframe, would have to adapt to a world prone to chaos instead of consistency. Capacity planning now became: if you run out of horsepower, get a bigger (or faster) server. Configuration management was limited to ensuring the components installed in a server would actually work in that particular model, and change management was something akin to an urban legend.
The administrators of what was called the “Wild West” of computing held the rigid discipline of the mainframe world in contempt. The very title of this technology, “open systems”, reflects the lack of controls. They scoffed at change control and frequently made changes to production systems on the fly (often as root). They were also quite familiar with catastrophic outcomes from minor changes. Something had to give, and this led to the creation of process management standards generally accepted as the ISO9000 guidelines (now ITIL). Unfortunately, these processes were developed for ideal situations and very few companies could implement them without severely hampering the productivity of their IT operations. Consequently, they were followed loosely, if at all, in most organizations into the mid ‘90s.
Ironically, it was not the process engineers from the big five pushing ISO9000 adherence, nor an awakening of systems administrators, that caused the next major shift; it was the belief that a corporation's most valuable asset was its data. This message resonated with the business owners and replaced the speed race for bigger and faster servers with a focus on storing, retrieving, and most importantly protecting data as reliably as possible. It produced a “speeds and feeds” race of its own, but most importantly, it restored the processes of capacity, change and configuration management.
The Storage is King
Storage technology could replicate data over long distances for recovery in the event of disaster. So companies now began to focus on application availability (from the user standpoint), and recovery from failures. This view greatly enhanced the perception of the “standard processes” and ITIL took over the ISO9000 arena.
UNIX servers had become so large and powerful they began to implement mainframe-like technology (partitions, resource management supervisors, etc.). You simply could no longer find applications that required an entire enterprise class server. Servers now had to be shared with multiple applications. Now, for the first time in a decade, the majority of administrators welcomed control over changes because a failure caused great business impact (and would often cost them their job). Capacity, change and configuration management processes returned and were adopted in nearly all large environments.
At the turn of the millennium we had a side note. This did not equate to a major change overall, it simply modified the open systems world found on the raised floor. Prior to this time, Windows was not allowed on the raised floor as it was far too unreliable. While still unreliable (but less so at this point), Windows 2000 servers began to find their way onto the raised floor of the data center. This was due to economics – they were simply so cheap and easy to deploy that many business opportunities could now be pursued at a very low cost. Thus, they began to multiply like the Tribbles from the original Star Trek series who seemed to endear themselves to every sentient race which encountered them (except Klingons and Romulans of course).
There were two fundamental problems caused by this great proliferation. The first was their sheer numbers. They proliferated faster than their predecessor UNIX (which I should note grew at a relative pace even faster than Windows compared to its predecessor, the mainframe) and the physical management of them became challenging. Then, as Windows began to mature and stabilize, and the x86 servers it was running on became significantly more powerful, another problem arose – Windows could not use the power the servers now possessed. Most applications were using a small fraction of the available capacity of the server. But unlike its predecessors, it did not have the capability to manage multiple applications on a single system, or to partition the servers into multiple “virtual” servers as UNIX and the mainframe did. For all intents and purposes, it was a one-to-one relationship between an application and a server. VMWare addressed this by bringing partitions to x86. Initially, VMWare didn’t change the way capacity, change and configuration management were done. But it opened the thinking of the architects and engineers to the concept of virtual machines, and more importantly application mobility, and prepared the IT world for commoditization.
VMWare – the Mainframe to x86
VMWare originally brought partitioning to the x86 world. Although ground-breaking in this arena, it was old hat on UNIX and the mainframe. But because you could get the Tribbles under control, it took the datacenters by storm. VMWare brought more, though… it brought the concept of application mobility through VMotion. Through this marvelous innovation, you could move partitions (VMs) from one node to another within an ESX cluster - while running. This flexibility brings with it many capabilities – load balancing and fault recovery to name a few – but most significantly, x86 infrastructure had become a commodity (even though many haven’t realized it yet). What VMWare brought to x86, my company and others are bringing to operations in general. Everything required to run an application will become part of the application configuration, and will be managed (regardless of its technology domain) as it relates to the application. This will eliminate the tie between specific infrastructure components supporting an application and the application itself, allowing an application to use any available resource as needed. This will also cause the technology domains (server, network and storage) to “homogenize”, increasing interoperability and simplifying management. As these technologies mature and become pervasive, a major shift of control will occur, out of the data center and into the business itself.
The User is King
Welcome to the near-future (and soon present) reality of the business owner managing his own environment. The user will be able to add resources (with an associated cost) to his environment when he needs to increase performance. He will add storage when he begins to run low. He will move historic data to archive storage as it ages (to reduce cost), and he will make these allocations from his desk because all of the processing resources will be shared. They will be added automatically based on business rules and policies that he will set in advance, or they will be added ad hoc when the need arises. This sounds far-fetched, but how many of us ran a LAN in the late ‘80s or early ‘90s? Those environments were fragile, complicated, expensive and hard to keep running. Most companies had several different LANs in different areas of their business (which couldn’t talk to each other) and large, very highly skilled engineering teams to support them.
This is certainly not the case today. Networks are already shared across the user base (the internet is shared across the world). Enterprise-class servers simultaneously run many large applications. In fact, they need to be partitioned since very few applications (if any at all) can individually use all their capacity. Storage has been shared since the advent of the large-capacity storage arrays in the mid ‘90s, and has been made even more shareable through the use of SAN, NAS and SVCs. As the technology matures and the distinction between vendors blurs, the data center will become homogeneous within each technology domain. It will then be possible (and desirable) to turn control of the assets an application uses over to the application owner.
What about the 3 C’s?
Capacity, change, and configuration management have been the fundamental core processes that ITIL and Six Sigma have used for years to evaluate the health of an IT operation’s processes. As an Operations Management Consultant I spent several years doing just that for many Fortune 100 companies. But these were, for the most part, cumbersome controls put on IT to prevent it from hurting itself and the business. Now - with data center virtualization - capacity planning can be done reactively (as you reach capacity, simply add a node; a rough sketch of such a rule follows the summary below). Change management will be accomplished in virtual environments, set up and torn down by the users themselves without impacting production, and will be all but replaced by Q/A testing. As for configuration management, what configuration will you be managing? No longer will it be hardware components and their configuration, as each domain will be homogeneous. Resources will simply be added to the universal network(s) as needed.
Capacity management will be automatic
Configuration management will manage the configuration of the application
Change management will all but disappear into Q/A
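For a rough idea of what reactive capacity management could look like in practice, here is a minimal sketch. The threshold, the utilization feed, and the provisioning call are all hypothetical placeholders for whatever your management tooling actually provides:

```python
# Minimal sketch of a reactive capacity rule: when the resource pool runs
# hot, add another node. The names below are placeholders, not any
# vendor's API.

UTILIZATION_THRESHOLD = 0.80   # add capacity once the pool exceeds 80%

def check_capacity(pool_utilization, provision_node):
    """pool_utilization: fraction 0.0-1.0; provision_node: callable that adds a node."""
    if pool_utilization > UTILIZATION_THRESHOLD:
        provision_node()
        return "node added"
    return "capacity ok"

# Example: the pool is at 85%, so the rule fires and a node is provisioned.
print(check_capacity(0.85, lambda: print("provisioning a new node...")))
```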
The good news in the end
These processes were developed as a reaction to an environment which was basically out of control. They did not enhance operations; rather, they were impediments. As IT operations have evolved from the mainframe through the network, server and storage eras, we have eventually come back to where we started: managing the resources as a pool (as we did in the mainframe era, sharing resources as the application needs require).
The data center will become one big logical mainframe with the engineers working on it from the inside!
Business owners used to be constrained by rigid technology resource configurations and requirements, forcing them to adapt to the technology. Over time, as data center virtualization technology becomes mainstream, more and more flexibility will be offered to end users to accomplish business objectives in real time. Whether it’s VMware or DynaCenter or any of the other tools which control application mobility (resource reconfiguration), positive change is taking place. Constraints and complexity are giving way to dynamic and nimble configuration management. In the near future, data center configuration management will be a flexible and nimble science. System administrators’ hands will no longer be tied by burdensome infrastructure requirements, and businesses will be enabled to respond quickly to rapidly changing market forces.
Labels:
data center automation,
history,
ITIL,
virtualization,
VMWare
Thursday, June 26, 2008
the curse of machine sprawl
In yesterday's post, I briefly touched on machine sprawl, and the irony that VMWare had basically solved the hardware machine sprawl of x86 servers through consolidation, only to accelerate machine sprawl as a whole with VMs. While the reason is obvious, the consequences may actually be worse than the hardware sprawl. Prior to 2000, Windows was not allowed on the raised floor as it was far too unreliable. While still unreliable (but less so at that point), Windows 2000 servers began to find their way onto the raised floor of the data center. This was due to economics – they were simply so cheap and easy to deploy that many business opportunities could now be pursued at a very low cost. Thus, they began to multiply like the Tribbles from the original Star Trek series who seemed to endear themselves to every sentient race which encountered them (except Klingons and Romulans of course).
The fact that VMs are now multiplying even faster than the Windows servers did a few years back is becoming a serious threat to the overall stability of IT. Part of my rationale is based on the inability of current systems management software to deal with dynamic environments, and the other part is based on the now pressing need to re-invent the best practices defined by ITIL to accommodate virtual environments. Because VMs are viewed as cheap and easy to deploy, businesses are deploying them at an accelerating rate. Although VMWare is fairly expensive at over $1000/socket, there are new inexpensive (even free) virtual server environments on the horizon that may eliminate that cost, coming from M$ and others (see "ProxMox: The high-performance virtualization server for the rest of us"). This will only add fuel to the fire, overwhelming our already overworked and understaffed IT support staffs and resulting in less stable environments.
Labels:
data center automation,
DCA,
ITIL,
systems management,
virtualization,
VMWare
Wednesday, June 25, 2008
When will virtualization be the norm?
Virtualization 2.0?
The driving force for all major waves of change in IT, and perhaps all major industries, has been solving a “pain point”. LANs emerged as a solution to get processing capability to all the users in the organization, not just us propeller heads writing and running the software in the raised-floor sanctuary. The PC extended that to the home user and facilitated the distributed, networked computing era of the 1990s. Storage arrays solved the need to handle the exploding data volumes ushered in with the web. Advanced graphics processors and faster CPUs enabled the GUIs through which we all interface with computers today. Disk-based backups, sophisticated SAN clones, snap copies, etc. all helped solve the need to keep systems online, and give other systems such as backup and decision support access to the business data without interfering with the transaction systems running the business.
So what is driving the Virtualization 2.0 movement? As Dan Kusnetzky points out in his blog:
With the proper planning and correct implementation, the use of virtualization technology can bring the following benefits:
• Higher application availability than can be found on a single industry standard system.
• Scalability beyond what can be found using a single industry standard system
• Higher application performance
• Optimization of current environment
• Application agility and mobility
• Streamlined application development and delivery
• No need to over provision to obtain reasonable service levels
• “Green computing” (lower power consumption, lower heat generation, smaller datacenter footprint)
I would add to that list:
• Disaster recovery that is a true recovery capability vs. a plan
• Efficient asset management (retiring servers by moving the server “image” to a new system with minimal interruption)
• Consolidation of distributed data centers by moving the images, not the hardware
But, as a strategist from one of the most capable virtualization companies, I have to echo his statement:
Will this be enough?
While to the casual observer, this seems like a lot of solutions to many different problems, and should be a “no brainer” as far as adoption, we have seen quite a bit of resistance to change and adopt this “new technology”. I use the quotes, because IMHO virtualization has been around as long as I have been in IT… nearly 30 years (see my white paper ‘Virtualization beyond Hypervisors - enabling reliable DR, disrupting best practices’). Many of the reasons are related to the need to separate “processing” from “storage”. Again I use quotes to emphasize something that should be 2nd nature, not something that seems foreign.
Some of what seems to be resistance to change may come down to risks, some perceived and some real, of widespread use of virtualization. Some of these risks we have faced before.
• Increased impact of a system failure – In the late 1990s, enterprise class UNIX machines became so large that they had to be partitioned to effectively use their capacity. I forbade our company from using them in production due to the risk that a system failure would take down multiple applications simultaneously. The same is true today with virtualized servers running multiple VMs.
• Troubleshooting complexity - In the open systems world, troubleshooting performance problems is known as “pushing the bottleneck around”. That is, if you are constrained by CPU, add CPUs. Now you may be constrained by RAM, add RAM. Now it is I/O, add I/O capacity. Now it is CPU… In the virtual world, it is far more difficult to pinpoint what (or who) is causing the problem.
• Lack of mature systems management tools – It took the SM vendors decades to perfect (if you can make that claim) their tools in use today. But unfortunately, their capabilities are based on static environments, and most often do not react well to dynamic changes in infrastructure operations and configuration.
• Business process change – ITIL best practices are based on keeping things the same, or managing change at a very granular level to prevent the introduction of problems during changes. Change, configuration and capacity management processes (among others) will have to be revamped to accommodate environments that dynamically change.
• SLA impact – ensuring that specific machines provide guaranteed response times can become very difficult in an environment where workloads can move between machines while running and all underlying hardware is shared. While it might not mean an outage it could certainly adversely affect performance in a noticeable way.
• Machine Sprawl – the very problem that VMWare supposedly solved is returning in the virtual world worse than before. x86 machines were proliferating so rapidly because they were cheap and easy to deploy. Now we have fewer physical boxes, but the rate of VM proliferation is greater than the physical machines were before, because they are even cheaper and easier to deploy. System administrators may soon find themselves with more servers (virtual) than they have the ability to manage.
While I don’t think any of these are insurmountable, together they may explain why the adoption of more widespread virtualization approaches is meeting resistance.
Network storage is key to virtualization
Virtualizing? Going Green? Then why do your servers have disks in them?
Virtualization is driving a need for shared storage. Whether you are deploying VMWare virtual machines, or virtualized physical machine images with Racemi’s DynaCenter, or both, virtualization relies on networked storage. Networked storage is the foundation of the mobility and recovery capability inherent in virtualization. If your virtualization goal is consolidation, improved reliability, more availability, automated disaster recovery, dynamic resource allocation or utility computing, without networked storage you can’t take full advantage of the virtualization you deployed.
Networked storage is rapidly gaining market share - iSCSI storage revenue was up nearly 75% in the 4th quarter of 2007 over 2006 (IDC). But most servers are still shipping with internal storage, primarily used to hold boot volumes and applications. Networked storage beats internal storage on power consumption, cost per gigabyte, and many other operational measures. But the largest (and most overlooked) advantage that networked storage offers over internal storage is the elimination of stranded storage.
Stranded storage was one of the key arguments for storage area network (SAN) adoption in the late ‘90s, and I am surprised the storage vendors aren’t exploiting this today. Prior to SANs, large storage arrays would frequently have large percentages of their storage stranded because the available fiber channel ports were in use, but the systems connected to the array did not require the entire capacity of the array. With external storage being quite expensive at the time, stranded disk was a compelling reason to adopt SAN technology, which was also quite expensive at that time.
Today, due largely to the ever increasing capacity of disk, most servers are using a very small percentage of their internal disk. A customer I was consulting for on a data center consolidation project had completed their physical inventory of infrastructure (servers, network, storage, etc.). They documented 44TB of storage in use, and 21TB available for growth (10TB was located at the new data center and was not in use yet). But they had not inventoried their internal storage. In an exercise to determine what systems might be consolidated, I had them compile the same information for internal storage. They were amazed to find (I wasn’t) that they were using a small fraction of their internal storage (~7%).
But what surprised them more was that the total capacity of their internal storage (177TB) was greater than their external storage. When the original purchase cost of the internal storage was totaled, it was nearly double what they had spent on the SAN and storage arrays. Even with the large database servers excluded, total used capacity was only 7.8%. Ironically, this case is not the exception but the norm.
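To make the stranded-storage math concrete, here is a quick back-of-the-envelope sketch in Python based on the figures above (177TB of internal capacity at roughly 7% utilization). The purchase price per raw TB is an illustrative assumption, not the client’s actual number; the point is how quickly low utilization inflates the real cost of each terabyte actually used.

```python
# Back-of-the-envelope stranded-storage math.
# Capacity and utilization come from the engagement described above;
# the price per raw TB is an assumption for illustration only.

internal_capacity_tb = 177.0   # total internal (direct-attached) capacity
internal_utilization = 0.07    # roughly 7% of that capacity actually in use
price_per_raw_tb = 3000.0      # assumed purchase price per raw TB (USD)

used_tb = internal_capacity_tb * internal_utilization
stranded_tb = internal_capacity_tb - used_tb

# What each *used* TB really cost once the stranded capacity is factored in
effective_price_per_used_tb = (internal_capacity_tb * price_per_raw_tb) / used_tb

print(f"Used capacity:     {used_tb:6.1f} TB")
print(f"Stranded capacity: {stranded_tb:6.1f} TB")
print(f"Effective cost per used TB: ${effective_price_per_used_tb:,.0f} "
      f"(vs. ${price_per_raw_tb:,.0f} per raw TB)")
```

At 7% utilization, whatever you paid per raw terabyte, you effectively paid about fourteen times that for each terabyte you actually use – the same effect behind the “8 times more expensive” internal storage that comes up later in the objections list.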
The average power consumption of an embedded 2.5” / 10,000 RPM drive is 11.2 watts (IBM). So the additional, unnecessary power consumed by a server with two barely used drives seems small, at roughly 22 watts. But as an example, a Dell 860 server consumes 110 watts. If you boot from network storage and eliminate the two internal disks, power consumption drops by 22 watts, or about 20%. And that does not count the additional power you save by not having to cool the heat those drives generate, or the cost of the drives themselves. Multiply that by all your servers and it is significant.
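Multiplied across a fleet, that 22 watts adds up. The sketch below simply carries the arithmetic through; the fleet size and electricity rate are assumptions for illustration, not figures from the sources above.

```python
# Rough fleet-wide savings from removing two internal drives per server.
# Drive and server wattage come from the figures above; the fleet size
# and electricity rate are illustrative assumptions.

drive_watts = 11.2        # embedded 2.5" 10,000 RPM drive (IBM figure)
drives_per_server = 2
server_watts = 110.0      # e.g. a Dell 860-class server
servers = 500             # assumed fleet size
rate_per_kwh = 0.08       # assumed electricity rate (USD per kWh)
hours_per_year = 24 * 365

saved_watts = drive_watts * drives_per_server            # ~22 W per server
pct_reduction = saved_watts / server_watts               # ~20%

fleet_kwh_saved = servers * saved_watts * hours_per_year / 1000.0
annual_savings = fleet_kwh_saved * rate_per_kwh

print(f"Per-server reduction: {saved_watts:.1f} W "
      f"({pct_reduction:.0%} of a {server_watts:.0f} W server)")
print(f"Fleet of {servers}: {fleet_kwh_saved:,.0f} kWh/year, "
      f"about ${annual_savings:,.0f}/year before cooling savings")
```

And that is before the cooling load: every watt those drives dissipate adds to the air conditioning bill on top of the drive power itself.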
Now, if you add in the power consumed and the heat generated by thousands of drives, plus the fact that all of these systems were already attached to the SAN (so no additional cost for FC HBAs), you have a hard time justifying storing anything directly on a server. Additionally, the mean time between failures increases dramatically when servers are configured without internal drives. And in most cases, companies that have a SAN deployed already keep enough storage headroom to accommodate the boot and application volumes without purchasing additional disk – I have yet to encounter one that doesn’t. This is primarily because system and application volumes are small relative to data and databases (with the possible exception of ERP systems like SAP).
Most data center operations managers are completely unaware of the power costs of the environment they manage. Part of the problem is that “facilities,” the group normally responsible for power management, has little or no insight into or influence over the decisions made on the data center floor that consume that power. As power becomes a more significant percentage of the expense of running a data center, this will have to change. More corporations will make data center operations managers accountable for the power their environments consume, and then there will be a real incentive to eliminate stranded storage.
Last year U.S. data centers consumed more than 60 billion kilowatt-hours of electricity at a cost of about $4.5 billion, according to the Environmental Protection Agency (EPA). A good chunk of this power—up to 60% in some cases—is needed to cool servers. Data centers accounted for almost 2% of this country’s total energy consumption. These numbers have risen quickly, nearly 40% between 1999 and 2005, according to a survey by the Uptime Institute. And they may double in the next five years to more than 100 billion kilowatt-hours, according to the EPA.
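As a quick sanity check on those figures, the implied average electricity rate and the projected doubling work out as follows (the inputs are simply the EPA and Uptime Institute numbers cited above, restated):

```python
# Sanity check on the EPA / Uptime Institute figures cited above.

us_dc_kwh = 60e9       # >60 billion kWh consumed by U.S. data centers last year
us_dc_cost = 4.5e9     # about $4.5 billion spent on that electricity (EPA)
cooling_share = 0.60   # up to 60% of the power goes to cooling in some cases

implied_rate = us_dc_cost / us_dc_kwh     # average cost per kWh implied by the figures
cooling_kwh = us_dc_kwh * cooling_share   # upper bound on cooling consumption
doubled_kwh = us_dc_kwh * 2               # EPA projection for five years out

print(f"Implied average rate:  ${implied_rate:.3f}/kWh")
print(f"Cooling (upper bound): {cooling_kwh/1e9:.0f} billion kWh")
print(f"Five-year projection:  {doubled_kwh/1e9:.0f}+ billion kWh")
```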
If the increasing cost of the energy doesn’t scare you, the availability might. According to AFCOM, an association of data center professionals, over the next five years power failures and limits on power availability will halt data center operations at more than 90% of all companies. Gartner predicts that 50% of IT managers will not have enough power to run their data centers by the end of 2008. Expect a rise in outages, along with a pressing need to add more space and power to meet computing demands.
So at this point, you have to ask: why does anyone purchase servers with internal storage? With the proliferation of high-quality, low-cost network storage technology, it amazes me that any data center would not be booting from networked storage (SAN, NAS, or iSCSI). There are so many inherent benefits to using only network storage with diskless servers that you would expect near-universal adoption in at least the Fortune 5000. Moving all of one’s storage to the network simply makes too much sense not to do it. So why aren’t more companies buying diskless servers and booting from SAN?
Perhaps the reluctance to move to diskless servers and networked storage lies in the motivations of the companies producing the technology. Server vendors are not motivated to push the benefits of network storage and diskless servers; they perceive that it would lower their revenue (although only slightly). Storage vendors don’t see additional revenue unless they are selling to a customer that doesn’t already have networked storage – most shops with networked storage in place already have enough free capacity to move their internal volumes to the SAN without buying additional disks. And if a customer (or potential customer) does not have a SAN, the added complexity of booting over the network is most likely seen as a sales impediment.
Working as an operations management consultant for several years, I saw many occasions where the best approach a client could take to reduce operational and capital expenses was to move to networked storage and diskless servers. But I frequently encountered heavy resistance. The following list contains some of the most common (and unreasonable) reasons I was given. Bear in mind that most of my clients were fairly large companies with some or all of their servers already attached to a SAN, yet they were still against diskless servers. These objections to booting from SAN are ranked in order of frequency, to the best of my recollection:
1. We don’t boot from SAN (and when asked why, they simply repeat it)
2. SAN storage is too expensive (I actually had one company tell me this even after we demonstrated that the true cost of their internal storage – even without factoring in power and cooling – was eight times more expensive per GB because they were using such a small percentage of it)
3. Security won’t permit it (as if one server could get to another via the storage)
4. Boot from SAN is too slow (actually much faster)
5. It’s too complicated (this one I hear most from companies that don’t refresh their technology regularly… i.e. we do it this way because we always have. See #1)
6. It would create I/O bottlenecks on the SAN when a server is booting (not)
7. It introduces additional risks (I could never figure out what they were, though)
This list is by no means comprehensive. If you have tried to show the benefits of diskless servers to someone, you have probably heard many other excuses. But it really boils down to change and influence. It may also be that server engineers and systems administrators perceive removing disks from servers as a loss of control. After all, today the server is the “center of the universe” as far as most data centers go. If servers become simply interchangeable processing resources, they would most likely become commodities.
But the benefits of diskless servers and booting from the network are too great to ignore. Aside from the reduced operational cost of the data center, there are many operational advantages that make a strong argument for this case. You can find some of the benefits listed in any storage vendor’s material. I pulled this list from a Dell whitepaper:
Boot-from-SAN benefits include:
1. Improved disaster tolerance
2. Centralized administration
3. Reduced total cost of ownership (TCO) through diskless servers
4. High-availability storage
5. Enhanced business continuance
6. Rapid server repurposing
7. Consolidation of image management
Obviously, if you have not deployed a SAN, the cost appears prohibitive, especially for Fibre Channel. But with the advent of lower-cost SAN technology (<$25k), and with the maturation of iSCSI and FC over IP combined with very affordable network storage from companies like Agami, Compellent, EqualLogic (now Dell), Pillar, and many others, that barrier is really one of perception. It can actually be less expensive to order servers without disks and use an affordable network storage device instead (a rough comparison follows the list below). Additional benefits of diskless servers are:
1. Less power consumed per server
2. Less heat generated
3. Higher disk utilization (eliminates stranded disk)
4. Increased server reliability
5. Reduced cost of the server
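Here is the rough per-server comparison promised above. Every price and size in it is an assumption for the sake of the sketch – drive cost, boot-volume footprint, and the capacity of the entry-level array – so treat the shape of the math, not the dollar figures, as the point. And as noted earlier, shops that already have SAN headroom can often skip even the array purchase.

```python
# Rough cost comparison: internal boot disks vs. diskless servers booting
# from shared network storage. All prices and sizes below are illustrative
# assumptions, not vendor quotes.

servers = 100
disks_per_server = 2                # typical mirrored pair of boot disks
cost_per_disk = 250.0               # assumed cost of a small 10k RPM drive (USD)
disk_gb = 73.0                      # assumed capacity of that drive
boot_volume_gb = 15.0               # assumed boot + application footprint per server

shared_array_cost = 25000.0         # entry-level networked storage (the <$25k class)
shared_array_gb = 4000.0            # assumed usable capacity of that array

internal_cost = servers * disks_per_server * cost_per_disk
internal_raw_gb = servers * disks_per_server * disk_gb
needed_gb = servers * boot_volume_gb

print(f"Internal disks: ${internal_cost:,.0f} for {internal_raw_gb:,.0f} GB raw, "
      f"of which only {needed_gb:,.0f} GB is actually needed")
print(f"Shared array:   ${shared_array_cost:,.0f} for {shared_array_gb:,.0f} GB usable, "
      f"shared across all {servers} servers")
```

With these assumptions the diskless approach costs half as much up front, before counting any of the power, cooling, and reliability benefits listed above.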
With the widespread adoption of virtualization technology, the benefits of networked storage multiply. Whether you are deploying VMWare virtual machines, virtualized physical machine images with Racemi’s DynaCenter, or both, networked storage becomes an integral part of the solution. These virtualization technologies, combined with replication, clones, multiple mirrors, and snapshots, enable new capabilities such as:
1. Guaranteed, automated DR capability (versus just a plan)
2. Efficient data center consolidation
3. Centralized lab management
4. N-1 recovery (protecting hundreds of different servers with one standby, versus one-to-one H/A)
So don’t wait for your server or storage vendor to give you a sales pitch on the benefits of networked storage. If you’re deploying virtualization technology, you’re missing capability and wasting money if you’re not using networked storage. With today’s affordable and reliable network storage technologies, going diskless and going green makes more sense than ever.
Labels: data center automation, DCA, networks, SAN, virtualization