Tuesday, July 22, 2008

Coupling, the enemy of web applications

Recently I have encountered yet another example of a good website hampered by very bad architecture. The once-standard policy of de-coupling the application layer from the data was completely ignored, and over time this has resulted in a spider's web of dependencies that the operations group can barely manage. Although the site performs adequately, it consumes several orders of magnitude more infrastructure than would normally be required for its operation.

I have seen it time after time. An initial concept site (in this case e-commerce) is well designed and deployed. But over time, additional applications are added by multiple application developers, resulting in hundreds of inter-dependencies that make maintenance a nightmare. Now add the complexity of moving it to a new data center, and the task becomes nearly impossible.

The most effective way to avoid this is to isolate knowledge of the data's location from the applications and move it to a middle tier, or middleware. This enables the migration of applications and data to a new location in distinct pieces, and also makes it easier to identify the interrelationships between the various applications that act on the data. If each application has direct access to the data, then moving the applications and data must occur in one large move, which is very risky, expensive, and unlikely to succeed on the first try.
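As a rough sketch of the difference (the names, interface and schema below are hypothetical, not taken from the site in question), applications program against a small data-access layer, and only that layer knows where the data actually lives:

```python
# Hypothetical sketch: applications depend on an abstract "catalog store"
# instead of a hard-coded database host. Only the middle tier knows the
# physical location of the data, so moving the database means changing
# one configuration entry rather than every application.

from abc import ABC, abstractmethod


class CatalogStore(ABC):
    """Interface the applications program against."""

    @abstractmethod
    def get_product(self, sku: str) -> dict:
        ...


class SqlCatalogStore(CatalogStore):
    """Middle-tier implementation; the connection details live here only."""

    def __init__(self, dsn: str):
        self.dsn = dsn  # e.g. read from middleware config, not app code

    def get_product(self, sku: str) -> dict:
        # Placeholder for a real query through the configured DSN.
        return {"sku": sku, "source": self.dsn}


def checkout_app(store: CatalogStore, sku: str) -> dict:
    # The application only knows the interface, never the data's location.
    return store.get_product(sku)


if __name__ == "__main__":
    # Repointing the whole site at a new data center is one line here.
    store = SqlCatalogStore(dsn="db-newdc.example.com/catalog")
    print(checkout_app(store, "SKU-1001"))
```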

To mitigate that risk, a second environment must be built to shadow the first, perpetuating all the bad design and complex relationships. The applications must be duplicated to the new location, and the data must be replicated. Depending on the volatility of the data and the applications' tolerance for downtime, this can range from backup and restore from tape to synchronous replication. The coupling of the data has increased the cost of a move by at least an order of magnitude, and perhaps more, not to mention the added complexity of maintaining a spider's web of applications and infrastructure.
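Just to illustrate that range (the thresholds below are invented, not a recommendation), the choice is essentially a function of how much data loss and how much downtime the business can tolerate:

```python
# Hypothetical decision sketch: pick a data-replication approach for the
# shadow environment from two tolerances the business must supply.
# The thresholds are illustrative only.

def replication_strategy(max_data_loss_hours: float, max_downtime_hours: float) -> str:
    if max_data_loss_hours >= 24 and max_downtime_hours >= 24:
        return "backup and restore from tape"
    if max_data_loss_hours >= 1:
        return "periodic asynchronous replication"
    if max_downtime_hours >= 1:
        return "continuous asynchronous replication"
    return "synchronous replication"


if __name__ == "__main__":
    print(replication_strategy(max_data_loss_hours=48, max_downtime_hours=72))
    print(replication_strategy(max_data_loss_hours=0.0, max_downtime_hours=0.25))
```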

Wednesday, July 2, 2008

SOA, SaaS, Cloud Computing – all point to utility computing architecture

Utility computing has been a goal of the IT industry for as long as I can remember. It is almost the elusive “holy grail”. But now it’s almost a necessity. To process in the cloud, you must deliver software as a service. To effectively deliver software as a service, you need a service-oriented architecture. And a service-oriented architecture must provide capacity on demand, or utility computing. Virtualization is the essential underpinning to this approach.

But there seems to be slow adoption of virtualization beyond server consolidation. I believe this is specifically due to the lack of mature systems management tools. VMWare is aggressively acquiring and developing additional capabilities, while M$ purports to have intrinsic management advantages built into its Hyper-V. But Hyper-V lacks network migration for workload balancing, VM shadowing, and remote replication for disaster recovery. While Hyper-V is lacking some of the more sophisticated capabilities of VMWare, it certainly has the advantage of price, and with better management capability it may compel many IT decision makers to choose it from a manageability standpoint. Regardless of the hypervisor of choice, it is only one component of the overall architecture required to reach commodity infrastructure. Storage and network virtualization, and mature tools to manage all three components seamlessly, will also be needed.
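To make the workload-balancing point a bit more concrete, here is a minimal sketch of the decision logic such a management tool has to implement; the Host objects and the rebalance() policy are my own invention, not any vendor's API:

```python
# Hypothetical sketch of a workload-balancing policy: when a host runs hot,
# move its smallest VM to the least-loaded host. Real tools handle the actual
# live migration; this only shows the decision logic.

from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    cpu_capacity: float                        # e.g. GHz available
    vms: dict = field(default_factory=dict)    # vm name -> cpu demand

    @property
    def load(self) -> float:
        return sum(self.vms.values()) / self.cpu_capacity


def rebalance(hosts: list, threshold: float = 0.80) -> list:
    """Return a list of (vm, source, target) moves; does not perform them."""
    moves = []
    for src in hosts:
        while src.load > threshold and src.vms:
            vm, demand = min(src.vms.items(), key=lambda kv: kv[1])
            dst = min((h for h in hosts if h is not src), key=lambda h: h.load)
            if dst.load + demand / dst.cpu_capacity > threshold:
                break  # nowhere sensible to put it
            del src.vms[vm]
            dst.vms[vm] = demand
            moves.append((vm, src.name, dst.name))
    return moves


if __name__ == "__main__":
    cluster = [
        Host("esx01", 10.0, {"web1": 4.0, "web2": 3.0, "db1": 2.5}),
        Host("esx02", 10.0, {"app1": 2.0}),
    ]
    for vm, src, dst in rebalance(cluster):
        print(f"migrate {vm}: {src} -> {dst}")
```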

Of course, I speak as though the data center is nothing but x86 machines (I've been reading too much VMWare glossware lately), and we all know that while x86 is a large presence, there is still quite a bit of its bigger brothers, UNIX and the mainframe. Each of the leading vendors in the UNIX arena has its own virtualization strategy, quite mature although bound to its respective operating system. Before you will be able to manage the entire computing environment seamlessly as a cloud, there will need to be a way to manage them all. I don't think it would be possible with the array of tools that would be required today to manage all these disparate environments. Unfortunately, I believe it's up to the systems management vendors to develop more robust tools that can adapt to dynamic environments, manage the various vendors' products, and do so in an efficient manner before cloud computing can become a reality. Until then, those seeking to deploy cloud computing will have to rely on homogeneous infrastructure, very highly skilled systems engineers, and a collection of tools from many sources (possibly some home grown) to do so.

Tuesday, July 1, 2008

Virtualization beyond Hypervisors - enabling reliable DR, disrupting best practices

I apologize for the long post... This was a white paper / presentation and I have had several requests for copies so I thought I'd just put the text here. (If you are interested in the WP with the graphics, or the presentation, just send me an e-mail or post a comment) - VB

“Virtualization” has been THE buzz word for the last few years, and yet it seems to mean something different to everyone I discuss it with. The funny thing is, this isn’t new, it’s been around longer than I’ve been in IT (almost 30 years).


To understand the impact this “new” approach will have on capacity, change and configuration management we need to understand the impact it has had in the past. I’ll go through the significant evolutionary phases that got us to our current state, and then explain the impact to capacity, change and configuration management.

The Mainframe is King

As I was growing up in the ‘60s and ‘70s, mainframes were the only computers. They were incredibly expensive, consumed vast quantities of power and real estate, and were enigmatic behemoths which required the care and feeding of teams of scientists who spoke in terms that kept the end user at bay. End users most likely never touched or even saw the computer itself. It was kept behind cipher lock and key in what appeared to be a “clean room”, with special controls for heat and humidity. The users interacted with it through paper – punched cards and green-and-white-lined printed reports, 132 characters per line.

By the time I joined the ranks of those tending to these great behemoths, things were changing – users were interacting directly with the computer (or so it seemed). Time-sharing, dividing the computer’s use among many end users in very small time slices, was becoming the norm rather than the exception. Because computers in interactive use spend most of their time idly waiting for user input, many users could share a machine by using one user's idle time to service other users. Each user had his own “virtual” machine.
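Just as a toy illustration of the slicing (nothing here resembles a real time-sharing scheduler), a round-robin loop is enough to show how many interactive users can share one processor:

```python
# Toy round-robin time-sharing model: each user gets a short slice in turn,
# so many interactive users can share one processor and each appears to have
# his own machine. Purely illustrative; real systems are far more involved.

from collections import deque


def run_time_sharing(jobs: dict, slice_units: int = 1) -> list:
    """jobs maps user -> units of work; returns the order slices were given."""
    ready = deque(jobs.items())
    schedule = []
    while ready:
        user, remaining = ready.popleft()
        schedule.append(user)                 # this user gets the CPU slice
        remaining -= slice_units
        if remaining > 0:
            ready.append((user, remaining))   # back of the line for the rest
    return schedule


if __name__ == "__main__":
    # Three interactive users who each think they have the machine to themselves.
    print(run_time_sharing({"alice": 3, "bob": 1, "carol": 2}))
    # -> ['alice', 'bob', 'carol', 'alice', 'carol', 'alice']
```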

Time-sharing was invented in 1957 by Bob Bemer, and the first project to implement it was initiated later that year by John McCarthy. It first became commercially available through service bureaus in the 1960s, mostly provided by IBM (sounds like SOA to me). These were prohibitively expensive and short-lived in the market. The first commercially successful time-sharing system was the Dartmouth Time-Sharing System (DTSS), implemented at Dartmouth College in 1964. But it didn’t really begin to take hold as a standard in most businesses until the 1970s. There were many reasons for this, most of them economic.

However innovative these time-sharing systems were, they were all monolithic. It was one big central computer, hard-wired over proprietary physical connections to static end points (terminals, printers, and combinations thereof). In this environment configurations rarely changed, not only because of the great expense, but also because change required an effort that would bankrupt a modern-day company. And due to cost, capacity utilization was scrutinized under a microscope, as resources were more expensive than gold.

As an operating systems programmer working for the government, I had the unique opportunity to be involved in a project that would forever change the landscape of computing – Arpanet. What was unique about Arpanet was that it linked many different kinds of computers in different locations through communications links. There had been many homogeneous networks prior to this (all of one manufacturer and type of machine), but none that allowed heterogeneous access. In the beginning it comprised only government agencies, government contractors, and universities doing government research. Since each manufacturer used a unique and proprietary operating system, a standard was required for communications between them, which resulted in TCP/IP. This now allowed users to access resources provided by others, and to communicate in real time via e-mail. Seemingly overnight, the most important technology related to computing became the Network.

The Network is King

UNIX workstations and PCs enabled “smart terminals” (or “Fat Clients”), which had the ability to process locally, not just interact with the mainframe as did the terminals. Once these workstations were connected together, they formed a local area network (LAN). LANs made available resources in a shared manner (virtual), as well as enabled collaboration among end users - productivity soared. While this was realized for the most part in the 1980’s, LANs didn’t reach their potential until a bit later.

LAN proliferation was hampered by incompatible physical and network layer implementations, and confusion over how best to share resources. Each vendor had its own network cards, cabling, protocol, and network operating system. There were SNA, Arcnet, DECnet, Ethernet, IPX, AppleTalk, Token Ring, TCP/IP, NBF and others. TCP/IP and Ethernet have now almost completely replaced them all. The agent of change was Novell Netware, which made networking PCs productive. Netware provided consistent operation for the 40 or so competing card/cable types, and a more stable operating system than most of its competitors. It dominated the PC LAN business from early 1983 until the mid-1990s, when Microsoft finally entered the market with a (semi) stable product that was network capable. During this same era, UNIX workstations from Sun, HP, SGI, Intergraph, and others were developed for high-end requirements (engineering, the space program, etc.) which used TCP/IP, and many believe this was the driving factor that led Microsoft to choose TCP/IP for its implementation.

Still, the primary technology focus was on the Network well into the early ‘90s, and many large development projects were run from that point of view. In hindsight this seems ill-founded, as the early LAN infrastructure and the PCs being used as workstations were unstable at best, and the only server with enough horsepower for large datasets was the mainframe (which couldn’t connect to most LANs). Most of the time, if something was broken, it was network-related. Systems management now became a different science, as the tight controls available within the mainframe were not possible on widely distributed resources. LANs were loosely controlled and were propagating at an incredible pace, becoming more and more complex by the day. Capacity planning was practically synonymous with bandwidth utilization, and much education was devoted to the detailed understanding of bridging, routing, and switching.

This would all change abruptly with the advent of UNIX based Symmetric Multi-processors (SMP) and RISC. UNIX would now surpass the mainframe in raw compute capacity at a much lower cost, without the rigid constraints of the mainframe world.

The Server is King

With the onset of SMP UNIX servers, the mainframe was prematurely declared dead (due mostly to the hubris of the UNIX Systems Engineers who had displaced many OS/390s and assumed this would continue until all would be UNIX). Capacity, change and configuration management processes, largely developed in the tightly controlled environment of the mainframe, would have to adapt to a world prone to chaos instead of consistency. Capacity planning now became: if you run out of horsepower, get a bigger (or faster) server. Configuration management was limited to ensuring the components that were installed in a server would actually work in that particular model, and change management was something akin to an urban legend.

The administrators of the “Wild West” of computing, as it was called, held the rigid discipline of the mainframe world in contempt. The very title of this technology, “open systems”, reflects the lack of controls. They scoffed at change control and frequently made changes to production systems on the fly (often as root). They were also quite familiar with catastrophic outcomes from minor changes. Something had to give, and this led to the creation of process management standards generally accepted as the ISO9000 guidelines (now ITIL). Unfortunately, these processes were developed for ideal situations and very few companies could implement them without severely hampering the productivity of their IT operations. Consequently, they were followed loosely, if at all, in most organizations into the mid ‘90s.

Ironically, it was not the process engineers from the big five pushing ISO9000 adherence, nor an awakening of systems administrators, that caused the next major shift; it was the belief that a company's most valuable asset was its data. This message resonated with the business owners and replaced the speed race for bigger and faster servers with a focus on storing, retrieving, and most importantly protecting data as reliably as possible. It produced a “speeds and feeds” race of its own, but most importantly, it restored the processes of capacity, change and configuration management.

The Storage is King

Storage technology could replicate data over long distances for recovery in the event of disaster. So companies now began to focus on application availability (from the user standpoint), and recovery from failures. This view greatly enhanced the perception of the “standard processes” and ITIL took over the ISO9000 arena.

UNIX servers had become so large and powerful that they began to implement mainframe-like technology (partitions, resource management supervisors, etc.). You simply could no longer find applications that required an entire enterprise-class server. Servers now had to be shared among multiple applications. Now, for the first time in a decade, the majority of administrators welcomed control over changes, because a failure caused great business impact (and would often cost them their job). Capacity, change and configuration management processes returned and were adopted in nearly all large environments.

At the turn of the millennium we had a side note. This did not equate to a major change overall; it simply modified the open systems world found on the raised floor. Prior to this time, Windows was not allowed on the raised floor, as it was far too unreliable. While still unreliable (but less so at this point), Windows 2000 servers began to find their way onto the raised floor of the data center. This was due to economics – they were simply so cheap and easy to deploy that many business opportunities could now be pursued at a very low cost. Thus, they began to multiply like the Tribbles from the original Star Trek series, which seemed to endear themselves to every sentient race that encountered them (except Klingons and Romulans, of course).

There were two fundamental problems caused by this great proliferation. The first was their sheer numbers. They proliferated faster than their predecessor UNIX (which, I should note, had grown relative to its own predecessor, the mainframe, at an even faster relative pace than Windows did), and the physical management of them became challenging. Then, as Windows began to mature and stabilize and the x86 servers it was running on became significantly more powerful, another problem arose – Windows could not use the power the servers now possessed. Most applications were using a small fraction of the available capacity of the server. But unlike its predecessors, it did not have the capability to manage multiple applications on a single system, or to partition the servers into multiple “virtual” servers as UNIX and the mainframe did. For all intents and purposes, it was a one-to-one relationship between an application and a server. VMWare addressed this by bringing partitions to x86. Initially, VMWare didn’t change the way capacity, change and configuration management were done. But it opened the thinking of the architects and engineers to the concept of virtual machines, and more importantly application mobility, and prepared the IT world for commoditization.

VMWare – the Mainframe to x86

VMWare originally brought partitioning to the x86 world. Although groundbreaking in this arena, it was old hat on UNIX and the mainframe. But because you could finally get the Tribbles under control, it took the data centers by storm. But VMWare brought more… it brought the concept of application mobility through VMotion. Through this marvelous innovation, you could move partitions (VMs) from one node to another within an ESX cluster - while running. This flexibility brings with it many capabilities – load balancing and fault recovery to name a few, but most significantly, x86 infrastructure had become a commodity (even though many haven’t realized it yet). What VMWare brought to x86, my company and others are bringing to operations in general. Everything required to run an application will become part of the application configuration, and will be managed (regardless of its technology domain) as it relates to the application. This will eliminate the tie between specific infrastructure components supporting an application and the application itself, allowing an application to use any available resource as needed. This will also cause the technology domains (server, network and storage) to “homogenize”, increasing interoperability and simplifying management. As these technologies mature and become pervasive, a major shift of control will occur, out of the data center and into the business itself.
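To illustrate what "everything required to run an application becomes part of the application configuration" might look like, here is a hedged sketch; the manifest format, field names and placement check are invented for the example, not any product's actual configuration:

```python
# Hypothetical application "manifest": the compute, storage and network the
# application needs are declared with the application, not tied to specific
# boxes. A placement engine can then satisfy it from any available pool.

APP_MANIFEST = {
    "name": "order-entry",
    "compute": {"vcpus": 8, "memory_gb": 32},
    "storage": {"tier": "replicated", "size_gb": 500},
    "network": {"vlan": "ecommerce", "bandwidth_mbps": 100},
}


def can_place(manifest: dict, pool: dict) -> bool:
    """Check whether an anonymous, interchangeable resource pool can host the app."""
    c, s = manifest["compute"], manifest["storage"]
    return (
        pool["free_vcpus"] >= c["vcpus"]
        and pool["free_memory_gb"] >= c["memory_gb"]
        and pool["free_storage_gb"] >= s["size_gb"]
    )


if __name__ == "__main__":
    pools = {
        "dc-east": {"free_vcpus": 4, "free_memory_gb": 64, "free_storage_gb": 2000},
        "dc-west": {"free_vcpus": 32, "free_memory_gb": 128, "free_storage_gb": 4000},
    }
    placements = [name for name, pool in pools.items() if can_place(APP_MANIFEST, pool)]
    print(f"{APP_MANIFEST['name']} can run in: {placements}")
```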

The User is King

Welcome to the near-future (and soon present) reality of the business owner managing his own environment. The user will be able to add resources (with an associated cost) to his environment when he needs to increase performance. He will add storage when he begins to run low. He will move historic data to archive storage as it ages (to reduce cost), and he will make these allocations from his desk because all of the processing resources will be shared. They will be added automatically based on business rules and policies that he will set in advance, or they will be added ad hoc when the need arises. This sounds far-fetched, but how many of us ran a LAN in the late ‘80s or early ‘90s – these environments were fragile, complicated, expensive and hard to keep running. Most companies had several different LANs in different areas of their business (which couldn’t talk to each other) and large, very highly skilled engineering teams to support them.

This is certainly not the case today. Networks are already shared across the user base (the internet is shared across the world). Enterprise-class servers simultaneously run many large applications. In fact, they need to be partitioned, since very few applications (if any at all) can individually use all their capacity. Storage has been shared since the advent of large-capacity storage arrays in the mid ‘90s, and has been made even more shareable through the use of SAN, NAS and SVCs. As the technology matures and the distinction between vendors blurs, the data center will become homogeneous within each technology domain. It will then be possible (and desirable) to turn control of the assets an application uses over to the application owner.
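As a sketch of what handing that control to the application owner could look like, here are the kinds of rules he might set in advance; the metrics, thresholds and actions are invented for illustration:

```python
# Hypothetical policy sketch: rules the application owner sets in advance,
# evaluated periodically against live metrics. Thresholds and actions are
# invented for illustration only.

POLICIES = [
    {"metric": "cpu_utilization", "above": 0.85, "action": "add_compute_node"},
    {"metric": "storage_free_pct", "below": 0.10, "action": "add_storage_100gb"},
    {"metric": "data_age_days", "above": 365, "action": "move_to_archive_tier"},
]


def evaluate(policies: list, metrics: dict) -> list:
    """Return the actions to trigger for the current metric readings."""
    actions = []
    for rule in policies:
        value = metrics.get(rule["metric"])
        if value is None:
            continue
        if "above" in rule and value > rule["above"]:
            actions.append(rule["action"])
        elif "below" in rule and value < rule["below"]:
            actions.append(rule["action"])
    return actions


if __name__ == "__main__":
    tonight = {"cpu_utilization": 0.91, "storage_free_pct": 0.22, "data_age_days": 400}
    print(evaluate(POLICIES, tonight))   # -> ['add_compute_node', 'move_to_archive_tier']
```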

What about the 3 C’s?

Capacity, change, and configuration management have been the fundamental core processes that ITIL and Six Sigma have used for years to evaluate the health of an IT operation’s processes. As an Operations Management Consultant I spent several years doing just that for many Fortune 100 companies. But these were, for the most part, cumbersome controls put on IT to prevent it from hurting itself and the business. Now - with data center virtualization - capacity planning can be done in a reactive manner (as you reach capacity, simply add a node). Change management will be accomplished in virtual environments that the users themselves set up and tear down, without impacting production; it will be virtually replaced by Q/A testing. As for configuration management, what configuration will you be managing? No longer will it be hardware components and their configuration, as each domain will be homogeneous. Resources will simply be added to the universal network(s) as needed.

Capacity management will be automatic
Configuration management will manage the configuration of the application
Change management will all but disappear into Q/A
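To make the third point a little more concrete, here is a hedged sketch of change management collapsing into Q/A: every change is rehearsed in a throwaway clone of the production VM and promoted only if its tests pass. The function names are stand-ins, not any product's API:

```python
# Hypothetical sketch of change management collapsing into Q/A: every change
# is applied to a disposable clone of the production VM, tested there, and
# promoted only if the tests pass. All function bodies are stand-ins.

def clone_production(app: str) -> str:
    return f"{app}-qa-clone"        # stand-in for a snapshot/clone operation


def apply_change(environment: str, change: str) -> None:
    print(f"applying '{change}' to {environment}")


def run_tests(environment: str) -> bool:
    print(f"running regression tests in {environment}")
    return True                      # stand-in for real Q/A results


def tear_down(environment: str) -> None:
    print(f"tearing down {environment}")


def release_change(app: str, change: str) -> bool:
    qa = clone_production(app)       # users set this up themselves, on demand
    try:
        apply_change(qa, change)
        if not run_tests(qa):
            return False             # production never saw the change
        apply_change(app, change)    # promote only after Q/A passes
        return True
    finally:
        tear_down(qa)


if __name__ == "__main__":
    release_change("order-entry", "upgrade catalog schema to v2")
```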

The good news in the end

These processes were developed as a reaction to an environment which was basically out of control. They did not enhance operations; rather, they were impediments. As IT operations have evolved from the mainframe to the network, server and storage eras, we have eventually come back to where we started: managing the resources as a pool (as we did in the mainframe era, sharing resources as application needs require).

The data center will become one big logical mainframe, with the engineers working on it from the inside!


Business owners used to be constrained by rigid technology resource configurations and requirements, forcing them to adapt to the technology. Over time, as data center virtualization technology becomes mainstream, more and more flexibility will be offered to end users to accomplish business objectives in real time. Whether it’s VMware or DynaCenter or any of the other tools which control application mobility (resource reconfiguration), positive change is taking place. Constraints and complexity are giving way to dynamic and nimble configuration management. In the near future, data center configuration management will be a flexible and nimble science. System administrators’ hands will no longer be tied by burdensome infrastructure requirements, and businesses will be able to respond quickly to rapidly changing market forces.

