Server Redundancy Keeps Your Company Healthy, Wealthy and Wise

August 30th, 2012
By Tony Marchese

Benjamin Franklin popularized the proverb “Early to bed and early to rise, makes a man healthy, wealthy, and wise.” We all want all three, especially health, because without health the other two don’t matter as much.

When we’re healthy, we tend to forget how miserable we felt the last time we had a problem. It feels so great to be healthy that, once we overcome a problem, we eagerly shift focus to “Wealthy and Wise.”

This same human nature applies to how we take care of our computers.

It feels great when all systems are humming and the computers are responding instantly. In today’s world, this is a given. How frustrating is it, as a customer, to enter a web order with long delays between pages, or to hear a CSR say, “Our computers are slow today”? Delays are unacceptable, “downtime” is catastrophic, and losing any data is unthinkable.

How do we keep our systems as consistently “Healthy” as possible, so that our business can focus on its goals?

First: Servers need to stay humming

I.T. started making servers more reliable by doubling up the weakest point of failure: the disk drives. Disk drives are like incandescent light bulbs; they eventually burn out. These days, we set them up in groups called “RAID arrays.”

On a SQL Server, the data should be stored on a RAID 10 array, which works like the groups of tires on an eighteen-wheeler. The weight is distributed, and the truck can keep going if one of the tires blows (though it should be replaced as soon as possible).
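To make the tire analogy concrete, here is a minimal Python sketch of which drive failures a RAID 10 array can ride out. It assumes a hypothetical four-disk array built from two mirrored pairs; the disk numbers and function names are illustrative only and do not come from any real RAID tool.

```python
from itertools import combinations

# Hypothetical layout (an assumption for illustration):
# disks 0 and 1 mirror each other, and so do disks 2 and 3.
PAIRS = [(0, 1), (2, 3)]


def raid10_survives(failed, mirror_pairs=PAIRS):
    """Return True if every mirrored pair still has at least one working disk."""
    failed = set(failed)
    return all(not set(pair) <= failed for pair in mirror_pairs)


def count_survivable(num_failures, mirror_pairs=PAIRS):
    """Count how many combinations of `num_failures` dead disks the array survives.

    Returns (survivable_combinations, total_combinations).
    """
    disks = [disk for pair in mirror_pairs for disk in pair]
    combos = list(combinations(disks, num_failures))
    survivable = sum(raid10_survives(combo, mirror_pairs) for combo in combos)
    return survivable, len(combos)
```

With this layout, `count_survivable(1)` reports that all four single-drive failures are survivable, while `count_survivable(2)` reports four out of six: the array is only lost when both disks of the same mirrored pair fail. That is why the blown tire should be replaced promptly.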

Another way to increase server reliability is to mix disks from different manufacturers or different lot numbers. If one disk model has a design flaw, the disks don’t all burn out at once. The “claimed” average life of a disk drive used to be 12 years, and manufacturers now claim as much as 57 years for consumer drives and 171 years for server-grade drives. In real life, drives can last anywhere from 15 minutes to 15 years.
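Those “claimed” lifetimes are easier to interpret with a little arithmetic. The sketch below assumes the 57-year and 171-year figures correspond to MTBF (mean time between failures) ratings of 500,000 and 1,500,000 hours; those hour values are my assumption, chosen because they convert to the years quoted above. It also assumes a constant failure rate, which is a simplification.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def mtbf_years(mtbf_hours):
    """Convert an MTBF rating in hours to years of continuous operation."""
    return mtbf_hours / HOURS_PER_YEAR


def annual_failure_probability(mtbf_hours):
    """Chance that one drive fails within a year, assuming a constant failure rate."""
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def expected_failures_per_year(num_drives, mtbf_hours):
    """Expected number of failed drives per year in a fleet running around the clock."""
    return num_drives * HOURS_PER_YEAR / mtbf_hours
```

Under these assumptions, `mtbf_years(500_000)` is about 57 and `mtbf_years(1_500_000)` is about 171, yet `annual_failure_probability` still comes out at roughly 1.7% and 0.6% per drive per year. An MTBF rating describes the failure rate across a large population of drives; it does not promise that any single drive will run for that long.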

In every 10 servers that are several years old, you will typically find one or two with a failed drive that needs to be replaced. Because the server is still functioning, often nobody is even monitoring for this.

In an important server with many drives, it’s a good idea to add an extra drive as a true “hot spare” that sits waiting around the clock and then automatically joins whichever group has a failure.

Think about it: your system is replacing the flat tire without even stopping the truck! Most servers can do this today, if you plan ahead and set it up this way. You can even use retired drives as spares, as long as they are at least as large as the drives they would replace.
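The hot-spare idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular RAID controller is implemented: the `DriveGroup` class, the drive names, and the `fail_drive` function are all hypothetical, and the sketch ignores the time a real controller needs to rebuild data onto the spare.

```python
from dataclasses import dataclass, field


@dataclass
class DriveGroup:
    name: str
    size: int                                     # working drives the group needs
    members: list = field(default_factory=list)   # drives currently working

    def is_degraded(self):
        """A group is degraded when it has fewer working drives than it needs."""
        return len(self.members) < self.size


def fail_drive(group, drive, spares):
    """Remove a failed drive; if a hot spare is waiting, add it to the group.

    Returns the name of the spare that joined, or None if no spare was left.
    """
    group.members.remove(drive)
    if spares:
        spare = spares.pop(0)
        group.members.append(spare)  # a real controller would start a rebuild here
        return spare
    return None
```

For example, with one spare shared between a four-drive data group and a two-drive log group, the first failure is absorbed automatically and the group is no longer degraded. A second failure before anyone restocks the spare leaves its group degraded, which is why the failed drive still needs to be replaced.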

Second: Double up the entire server

If any other component on the server fails, another server can take over. Of course, in addition to doubling the cost of your disk drives, you now also double the cost of the entire server, and you need additional setup so that the second server always has a fresh copy of what is on the first server.

For Web Servers and Terminal Servers, Windows comes with the ability to set up Server groups that can work together to “share the load.” If one fails, the others in the group need to work harder but they can keep things going.
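The “share the load” behavior can be illustrated with a small Python sketch. It assumes a simple round-robin policy and made-up server names; it is only meant to show the idea and is not a description of how the Windows feature works internally.

```python
from itertools import cycle


def distribute(requests, servers, down=()):
    """Assign requests round-robin to the servers that are still up.

    Returns a dict mapping each live server to the list of requests it handles.
    """
    alive = [server for server in servers if server not in down]
    if not alive:
        raise RuntimeError("no servers available in the group")
    assignment = {server: [] for server in alive}
    for request, server in zip(requests, cycle(alive)):
        assignment[server].append(request)
    return assignment
```

With six requests and three hypothetical servers, each server handles two requests. If one server is listed in `down`, the remaining two handle three each: they work harder, but every request is still served.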

Amazingly, SQL Server 2008 comes with this capability, and SQL Server 2012 improves on it with what Microsoft calls “AlwaysOn Availability Groups.”

Picture a convoy of trucks. If one breaks down, they cast it aside and the next one instantly takes the load and continues the journey.
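In code terms, the convoy amounts to always handing the work to the first server in line that is still healthy. The Python sketch below shows only that selection step; the server names and the `is_healthy` check are hypothetical, and a real failover product also has to keep the standby’s data in sync, which this sketch does not attempt.

```python
def pick_active(servers, is_healthy):
    """Return the first server, in priority order, that passes the health check."""
    for server in servers:
        if is_healthy(server):
            return server
    raise RuntimeError("no healthy server available")


# Hypothetical health status, as a monitoring check might report it.
status = {"primary": False, "standby": True}
active = pick_active(["primary", "standby"], lambda name: status[name])
```

Here the primary has failed its health check, so `active` ends up as `"standby"` and the journey continues on the next truck.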

With some advance setup work from Morse Data, your InOrder application can enjoy this protection — that is, to jump to another server without anyone realizing that the main server just went up in smoke and triggered the sprinkler system.

Speaking of sprinkler systems, someone recently asked, “Why do I still need to do backups and pay extra for disk drive groups, and now also extra servers!?”

The answer is clear if you think about it: Why do we have fire extinguishers when we have a sprinkler system? And why do we need sprinkler systems when we have fire departments and fire insurance?

Whether your servers are hosted or run in-house, you still need to ask your I.T. people what the plan is to keep them humming.

To keep our companies and our customers happy, we need all systems humming. However, we also need them to respond instantly, which will be the subject of Part II of this series.