Terms of Endearment - Part 1: Understanding High Availability

By: Sandeep April 01, 2010

Hello everybody! This is the first post in a multi-part series covering terms that are often misused or generally confusing. There are a number of topics I want to cover at a general 101 level before looking at each in more detail during a 202 series.

So without further a dieu, those of you that have had the pleasure of working with us may already know that for some reason people confuse Vasanth with me, and vice versa. While we are both awesome people, I assure you we are in fact different people with different parents that aren't even related to each other. Nevertheless, some people may still expect me to respond when called by his name. Clearly this is some funky use of logic. Interestingly enough, that same logic seems prevalent with regard to technology terms. Say one word, mean another. This post is about the difference between Redundant and Highly Available. Much like Vasanth and me, these terms are quite different from each other.

First let's talk about "redundancy". Redundancy almost always refers to the hardware serving up your application and at times can extend to the infrastructure surrounding it. Redundancy is a term that is often used in explaining High Availability, but just because something is redundant does not mean that it is Highly Available. The definition of redundancy is a subset of the definition of High Availability. In context, redundancy implies that the physical hardware hosting your application includes multiple components of each type. An example of hardware redundancy for your average everyday web server would be 2 processors, 4 heat sinks, 8 system fans, 2 power inlets, 2 network cards, 2 hard drives, etc. Redundancy applies at various scales. Power redundancy, for example, refers to both the power supply to your server and the supply to the rack that holds your server. Network redundancy refers to the switch your server is attached to as well as the network interface(s) on your server itself...and so on. There are nearly a dozen other areas to discuss but I will save that deep dive for a future post.

So what does this have to do with High Availability and what does "highly available" mean? Simply put, an application that is highly available (a.k.a. HA) is one that is available to users 24 hours a day / 7 days a week with very little (if any) downtime for maintenance. This implies that the hardware and infrastructure surrounding your application are redundant. That's the simple definition. In greater detail, though, the term highly available means your application is up for a target percentage of the time, anywhere from 90% up to six nines. High Availability is usually measured as a percentage of up-time for a given system. It is calculated throughout the year and marked at year end. Each organization can make the determination as to what is and is not an acceptable percentage of up-time. Six nines is the extreme of HA. Six nines availability is exactly what it sounds like: 99.9999% up-time. That means about 2.59 seconds of downtime per 30-day month or about 31.5 seconds a year. There is, however, a bit of a grey zone around what qualifies as downtime. Downtime refers to a period where the application is unavailable to users. However, planned downtime carries a different weight than unplanned downtime. Planned downtime is generally introduced to the environment for OS patches, software upgrades, and the like. Unplanned downtime is due to a logical fault like an application crash, or a physical fault like a damaged power supply.
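To make those percentages concrete, here is a minimal Python sketch that turns an up-time percentage into a downtime budget. The function name and the list of targets are my own for illustration, and it assumes a 30-day month and a 365-day year.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def allowed_downtime_seconds(availability_percent, period_days):
    """Return the downtime budget, in seconds, for a period of period_days.

    Illustrative helper (not from any library): the budget is simply the
    unavailable fraction multiplied by the length of the period.
    """
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * period_days * SECONDS_PER_DAY


# Targets ranging from the low end (90%) to six nines.
targets = [90.0, 99.0, 99.9, 99.99, 99.999, 99.9999]

for percent in targets:
    per_month = allowed_downtime_seconds(percent, 30)   # assumed 30-day month
    per_year = allowed_downtime_seconds(percent, 365)   # assumed 365-day year
    print(f"{percent}% up-time: {per_month:,.2f} s/month, {per_year:,.2f} s/year")
```

At 90% the yearly budget works out to 36.5 days, which shows just how wide the range between 90% and six nines really is.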

HA implementations can vary greatly depending on what type of application you are deploying and its dependencies. Often people are misled to believe that a system design like this has lots of servers, and extra network switches, and multiple firewalls, and all sorts of other crap ... but in reality an HA system is a lot simpler than that. The larger the design, the more complex it gets, because each component added introduces a new potential point of failure. Some of the most highly available systems are simple. An example of a simple design would be a pair of rack-mount servers in physically separate locations, each with redundant internal hardware such as mirrored drives, multiple NICs, multiple power supplies (all engaged), and two or more CPUs that are running processes independently. In a nutshell, that is HA for you and we're barely skimming the surface of this topic. There are a number of other areas that are closely linked to this concept such as fault tolerance and reliability, but in an effort to keep this first post as the 101 to this series, I will go into more detail on that in some future blog post.
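A rough way to see why simple designs tend to win is the textbook availability arithmetic, sketched below in Python. This is a simplification under stated assumptions: component failures are independent, failover between redundant components is instant and always works, and the availability figures are made up for illustration. The function names are my own.

```python
from math import prod


def series_availability(availabilities):
    """Every component must be up, so the availabilities multiply.

    Each component added to the chain can only lower the result.
    """
    return prod(availabilities)


def parallel_availability(availabilities):
    """Only one component must be up (a redundant set).

    The set is down only when all members are down at the same time.
    """
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical numbers: a server that is up 99% of the time on its own.
single_server = 0.99

# A redundant pair of such servers, e.g. in two separate locations.
redundant_pair = parallel_availability([single_server, single_server])

# A chain of five components that must all work, each up 99.9% of the time.
five_in_series = series_availability([0.999] * 5)

print(f"single server:  {single_server:.4%}")
print(f"redundant pair: {redundant_pair:.4%}")
print(f"five in series: {five_in_series:.4%}")
```

Under these assumptions the redundant pair gains two nines over a single server, while a chain of five dependent components ends up less available than any one of them. Real failures are rarely fully independent, so treat these as upper bounds rather than promises.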

Okay, so now that we've learned our terms, let's use them in a sentence: I know my applications are highly available because my up-time is greater than 90%, my hardware is internally redundant, and it is physically located in separate locations.

I hope you’ve found this post informative. See ya next time!

Tags: Fault Tolerance, high availability, redundancy

Comments

Ganesh on April 02, 2010

Vasanth, I meant Sandeep :-), nice article. Well said. We at my workplace are moving to 10gr3 and have made the system infrastructure scalable and redundant; the question of high availability is something to be seen after going live. And I know you meant ado and not adieu :-)

daraambrose on April 06, 2010

Nice article. Another distinction that is often drawn is High Availability (HA) versus Continuous Availability (CA). Although there is no formal definition, in general systems with 2 to 4 nines (99% to 99.99%) availability are considered HA and systems with 5 to 6 nines (99.999% to 99.9999%) are considered CA. Frequently different implementations are used to achieve each. HA systems are often built by clustering individual servers together while CA systems are usually built on fully fault tolerant hardware. (In the interest of full disclosure I work for Stratus Technologies, http://www.stratus.com. We are leaders in CA Hardware Fault Tolerant Servers and Software solutions for HA)
