High Availability: People and Processes    



Network availability directly impacts productivity, and this is especially true of e-commerce. This article focuses on improving uptime through internal operations and will provide you with the tools you’ll need to bring your network to a practical state of always on. This is the second in a series of articles on High Availability (HA).

HA environments are complex, more so than environments which don’t depend critically on data flow. Data connectivity has become a utility like electricity, sewage or water. Unlike with other utilities, however, one hour of unscheduled network downtime will have a greater business impact than an hour of washroom unavailability.

If you are interested in HA, you are looking to provide Enterprise-grade services regardless of the size of your organization.

The SLA: Measuring and managing availability
A crucial part of providing a high availability environment is the governing Service Level Agreement (SLA). SLAs are usually made between clients and service providers, but can also be used internally between departments. In a vendor-client relationship, the SLA manages expectations and sets clear limits for both provider and client.

Internally, the SLA is often ‘understood’ but not written down. It reflects what one department wants, and perhaps not what IT can deliver. Developing a written SLA for internal use creates measurable success and failure rates. All stakeholders can then see where improvements can be made, which allows for better budgeting and planning.
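As an illustration of what "measurable" can look like, here is a minimal sketch in Python that turns an availability target into a downtime budget and compares it with the downtime actually recorded. The function names, the 99.9% target and the 30-day reporting period are assumptions made for this example; a real SLA would define its own targets, periods and exclusions such as scheduled maintenance.

```python
from datetime import timedelta


def allowed_downtime(target_percent: float, period: timedelta) -> timedelta:
    """Downtime budget implied by an availability target over a reporting period."""
    if not 0 <= target_percent <= 100:
        raise ValueError("target_percent must be between 0 and 100")
    return period * (1 - target_percent / 100)


def measured_availability(downtime: timedelta, period: timedelta) -> float:
    """Availability actually delivered, as a percentage of the period."""
    if period <= timedelta(0):
        raise ValueError("period must be positive")
    return 100 * (1 - downtime / period)


if __name__ == "__main__":
    period = timedelta(days=30)          # assumed reporting period
    budget = allowed_downtime(99.9, period)   # assumed target of 99.9%
    delivered = measured_availability(timedelta(hours=1), period)
    print(f"Downtime budget: {budget}")
    print(f"Delivered availability: {delivered:.3f}%")
```

A 99.9% target over 30 days leaves a budget of roughly 43 minutes, so a single one-hour outage in that period would miss the target.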

Not all applications require high availability. Mission-critical applications should be the initial focus for this level of availability.

Steps towards HA
HA does not mean that there will be no downtime. For any given piece of equipment, HA means having control over the uptime of the services it supports. Note that the focus switches from downtime to uptime, and then only to required uptime. This ensures that the services and data are available when they are needed.

Roughly two decades ago, before the availability of the public Internet, typical uptime for devices and services was measured in months and years. Now that same sort of uptime expectation will leave your network vulnerable, especially in Microsoft Windows environments.

Documentation
Everyone who looks after the HA environment should participate in maintaining its documentation. Current and accurate documentation leads to a more comprehensive understanding of the HA environment.

It is important to document internal and external outages. These documents can be used to mitigate future incidents and improve availability where most needed.

Maintenance windows
Maintenance windows are essential to ensure required uptime commitments are met, and to ensure security of the environment as a whole. Designating maintenance windows allows the IT team to perform scheduled upgrades, critical reboots and other maintenance activities at a time when the impact will be minimal to the business.

Log reviews
Regular reviews of traffic loads and event logs allow for capacity planning and for mitigating small issues before they grow into larger problems. Looking at the traffic loads also helps determine when your maintenance windows should be scheduled.
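One way to let traffic loads suggest a maintenance window is sketched below in Python. It assumes you already have one average load figure per hour of the day (how those figures are collected is outside this sketch), and the function name is hypothetical. It returns the start hour of the quietest contiguous window, allowing the window to wrap past midnight.

```python
from typing import Sequence


def quietest_window(hourly_load: Sequence[float], window_hours: int) -> int:
    """Return the start index of the contiguous window with the lowest total load.

    hourly_load holds one average traffic figure per hour, with index 0
    covering midnight to 01:00. The window may wrap around the end of the
    sequence, so a window starting at 23:00 continues into hour 0.
    """
    hours = len(hourly_load)
    if hours == 0 or not 1 <= window_hours <= hours:
        raise ValueError("window_hours must be between 1 and len(hourly_load)")

    def window_total(start: int) -> float:
        # Sum the load over the window, wrapping with the modulo operator.
        return sum(hourly_load[(start + offset) % hours]
                   for offset in range(window_hours))

    # min() returns the earliest start hour when several windows tie.
    return min(range(hours), key=window_total)
```

Traffic is only one input. The window this suggests still has to be checked against the required uptime agreed in the SLA.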

Testing
Testing is a must. You must know that everything is functioning according to your plans, processes and documentation. Keep a list of services with known results for what is supposed to work and what is not. This ensures service availability while keeping security in place.
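Such a list of services with known results can be kept in a form that a script can check. The Python sketch below is one possible shape, not a prescribed tool: each entry states whether a TCP port is supposed to be reachable, and the check reports every entry whose actual state differs from the expected one. The class and function names are invented for this example, and the host names in the usage section are placeholders.

```python
import socket
from typing import Callable, Iterable, List, NamedTuple


class ServiceCheck(NamedTuple):
    name: str
    host: str
    port: int
    should_be_open: bool  # False means the port is expected to be blocked


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def failed_checks(
    checks: Iterable[ServiceCheck],
    probe: Callable[[str, int], bool] = tcp_port_open,
) -> List[ServiceCheck]:
    """Return the checks whose observed state differs from the expected state."""
    return [c for c in checks if probe(c.host, c.port) != c.should_be_open]


if __name__ == "__main__":
    # Placeholder hosts: replace with the services from your own documentation.
    expected = [
        ServiceCheck("intranet web", "intranet.example.internal", 443, True),
        ServiceCheck("telnet stays closed", "intranet.example.internal", 23, False),
    ]
    for check in failed_checks(expected):
        print(f"UNEXPECTED RESULT: {check.name} ({check.host}:{check.port})")
```

Listing what should not work alongside what should is the point of the design: a port that is unexpectedly open is reported in the same way as a service that is unexpectedly down.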

Incident response
Incidents happen even in the best maintained environments. Rapid and effective response is the first order of business. Record the problem, the services affected, the time frame and the resolution, in plain language. Anything that affects the business goals must be documented, as this allows you to understand what happened and supports management decisions on how best to mitigate the risk of recurrence.
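The four items to record (problem, services affected, time frame and resolution) map naturally onto a small structured record. The Python sketch below is only one possible layout, with field and function names chosen for this example. Keeping incidents in this form makes it straightforward to total downtime per service and see where availability most needs improving.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class IncidentRecord:
    problem: str                       # what went wrong, in plain language
    services_affected: Tuple[str, ...]
    started: datetime
    resolved: datetime
    resolution: str                    # what was done to fix it

    def __post_init__(self) -> None:
        if self.resolved < self.started:
            raise ValueError("resolved must not be earlier than started")

    @property
    def duration(self) -> timedelta:
        return self.resolved - self.started


def downtime_by_service(incidents: Iterable[IncidentRecord]) -> Dict[str, timedelta]:
    """Total recorded incident time for each affected service."""
    totals: Dict[str, timedelta] = {}
    for incident in incidents:
        for service in incident.services_affected:
            totals[service] = totals.get(service, timedelta()) + incident.duration
    return totals
```

Note that this sums incident durations per service and does not merge overlapping incidents, so two simultaneous incidents on the same service would be counted twice.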

Control and understanding of your environment promote availability. Through proper management of the environment, by way of process-driven tasks, documentation and testing, high availability can be maintained.

Personnel
Those charged with the care and management of HA systems must remain aware of environmental details, as something that seems inconsequential can bring down a network.

In order to ensure the stability of the HA environment, each IT professional should follow documented procedures. New ideas should never be executed without due consideration. The IT team must discuss, receive approval, execute, test, and then document.

Staffing your IT environment might require an external consultant, someone who can provide occasional expertise not available to the internal team. Consultants often play a role in the deployment of an HA environment, leaving day-to-day operations to in-house staff. They provide domain-specific skills and expertise. At the end of such an engagement, knowledge transfer and good documentation are required. The work performed by the consultant should also follow the organization’s process of discussion, approval, execution, testing and documentation.

Conclusion
I have often said, “Don’t mess with the process. If it’s wrong, fix it. Then leave it alone and follow it.” This is the dogma of HA. Through proper implementation of well-documented processes by careful and well-trained personnel, your services and data will be available when needed.

Originally published Jan, 2010

All content © eSubnet 2003-2017
ESUBNET ENTERPRISES INC. TORONTO CANADA