Active vs Standby - Different Approaches for Failover
This week, as part of our ongoing series on network visibility, I’m going to share some information about failover options. I recently came across a phrase that must be as old as networking itself: Let it fail, but fix it fast. Exactly. This is the premise behind configuring your visibility architecture for high availability.
Purpose of Active and Standby
High availability (HA) is achieved through the presence of redundant devices that can back each other up in the event one of them fails. The earliest implementations of high availability focused on having a second device in standby mode, sort of like an understudy waiting to step in should the principal dancer be unable to perform. While more than two systems can be involved, the most common implementation is still a primary and secondary node.
The Active/Standby mode of operation (A/S) evolved from initially having the hardware available and connected to the network, to having the necessary software completely installed and available on the second node as well. This is sometimes referred to as a ‘warm standby’ or ‘warm spare.’ The next level up is not only having the software installed, but also having the data stored in a way that makes it immediately available to the secondary node. This is referred to as a ‘hot standby.’ A hot standby costs a bit more, but the cost of downtime to an organization can be substantial, so most standby configurations today are of the ‘hot’ variety. Most hot standby implementations recover within seconds, or at most a fraction of a minute.
Modern environments, however, don’t tolerate delay very well—even if it’s just seconds. Customers, clients, employees, and partners demand better performance. This has increased the interest in having the secondary node move from being a standby to being an active participant in normal operations, to effectively eliminate any delay in recovery. It’s like having the understudy on stage mirroring all of the movements of the principal dancer. In IT you see this referred to as Active/Active mode (A/A).
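To see where the recovery delay in an A/S pair comes from, here is a minimal sketch of a standby node that promotes itself after the primary goes quiet. It is only an illustration: the StandbyMonitor class, the one-second heartbeat interval, and the three-missed-heartbeats threshold are all assumptions of mine, not the behavior of any particular product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StandbyMonitor:
    """Hypothetical standby-side monitor (illustrative names and values)."""
    heartbeat_interval_ms: int = 1000  # assumed gap between heartbeats
    missed_limit: int = 3              # assumed misses tolerated before takeover
    last_heartbeat_ms: int = 0
    active: bool = False

    def on_heartbeat(self, now_ms: int) -> None:
        # Record the most recent sign of life from the primary.
        self.last_heartbeat_ms = now_ms

    def tick(self, now_ms: int) -> bool:
        """Return True at the moment the standby takes over."""
        if self.active:
            return False
        silence_ms = now_ms - self.last_heartbeat_ms
        if silence_ms > self.heartbeat_interval_ms * self.missed_limit:
            self.active = True
            return True
        return False


def simulate_failover(primary_fails_at_ms: int,
                      step_ms: int = 500,
                      horizon_ms: int = 30_000) -> Optional[int]:
    """Return the delay (ms) between primary failure and standby takeover."""
    monitor = StandbyMonitor()
    for now_ms in range(0, horizon_ms + 1, step_ms):
        primary_alive = now_ms < primary_fails_at_ms
        # The primary sends a heartbeat once per interval while it is alive.
        if primary_alive and now_ms % monitor.heartbeat_interval_ms == 0:
            monitor.on_heartbeat(now_ms)
        if monitor.tick(now_ms):
            return now_ms - primary_fails_at_ms
    return None  # no takeover within the simulated horizon


print(simulate_failover(10_000))  # takeover delay in milliseconds
```

With these assumed values the standby takes over 2.5 seconds after the primary stops. Tightening the interval or the miss threshold shortens that gap, but the standby still has to wait long enough to be sure the primary is really gone, which is the delay an A/A pair is designed to avoid.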
Visibility-Related Use Cases
Creating resiliency is a part of every IT implementation. With the growth of cyberattacks and the rising cost of data breaches, ensuring the resiliency of network security is more important than ever and a key use of A/A and A/S configurations.
Network Paths: Good network design involves creating redundant paths to maintain operations in the event of a path failure. Alternate paths can be configured to provide visibility to traffic in normal operations (active mode) or to begin operating only if the primary path stops or slows to an unacceptable rate (standby mode). All network paths, whether in A/S or A/A, must have visibility solutions attached, to ensure there are no blind spots where attacks could pass through unnoticed.
Security Monitoring Tools: Many security monitoring tools and appliances are configured in pairs to achieve HA and minimize downtime in the event of a hardware failure. Tool vendors generally require only a single license to operate an HA pair; the second node is deployed as a ‘warm standby’ that is activated only in the event the primary device goes offline. When that occurs, an alarm is activated to alert security personnel to the failover.
Security Fabric with Network Packet Brokers: Redundancy is also key in a security fabric design, where network packet brokers (NPBs) filter packets and perform preliminary processing of live traffic before passing it to the security tools. NPBs keep security tools, many of which are major capital investments, working efficiently and perform critical tasks such as decryption and load balancing. Ixia is the only vendor that offers NPBs with the capability of being configured in A/A mode, using a dedicated HA link for complete synchronization. Both nodes are actively working and each node is aware of the traffic being processed on the other at all times. In the event of an NPB failure, recovery is instantaneous, with near-zero packet loss.
Considerations When Evaluating Failover Options
When evaluating your options for failover in your network visibility and security architecture, consider the following:
- Speed of Recovery: As mentioned earlier, if the speed of recovery in a failover situation is key, an A/A configuration provides the fastest possible recovery. Both NPBs are actively engaged in traffic processing and there is no time required to transfer processing to the standby node. This can be key to inline security processing, where tools are actively inspecting traffic in real time. Other applications, such as out-of-band traffic analysis, can accommodate more latency in failover and may be served well enough by an A/S configuration.
- Tolerance for Risk: An A/S configuration introduces some additional risk that the standby device has failed silently and will not be able to take over processing in the event the primary device fails. In highly sensitive environments, this could be an important reason to operate in A/A mode.
- Budget Restrictions: The cost of redundant architecture can be a substantial portion of the overall security budget. Many organizations have found it easier to justify the cost of a redundant NPB if it can be put to use immediately in normal operations as an active node. It is the duty of the security team, however, to make sure the combined load on the two NPBs does not exceed 50% of their total capacity. In other words, the combined load must fit on a single NPB, because one NPB must be able to handle all of the traffic on its own in order to provide complete failover.
- Maintenance without Disruption: In an A/A configuration, maintenance can be completed without any special scheduling and without disruption to the network. Traffic is temporarily routed through the backup devices with no noticeable impact to users. In fact, periodic maintenance proves that the existing configuration is still able to handle the volume of traffic being processed. Any slowness experienced during a maintenance activity would indicate the need to scale the system.
- Port Redundancy: The chance of a single port on an NPB failing is much greater than the chance of the entire device failing. Therefore, even in an A/S configuration, you may want to wire your inline tools to the Active NPB on redundant ports, so the secondary port can take over in the event the primary port fails, preventing a complete device failover. This is less important in an A/A configuration since the workload is shared.
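The 50% rule from the budget consideration above is easy to check with a little arithmetic. The sketch below assumes two identical NPBs and uses made-up traffic figures in Gbps; the function names are mine, purely for illustration.

```python
from typing import Sequence


def combined_utilization(loads_gbps: Sequence[float],
                         node_capacity_gbps: float) -> float:
    """Combined load as a fraction of the total capacity of all nodes."""
    total_capacity = node_capacity_gbps * len(loads_gbps)
    return sum(loads_gbps) / total_capacity


def survives_single_failure(loads_gbps: Sequence[float],
                            node_capacity_gbps: float) -> bool:
    """True if one surviving node could carry the combined load alone."""
    return sum(loads_gbps) <= node_capacity_gbps


# Assumed example: two NPBs rated at 100 Gbps each.
healthy_pair = [18.0, 22.0]      # 40 Gbps combined
overloaded_pair = [60.0, 55.0]   # 115 Gbps combined

for loads in (healthy_pair, overloaded_pair):
    print(loads,
          round(combined_utilization(loads, 100.0), 3),
          survives_single_failure(loads, 100.0))
```

For a pair of identical nodes, staying at or below 50% combined utilization is the same thing as saying the whole load fits on one node. The second pair looks comfortable on each box individually, yet it would drop traffic the moment either NPB failed.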
Summary
Whatever failover mode works best for your environment, make it a priority. Lack of visibility in a complex system puts the entire infrastructure at risk. You want to be able to intervene quickly, accurately, and effectively to protect your network.
Ixia’s entire series of blogs on visibility is available now in the e-book Visibility Architectures: The ABCs of Network Visibility.