This overview synthesizes the key concepts for enhancing Information and Communications Technology (ICT) resilience, focusing on the philosophy of redundancy, the mathematical justification for duplication, and the role of Software-Defined Wide Area Network (SD-WAN) technology.
The Core Philosophy of Resilience & Redundancy
The fundamental principle for achieving high continuity and resilience in ICT systems is duplication, often referred to as redundancy and fault tolerance. The benefit of a secondary, active system is described as exponential: “thousands of times better” than simply doubling capacity. This design philosophy is common in critical infrastructure, such as jet planes, which rely on dual components (like two engines) to ensure safe and reliable operation. If a primary component fails (an engine or a network link, for example), the system automatically uses the remaining component and operations continue safely. A skilled engineer can achieve high network reliability through resilient engineering and smart design, even with less reliable or less expensive components. This approach not only saves costs but also enhances overall performance.
The Quantitative Case for Duplication
The sources provide a strong mathematical argument for favoring system redundancy over simply using higher-quality components.
The Uptime Conundrum: If a system relies on 20 components, each with an uptime of 99.9%, at least one component will be down about 2% of the time (20 × 0.1%), resulting in just over a week of downtime per year.
Improving Component Reliability: If components ten times more reliable (99.99% uptime) are used, the downtime falls to about 0.2%, which still translates to roughly 17.5 hours, or about three-quarters of a day, of downtime per year.
Implementing Redundancy: By contrast, if redundant secondary components (a mirror of the system) are installed, even if they are only 99.9% reliable, the chance that the primary and secondary components fulfilling the same function both fail at the same time is only 0.0001% (0.1% × 0.1%), assuming the failures are independent. This cuts downtime to about half a minute per year for each mirrored function, or roughly ten minutes per year across all 20.
Therefore, duplicating components and systems is significantly more effective at improving continuity than solely engineering a single component to be more reliable.
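The arithmetic behind this comparison can be checked with a short calculation. The following is a minimal sketch, assuming that component failures are independent and that the system is up only when all 20 functions are up; the function names are illustrative and not taken from any source.

```python
HOURS_PER_YEAR = 365 * 24


def series_availability(availability: float, count: int) -> float:
    """Availability of a chain in which every one of `count` parts must be up."""
    return availability ** count


def mirrored_availability(availability: float) -> float:
    """Availability of one function served by two independent, identical parts."""
    return 1 - (1 - availability) ** 2


def downtime_hours(availability: float) -> float:
    """Expected downtime per year, in hours, for a given availability."""
    return (1 - availability) * HOURS_PER_YEAR


# Scenario 1: 20 single components at 99.9% each.
baseline = series_availability(0.999, 20)
# Scenario 2: 20 single components that are ten times more reliable.
better_parts = series_availability(0.9999, 20)
# Scenario 3: every one of the 20 functions is mirrored with 99.9% parts.
mirrored = series_availability(mirrored_availability(0.999), 20)

for label, value in [
    ("20 x 99.9%, no redundancy", baseline),
    ("20 x 99.99%, no redundancy", better_parts),
    ("20 x mirrored 99.9% pairs", mirrored),
]:
    print(f"{label}: {downtime_hours(value):.2f} hours of downtime per year")
```

Under these assumptions the mirrored design comes out well ahead of the more reliable single components, although correlated failures (a shared power feed, for instance) would narrow the gap in practice.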
Implementing Redundancy in ICT Infrastructure
To make informed decisions regarding redundancy, it is necessary to analyze data center components, assess the likelihood of failure, and determine the impact and associated cost of potential outages on services and customers.
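One way to structure that analysis is to rank components by their expected annual outage cost. The sketch below is illustrative only: the component names, failure probabilities, outage durations, and hourly cost are all hypothetical placeholders, not figures from any source.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    annual_failure_probability: float  # hypothetical estimate
    outage_hours: float                # hypothetical time to restore service
    cost_per_hour: float               # hypothetical business impact per hour

    @property
    def expected_annual_cost(self) -> float:
        """Likelihood of failure multiplied by the cost of the resulting outage."""
        return (
            self.annual_failure_probability
            * self.outage_hours
            * self.cost_per_hour
        )


def rank_by_exposure(components: list[Component]) -> list[Component]:
    """Return components ordered from highest to lowest expected annual cost."""
    return sorted(components, key=lambda c: c.expected_annual_cost, reverse=True)


# Hypothetical inventory used purely to show the calculation.
inventory = [
    Component("power feed", 0.10, 4.0, 5000.0),
    Component("core switch", 0.05, 2.0, 5000.0),
    Component("storage array", 0.02, 12.0, 5000.0),
]

for component in rank_by_exposure(inventory):
    print(f"{component.name}: {component.expected_annual_cost:,.0f} per year")
```

The components at the top of such a ranking are the first candidates for duplication.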
Key mechanisms to improve continuity within the data center include:
Redundant Data Centers: Using a secondary, diverse data center is crucial. Ideally, this center should be in a different geographic location, or at least separated by a suitable distance within the same city, to mitigate natural and economic risks, for example by ensuring that the two sites are fed from different power grids.
Power and Storage: Implementing redundant or resilient power sources, such as a server with two power supplies connected to separate A and B power feeds. This protects against the failure of either a power supply or an electrical feed. Additionally, using RAID storage is recommended.
Application and Database Clusters: Provisioning application and database clusters across both primary and secondary data centers.
Inter-DC Connectivity: Connecting data center locations using a fibre ring that features path protection and automated failure detection.
Failure mitigation solutions can be classified as:
Hot Solutions (Resilience): Provide immediate failover to an alternative component (e.g., a UPS).
Warm Solutions (Redundancy): Involve the recovery of an alternative component within a specific restart time (e.g., a standby generator).
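The distinction between the two classes comes down to switchover time, which can be sketched as follows. This is a simplified illustration: the 30-second generator start time and the four failures per year are hypothetical values chosen only to show the calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mitigation:
    name: str
    switchover_seconds: float  # 0 means the load never sees the failure

    @property
    def kind(self) -> str:
        # Immediate failover is "hot"; anything needing a restart time is "warm".
        return "hot" if self.switchover_seconds == 0 else "warm"


def annual_interruption_seconds(mitigation: Mitigation, failures_per_year: float) -> float:
    """Service interruption per year caused by switching to the alternative."""
    return mitigation.switchover_seconds * failures_per_year


ups = Mitigation("UPS", 0.0)
generator = Mitigation("standby generator", 30.0)  # hypothetical start time

for m in (ups, generator):
    seconds = annual_interruption_seconds(m, failures_per_year=4)
    print(f"{m.name}: {m.kind}, {seconds:.0f} s interruption per year")
```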
SD-WAN as a Key Enabler of Resilience
SD-WAN (Software-Defined Wide Area Network) plays a critical role in facilitating network resilience, particularly for “last mile” connectivity.
SD-WAN Last Mile Redundancy:
SD-WAN utilizes dual last mile links, operating either as primary/secondary channels or actively together (like Nepean Networks’ “jet plane mode”).
This approach ensures continuous network operation even if one link experiences issues or downtime.
SD-WAN solutions, such as those from Fusion Broadband South Africa, simplify the complex implementation of redundancy by seamlessly managing circuit switching with a “Look Ma, No Hands!” approach, eliminating the need for manual intervention.
This automatic failover capability makes less expensive broadband circuits a viable alternative to higher-priced MPLS circuits.
Benefits of Dual Links with SD-WAN:
Implementing SD-WAN with dual last mile links provides several benefits:
Fault Tolerance: Continuous operation is maintained even if one link fails.
Automatic Failover: Traffic is instantly rerouted to the functional link upon failure, reducing downtime.
Aggregation & Bonding: Network traffic is intelligently distributed across both links, optimizing bandwidth and improving performance.
Improved Reliability: The overall reliability of the network is increased, minimizing the impact of potential outages.
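The failover and aggregation behaviour listed above can be illustrated with a toy model. This sketch is not any vendor's implementation: the `Link` type, the link names, and the capacity-proportional split are assumptions made for illustration, and a real SD-WAN would detect link health through continuous probing rather than a simple flag.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    up: bool
    capacity_mbps: float


def share_traffic(links: list[Link], demand_mbps: float) -> dict[str, float]:
    """Split demand across healthy links in proportion to their capacity.

    Returns an empty mapping when no link is up. The demand is not capped,
    so a result may exceed a link's capacity when only one link remains.
    """
    healthy = [link for link in links if link.up]
    if not healthy:
        return {}
    total_capacity = sum(link.capacity_mbps for link in healthy)
    return {
        link.name: demand_mbps * link.capacity_mbps / total_capacity
        for link in healthy
    }


# Hypothetical pair of last mile links.
fibre = Link("fibre", up=True, capacity_mbps=100.0)
wireless = Link("wireless", up=True, capacity_mbps=50.0)

print(share_traffic([fibre, wireless], demand_mbps=90.0))  # both links carry traffic

fibre.up = False  # simulate a failure of the first link
print(share_traffic([fibre, wireless], demand_mbps=90.0))  # all traffic moves over
```

With both links up, traffic is shared; when one fails, the same function moves everything to the survivor without any manual step.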
For instance, when two typical last mile chains are aggregated by Fusion SD-WAN, the uptime of the associated site increases from, for example, 87% (for a single chain) to 98%. Furthermore, Nepean Networks’ SD-WAN solution dramatically reduces statistical downtime when integrated with resilient links.
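The 87% to 98% figure is consistent with treating the two chains as independent paths where the site is down only when both are down. A minimal sketch of that calculation, assuming both chains have the same 87% uptime and fail independently:

```python
def combined_uptime(*chain_uptimes: float) -> float:
    """Uptime of a site that stays up as long as any one chain is up."""
    all_down = 1.0
    for uptime in chain_uptimes:
        all_down *= 1.0 - uptime  # probability that this chain is also down
    return 1.0 - all_down


single = 0.87
dual = combined_uptime(single, single)
print(f"single chain: {single:.1%}, two chains: {dual:.1%}")
```

Chains that share a failure point, such as the same street cabinet or power source, would not be independent and would yield a smaller gain.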
Ronald Bartels











