Understand why network resilience is critical in planning to prevent downtime

27 November 2017

The art and science of enterprise network resilience


From their beginnings as a niche force multiplier, IT systems have evolved into a core and inextricable part of doing business in the modern world. While outages of hours or even days were once considered somewhat acceptable, even a short period of downtime today is highly detrimental and likely to result in outright financial loss.


This means that reliability is now more important than ever, with resilience against outages becoming a key anchor when designing and engineering modern IT infrastructure.


Understanding resiliency


Despite its importance, there is some confusion about resilience and its role in pre-empting downtime. Resilience refers to the inherent robustness that allows a system to continue running when failures occur. It is a relative term, and it can be attained through a variety of engineering approaches that keep systems running optimally.


The most common strategies are probably to overengineer systems for increased availability or to cluster them to reduce the chance of catastrophic failure. Examples include the use of error-correcting code (ECC) memory or multi-processor systems within servers to guard against limited memory and processor failures. Either approach is cost-prohibitive to implement because of the cost of components and proprietary technologies.


Another approach is to leverage redundant systems, which entails deploying duplicate or multiple systems in a configuration that guards against system failures. This is typically achieved with standard systems and intelligent design to ensure high availability; the greater the redundancy, the higher the overall system resilience. By avoiding the use of proprietary technologies, resilience is achieved without the prohibitive costs inherent to building heightened robustness into every component of the infrastructure.
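The link between redundancy and resilience can be illustrated with a little arithmetic. The sketch below is a simplified model rather than a description of any particular product: it assumes that each redundant unit fails independently of the others and that a single working unit is enough to keep the service running. The function names and the 99% availability figure are hypothetical, chosen only for illustration.

```python
HOURS_PER_YEAR = 24 * 365


def parallel_availability(unit_availability: float, copies: int) -> float:
    """Availability of `copies` redundant units when any one of them suffices.

    Assumption: failures are independent. Real deployments only approximate
    this, since shared power, software, or cable routes can fail together.
    """
    if not 0.0 <= unit_availability <= 1.0:
        raise ValueError("unit_availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The service is down only when every copy is down at the same time.
    return 1.0 - (1.0 - unit_availability) ** copies


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical standard unit that is available 99% of the time.
    for copies in (1, 2, 3):
        a = parallel_availability(0.99, copies)
        hours = expected_downtime_hours(a)
        print(f"{copies} unit(s): availability {a:.6f}, about {hours:.2f} h/year down")
```

Under these assumptions, each additional standard unit cuts the expected downtime by a large factor, which is why redundancy built from ordinary components can compete with a single overengineered system.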


The role of network resilience


While resilience is well understood for server and storage infrastructure, the wide area network (WAN) component is often overlooked. This is unfortunate, for even the most well-engineered system is only as strong as its weakest link. Moreover, modern workloads tend to be connected across multiple networks, which makes critical network infrastructure more complex at the very time it can least afford downtime.
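The weakest-link point can be made concrete with a rough calculation. The sketch below assumes that a service needs every component in its chain to be working and that the components fail independently; under that simplified model, the end-to-end availability is the product of the individual figures and can never exceed that of the weakest component. The function name and all availability figures are hypothetical.

```python
from math import prod


def series_availability(component_availabilities) -> float:
    """End-to-end availability when every component must be up.

    Assumption: components fail independently of one another.
    """
    values = list(component_availabilities)
    if not values:
        raise ValueError("at least one component is required")
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("availabilities must be between 0 and 1")
    return prod(values)


if __name__ == "__main__":
    # Hypothetical figures: well-engineered servers and storage, weaker WAN link.
    chain = {"servers": 0.9999, "storage": 0.9999, "wan": 0.995}
    total = series_availability(chain.values())
    weakest = min(chain, key=chain.get)
    print(f"end-to-end availability: {total:.6f}")
    print(f"weakest link: {weakest} ({chain[weakest]})")
```

In this illustration, the investment in highly available servers and storage is largely masked by the less resilient WAN link, since the overall figure ends up slightly below that of the WAN alone.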


For instance, a regional organisation headquartered in Singapore, with large branch offices in Kuala Lumpur and Jakarta, could have its enterprise resource planning (ERP) and customer relationship management (CRM) systems located in Singapore and accessed regionally through the corporate network. Even for a mid-size business, a half-hour outage can result in multiple unproductive hours as employees find themselves unable to deliver work within the stipulated timeline.


It doesn’t help that networks are growing more complex with increasing scale and the incorporation of modern technologies such as software-defined networking and hybrid cloud deployments. A network outage can hence have a far greater impact than a single server or storage appliance going down, possibly even bringing business operations to a grinding halt. It is also more visible to customers and the public.


Building a resilient infrastructure


Though it is tempting to treat all connectivity options as equivalent, the truth is that not all networks are the same. Some offerings are simple digital circuits with no redundancy and hence no resilience, while others are engineered for high resilience, with redundant circuits on standby in the background. It is crucial that businesses that value reliability consider these factors right from the beginning.


Organisations with a regional or worldwide footprint will be aware of cable outages, which can stem from both scheduled and unplanned events, including subsea cables damaged by accidents, storms, or natural disasters. Though there is no way to engineer greater resilience into a cable, having a robust, redundant network helps ensure that any outage can be resolved quickly and without unnecessary disruption.


On that front, StarHub’s Global VPN (GVPN) helps enterprises achieve flexible connections across multiple locations, locally as well as globally, with speedy data transmission that is scalable yet low-cost. The service runs on top of StarHub’s own private infrastructure for secure, fully managed, carrier-grade resilience. Customers can hence leverage the GVPN to benefit from a smarter, managed network.


Learn more about StarHub’s Global VPN.
