Anyone who lives in a large metropolitan area has faced this challenge: we have just finished a long day of meetings and have to drive our rental car to the airport in rush-hour traffic to catch our flight home. Naturally, our meetings ran late, and we didn’t leave enough buffer time to account for an accident on the freeway. Who do we turn to?

We turn to Waze, the GPS navigation app, to identify the fastest route to the airport in real time. If traffic conditions change, Waze automatically proposes a new route that gets us to our destination in the shortest time. And we manage to make our flight just as the boarding door is closing.

Utility IP/MPLS networks face a harsher reality. In the world of “five nines” reliability mandated for electric utilities, network continuity is all-important. Global and national power grids depend on teleprotection and SCADA sensors to provide real-time data on the state of the power grid. This data is critical to pinpointing trouble spots on the grid and enabling relays to switch off power lines that could destabilize the grid and propagate widespread outages, such as the 2003 event that knocked out power to 50 million customers in the Northeast US over two days. This sensor data must be delivered with the lowest possible latency and switchover times; otherwise, power issues go undetected and can lead to outages.

In the same way that road warriors need to get to the airport on time to catch their flights home, teleprotection/SCADA data must be transported within strict latency limits to enable the power grid to self-heal.

Many utilities have used IP/MPLS networks to transport this critical teleprotection/SCADA data between their substations and control centers. Due to the dynamic nature of IP/MPLS networks, traffic congestion can occur at any time, impacting the ability to carry this mission-critical traffic within the desired latency specifications. The sheer size and complexity of utility grid IP networks mean that traditional network management systems aren’t up to the task.

Real-world Problems

There are countless real-world examples of utility application and service problems that stem from misunderstanding how routing and traffic logically operate in large IP networks:

Logical network misconfigurations: A power utility’s two adjacent campuses were exchanging traffic via a low-speed WAN link due to a misconfiguration, resulting in degraded application performance. 

Compromised redundancy: An expensive backup WAN link to a site was deployed, only to discover when the primary link failed that the backup wasn’t correctly configured to carry the traffic. 

Security breaches: A utility was blind to a backdoor into its network through a contractor’s network. 

Loss of application data: An electric utility lost critical connectivity to its power grid control data due to a routing misconfiguration that was seen only when a routine maintenance operation caused control data to be lost. 

Degraded services: A service provider failed to detect that routing instabilities were the root cause of weeks of intermittent service outages at a utility customer’s site. Separately, a regional utility operator integrating TDM systems with its new IP/MPLS network faced challenges because the line protection application used for monitoring AC current at substations was highly sensitive to delay and jitter on RSVP-TE tunnels.

The common thread among all these examples is that the problems involve the routing logic in the network, rather than the status of individual devices. Traditional device management and end-to-end application performance management solutions, while necessary, provide no insight into the logical operation of traffic and routing. As a result, utility IT departments often have no visibility into the root causes of application degradations. For enterprises with less critical applications, this lack of visibility may not matter as much, but for utility grids, there is no room for error.

Leveraging data-driven analytics

Just as we rely on Waze to tell us the fastest and most efficient route to our destination, Blue Planet, a division of Ciena, leverages data-driven analytics to provide utilities with real-time visibility into their IP/MPLS networks and to help identify changes that can optimize network traffic flows. Route analytics technology, used by many large enterprises, network operators, service providers and government agencies, is the answer to understanding the logical operation of IP networks.

Route analytics works by using a virtual network appliance running specialized software that acts like a passive router: it listens to the routing protocol updates sent by all routers in the network and computes the network-wide routing state in real time.
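To make "computing the network-wide routing state" concrete, here is a minimal sketch in Python. It assumes the listener has already assembled a link-state database as a plain dictionary and runs Dijkstra's shortest-path algorithm over it, which is how link-state protocols such as OSPF and IS-IS derive routes. The router names, link costs and function names are hypothetical, and this is an illustration of the general idea rather than a description of how any particular product is implemented.

```python
import heapq

def shortest_paths(lsdb, source):
    """Run Dijkstra over a link-state database shaped as {router: {neighbor: cost}}."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for nbr, cost in lsdb.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return dist, prev

def path_to(prev, source, dest):
    """Rebuild the hop-by-hop path from the predecessor map; [] if unreachable."""
    path = [dest]
    while path[-1] != source:
        if path[-1] not in prev:
            return []
        path.append(prev[path[-1]])
    return path[::-1]

# Hypothetical topology: one substation, two core routers, one control center.
LSDB = {
    "substation_a": {"core_1": 10, "core_2": 30},
    "core_1": {"substation_a": 10, "core_2": 10, "control_center": 50},
    "core_2": {"substation_a": 30, "core_1": 10, "control_center": 10},
    "control_center": {"core_1": 50, "core_2": 10},
}

dist, prev = shortest_paths(LSDB, "substation_a")
scada_path = path_to(prev, "substation_a", "control_center")
print(scada_path, dist["control_center"])
```

Re-running the computation whenever a routing update changes the database is what keeps the picture current: a single cost change on one link can move SCADA traffic onto a different path.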

The Route Optimization and Assurance (ROA) Explorer Suite from Blue Planet has been widely deployed across the globe by network teams responsible for maintaining the availability and performance of critical SCADA applications, including:

  • National and regional electrical power utilities
  • Private and public transportation and logistics organizations
  • Large municipal, IP network-based traffic engineering systems 

In these examples, IT departments use ROA Explorer Suite’s routing and traffic analytics to ensure that the IP network supporting utility SCADA applications is always available. Three major benefits related to SCADA assurance are: 

1. Engineers can simulate a variety of changes with high accuracy

For example, utility network engineers can model the anticipated increase in high-priority SCADA data traffic resulting from the deployment of new RTUs. The simulated new traffic is overlaid not on an abstract model, but on the network’s traffic and routing matrix in real-time or at a specific time (e.g., peak usage period) chosen by network engineers. The new traffic and routing picture shows whether the CoS of the new SCADA traffic or any other traffic class is affected on any link in the core IP network. If not, and provided usage assumptions are correct, engineers can proceed with confidence in the roll-out. 
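The overlay idea can be sketched in a few lines of Python. This simplified example assumes a snapshot of per-link load, adds a projected SCADA demand along the links of its routed path, and flags any link that would exceed a utilization threshold. The link names, capacities, the 80% threshold and the function names are all hypothetical, and a real model would also track load per class of service rather than per link.

```python
def overlay_demand(link_load_mbps, path_links, added_mbps):
    """Return a copy of the link loads with added_mbps applied to each link on the path."""
    projected = dict(link_load_mbps)
    for link in path_links:
        projected[link] = projected.get(link, 0.0) + added_mbps
    return projected

def links_over_threshold(link_load_mbps, capacity_mbps, threshold=0.8):
    """List the links whose load exceeds threshold * capacity."""
    return sorted(
        link for link, load in link_load_mbps.items()
        if load > threshold * capacity_mbps[link]
    )

# Hypothetical peak-hour snapshot, in Mbps.
capacity = {"sub_a-core_1": 100.0, "core_1-core_2": 100.0, "core_2-cc": 100.0}
baseline = {"sub_a-core_1": 40.0, "core_1-core_2": 70.0, "core_2-cc": 50.0}

# Assumed extra SCADA traffic from newly deployed RTUs, following the current path.
new_rtu_path = ["sub_a-core_1", "core_1-core_2", "core_2-cc"]
projected = overlay_demand(baseline, new_rtu_path, added_mbps=15.0)

at_risk = links_over_threshold(projected, capacity)
print(at_risk)
```

Because the baseline snapshot is copied rather than modified, engineers could try several candidate roll-outs against the same peak-hour picture and compare the results.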

2. Add a DVR-like feature to your network, reducing troubleshooting time and increasing network service quality

Due to the lack of visibility into logical network operations and a dearth of forensic troubleshooting data, many application problems, particularly intermittent ones, go unsolved and fall into the “No Problem Found” bucket. With combined route and traffic analytics, utility engineers can rewind the recorded routing and traffic state to the time the problem occurred and quickly localize the problem domain by tracing the path that a particular service traveled across the network. They can easily determine if there was a routing root cause. If not, they can see if any link, the SCADA application traffic, or a relevant CoS breached volume thresholds. If there was link congestion, further analysis will show whether a routing issue elsewhere caused traffic to shift, or whether unexpected traffic was present, including its origin, destination and the route that included the problem link.
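The "rewind" capability boils down to keeping a time-ordered record of routing state and looking up what was in effect at a given moment. The Python sketch below assumes path changes are recorded in chronological order and uses a binary search to find the path that was active at a timestamp. The class name, timestamps and router names are hypothetical, and a production recorder would store far more than one path.

```python
from bisect import bisect_right

class RouteHistory:
    """Time-ordered record of the path a service took through the network."""

    def __init__(self):
        self._times = []
        self._paths = []

    def record(self, timestamp, path):
        # Assumes changes are recorded in increasing timestamp order.
        self._times.append(timestamp)
        self._paths.append(list(path))

    def path_at(self, timestamp):
        """Return the path in effect at timestamp, or None if nothing was recorded yet."""
        i = bisect_right(self._times, timestamp)
        return self._paths[i - 1] if i else None

# Hypothetical history for one SCADA service (timestamps in seconds).
history = RouteHistory()
history.record(100, ["substation_a", "core_1", "control_center"])
history.record(250, ["substation_a", "core_2", "core_1", "control_center"])  # reroute
history.record(400, ["substation_a", "core_1", "control_center"])            # restored

# An operator reports a latency spike at t=300: which path was active?
path_during_incident = history.path_at(300)
print(path_during_incident)
```

Here the lookup shows that the service was on the longer detour through core_2 when the problem was reported, which points the investigation at the routing change recorded at t=250.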

3. Network Continuity Assurance

Network engineers can view traffic flow utilization trends for all the links in their network, broken out by CoS or even by application groups if they can be classified using flow analytics information. Easy-to-use trending reports and utilization projections identify links or classes of service that will experience congestion if current trends persist.
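A utilization projection of this kind can be approximated with a straight-line fit. The Python sketch below assumes weekly peak-utilization samples for one link, fits a least-squares line, and estimates how many weeks remain before the link crosses a congestion threshold. The sample values, the 80% threshold and the function names are hypothetical, and real traffic is rarely this linear, so treat the result as a rough planning signal.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def weeks_until(threshold, weeks, utilization_pct):
    """Weeks after the last sample until the trend line reaches threshold.

    Returns None when the trend is flat or falling, and 0.0 when the
    trend line is already at or above the threshold.
    """
    slope, intercept = linear_fit(weeks, utilization_pct)
    if slope <= 0:
        return None
    return max(0.0, (threshold - intercept) / slope - weeks[-1])

# Hypothetical weekly peak utilization (percent) for one core link.
weeks = [0, 1, 2, 3, 4]
utilization = [50.0, 52.0, 54.0, 56.0, 58.0]

remaining = weeks_until(80.0, weeks, utilization)
print(remaining)
```

Running the same projection per class of service, instead of per link, would highlight a case where total link load looks healthy but the SCADA class is on course to exhaust its share.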

The insights utilities need to succeed

To succeed in utility environments where large, complex IP networks create variability in how application traffic is delivered, network managers need to look beyond traditional network management tools and understand their networks’ control plane operations. With real-time routing telemetry overlaid with traffic flow data, back-in-time forensics, and what-if modeling capabilities, utility network managers gain the visibility they need to ensure that SCADA applications operate with the highest possible network service continuity.