Shelley Bhalla is Director of Product Line Management for Blue Planet Analytics & Assurance at Ciena’s Blue Planet software division. His primary responsibilities include defining and executing the vision and strategy for Ciena’s Blue Planet Analytics and Assurance software. His portfolio focuses on driving network automation through artificial intelligence (AI) and machine learning (ML) technologies. As a blogger and author with extensive industry experience in IT and telecom, Shelley has covered topics including network operations and advanced analytics using artificial intelligence and machine learning.

On March 26, 2019, many airlines tweeted that their main reservation systems were having "system issues" and were unable to issue boarding passes. This U.S.-wide outage affected hundreds of thousands of passengers, and the scene below from one of the airports illustrates a frustrating customer experience almost anyone can relate to.

Image Courtesy: https://techcrunch.com/2019/03/26/us-airlines-computer-issues/

Every industry inevitably experiences network issues and outages, but in today’s deeply connected social world, a disruption in service severely damages a company’s brand value and reputation. Service providers understand this and are focusing on using automation to quickly identify the root causes of such issues and the fixes for them.

But is automation enough for meaningful digital transformation?

To reduce operating expenses and address the complexity resulting from incorporating newer technologies, service providers must embrace a fundamental shift in ideology to focus on solving problems proactively, before they happen. This won’t happen overnight; it’s more of a journey that starts with a keen focus on solving problems quickly using analytics and automation. As time progresses, the power of artificial intelligence (AI) can then help predict and avoid these issues before they impact services.

Many leading service providers are already concluding that automation is not enough to drive complete digital transformation. Complex decision making at super-human speeds requires intelligent automation, machine learning and AI, all of which are fundamental for controlling and operating communications networks of the future.

Let’s look at why.

Operational complexity leads to reactive operations

The way people consume products and services is constantly evolving, placing ongoing pressure on service providers to update their portfolios and business models to meet subscriber demand, all while reducing costs. Recent surveys, such as one completed by IHS Markit, suggest that enhancing customer experience is a top digital transformation driver for service providers as new technologies take shape over the next few years.

Source: IHS Markit (now part of Informa Tech), Service Provider Digital Transformation Technologies Service Provider Survey, January 23, 2019.

Looking closer, as new technologies such as IoT and 5G are introduced, service providers must add more features to their networks. This adds to network and service complexity and results in higher operational costs. Moreover, exponential traffic growth is straining network operations and IT systems. In some cases, service providers report that their cost per bit is forecast to exceed revenue per bit, forcing them to re-evaluate their operational expenses to remain competitive. Cumulatively, this complexity constrains service innovation and increases the total cost of ownership (TCO) for service providers.

Today’s typical service provider network is a combination of many small networks that have been built and added to over time, not necessarily from the same vendor. Each OEM vendor’s product offers unique product-management capabilities, but seldom supports or operates other vendors’ products. This results in many small, monolithic network operation applications, each focusing on its own area of the network. These silos create their own reservoirs of rich data that normally don’t talk to each other, leaving network operations engineers to piece together data from multiple silos to resolve multi-vendor, multi-layer issues.

Capturing and correlating information from these various sources takes time and effort, and even more time is spent triaging and troubleshooting issues. The reactive break/fix model keeps operations engineers busy solving issues, many of which recur time and time again. The focus is on reactive support rather than on enhancing the software to suit changing needs or optimizing the network to prevent issues from happening in the first place.

Step one on the road to recovery: automation

Automation is the process of taking a repetitive task and building scripts to automate the steps required to reach desired goals. For network troubleshooting, for example, this means automating the search for the root cause of an issue or, once the root cause is found, automating the changes to the network that will resolve the issue. This all sounds great, but automation has limitations. To understand why, let’s step back and talk about how network operations work.

Any network, large or small, produces lots of logs, events, and performance metrics. While this data is valuable, it takes human intellect and skill to identify its relevance to a network issue that is happening or is about to happen. To make this determination, event management systems must ingest all of the data and use rules to correlate critical logs to the specific issues that need to be resolved by identifying their root causes. Once the root causes are identified, specific actions are executed to remediate the issue.
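To make the idea of rule-based correlation concrete, here is a minimal sketch in Python. It is not how any particular event management product works. The `Event` and `Rule` types, the event names, and the root cause and action strings are all hypothetical, chosen only for illustration. A rule fires when all of its required event kinds appear on the same device within a time window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds, relative to any fixed starting point
    device: str
    kind: str         # hypothetical event name, e.g. "LINK_DOWN"


@dataclass(frozen=True)
class Rule:
    required_kinds: frozenset  # every kind must be seen on one device
    window_seconds: float      # ...within this many seconds
    root_cause: str
    action: str


def _matches(device_events, rule):
    """Check whether all required kinds occur within one time window.

    device_events must be sorted by timestamp.
    """
    for i, start in enumerate(device_events):
        kinds = {
            e.kind
            for e in device_events[i:]
            if e.timestamp - start.timestamp <= rule.window_seconds
        }
        if rule.required_kinds <= kinds:
            return True
    return False


def correlate(events, rules):
    """Return (device, root_cause, action) for every rule that fires."""
    by_device = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        by_device.setdefault(ev.device, []).append(ev)

    findings = []
    for device, device_events in by_device.items():
        for rule in rules:
            if _matches(device_events, rule):
                findings.append((device, rule.root_cause, rule.action))
    return findings


if __name__ == "__main__":
    rules = [
        Rule(
            required_kinds=frozenset({"LINK_DOWN", "BGP_PEER_LOST"}),
            window_seconds=30.0,
            root_cause="physical link failure",
            action="reroute traffic and open a repair ticket",
        )
    ]
    events = [
        Event(0.0, "router-1", "LINK_DOWN"),
        Event(3.0, "router-2", "HIGH_CPU"),
        Event(5.0, "router-1", "BGP_PEER_LOST"),
    ]
    for finding in correlate(events, rules):
        print(finding)
```

Even in this toy form, the limitation is visible: someone has to write and maintain every rule by hand, and an issue that no rule anticipates goes undetected.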

While the rule-making process remains largely manual, network automation is already part of this process. To resolve issues faster, network operators are using their years of knowledge to create scripts that automate the troubleshooting process and identify the root cause. This is called runbook automation. And while runbook automation can resolve some well-known issues, it is not used at scale for remediation.
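As a rough illustration of runbook automation, the sketch below encodes an operator's troubleshooting checklist as an ordered list of Python functions. Everything here is an assumption made for the example: the check names, the thresholds, and the `state` dictionary standing in for data that a real script would collect from a device.

```python
def check_interface(state):
    """Step 1: is the interface up at all?"""
    if state.get("interface_status") != "up":
        return "interface is down"
    return None


def check_errors(state):
    """Step 2: is the link corrupting traffic? (threshold is illustrative)"""
    if state.get("crc_errors", 0) > 100:
        return "excessive CRC errors (possible faulty cable or optic)"
    return None


def check_utilization(state):
    """Step 3: is the link simply overloaded? (threshold is illustrative)"""
    if state.get("utilization_pct", 0) > 90:
        return "link is congested"
    return None


# The runbook is the ordered checklist an engineer would follow by hand.
RUNBOOK = [check_interface, check_errors, check_utilization]


def run_runbook(state, steps=RUNBOOK):
    """Run each step in order and return the first finding."""
    for step in steps:
        finding = step(state)
        if finding is not None:
            return finding
    return "no known cause found; escalate to an engineer"


if __name__ == "__main__":
    device_state = {
        "interface_status": "up",
        "crc_errors": 250,
        "utilization_pct": 40,
    }
    print(run_runbook(device_state))
```

The script is only as good as the checklist behind it. It speeds up diagnosis of well-known issues, but anything outside the encoded steps still falls through to a person.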

Step two: bringing intelligence to automation

What if we could create a network that assembles itself using high level instructions and reassembles itself as requirements change? What if it could automatically discover when something goes wrong and fix a problem or explain why it can’t be resolved? This is truly the way networks ought to be managed and operated to meet the demands of the highly complex next-generation services that will run on increasingly complex networks. Doing so requires adding AI and machine learning (ML) to network automation.

AI is the ability of the network to perform tasks normally performed by humans that involve perception, understanding, and decision-making. ML is a form of AI that uses statistical techniques to enable systems to ‘self-learn’ and progressively improve performance on a specific task. The goal is a network that can think like a human, but at superhuman speed, scale, and accuracy, and then take action.
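To give a feel for what ‘self-learning’ from statistics can mean, here is a deliberately tiny sketch. It is a simple running-statistics detector, far short of a production ML model. The class name, the threshold of three standard deviations, and the warm-up length are all assumptions chosen for illustration. The point is that no one writes a rule saying what "normal" latency is. The detector learns a baseline from the data and keeps refining it.

```python
import math


class BaselineDetector:
    """Learn a metric's normal range online and flag outliers.

    Uses Welford's running mean/variance update, so it never needs to
    store past samples.
    """

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold  # allowed deviation, in standard deviations
        self.warmup = warmup        # samples to learn from before judging
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0              # running sum of squared deviations

    def observe(self, value):
        """Return True if value looks anomalous; otherwise learn from it."""
        if self.count >= self.warmup:
            std = math.sqrt(self._m2 / (self.count - 1))
            if std > 0 and abs(value - self.mean) > self.threshold * std:
                # Do not fold outliers into the learned baseline.
                return True

        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)
        return False


if __name__ == "__main__":
    detector = BaselineDetector()
    latencies_ms = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10, 11, 9, 50, 10]
    for sample in latencies_ms:
        if detector.observe(sample):
            print(f"anomaly: {sample} ms")
```

A real deployment would use far richer models and many correlated signals, but the contrast with hand-written rules is the same: the behavior comes from the data rather than from a human-authored threshold.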

Recent surveys by Tractica suggest the telecom industry is already investing in AI and automation, and will continue to rapidly increase investment in the areas of the business that deliver the greatest return. The message is loud and clear: the front runner, garnering almost 61% of all investments (with no other area coming close), is "Network/IT operations monitoring and management".

Source: (04-2018) https://www.tractica.com/newsroom/press-releases/telecommunications-industry-investment-in-artificial-intelligence-software-hardware-and-services-will-reach-36-7-billion-annually-by-2025/

Invest smartly in automation and AI

Rapidly evolving consumer demand, the need to add new services to keep subscribers happy, and legacy infrastructure that has been bolted on over the years from multiple vendors have created networks that are more prone to issues, and those issues are in turn harder to diagnose and resolve.

Automation will address some of these issues by removing the need to perform some of the repetitive tasks required to diagnose and fix network problems, but it can only help once an incident has taken place. To truly keep subscribers happy, service providers need to uncover and fix issues before they occur, which requires investing in AI to augment automation solutions. This will help to avoid churn, reduce costs, and improve brand reputation, making it a smart choice all around.