AI-based automated assurance is a 5G necessity
In a 5G world, automation is mandatory because Communications Service Providers (CSPs) cannot meet customer requirements for dynamic, ultra-low-latency, high-availability services using manual provisioning and management processes. CSPs need assurance solutions that use Artificial Intelligence (AI) and machine learning to automate the trouble-to-resolve process in order to deliver the cloud-like network experiences customers want.
During Blue Planet’s recent Virtual Insiders Forum (VIF), Appledore Research founder Patrick Kelly joined me to discuss the issues CSPs face in assuring services and why closed-loop automation is critical. Closed-loop automation continuously assesses network conditions, traffic and performance demands, and resource availability to determine the best path for traffic to take. The process, which introduces machine learning into operations, relies on analytics, policy, and orchestration to enable self-optimization.
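To make the idea concrete, here is a minimal sketch of one pass through such a loop, written in Python. It is an illustration only, not how any particular product works: the `PathState` and `Policy` types, the thresholds, and the path names are all hypothetical. The loop keeps the current path if it still satisfies the policy, and otherwise moves traffic to the compliant path with the lowest latency.

```python
from dataclasses import dataclass


@dataclass
class PathState:
    """Snapshot of one candidate path (hypothetical fields)."""
    name: str
    latency_ms: float
    utilization: float  # fraction of capacity in use, 0.0 to 1.0
    available: bool


@dataclass
class Policy:
    """Limits a path must respect (hypothetical thresholds)."""
    max_latency_ms: float
    max_utilization: float


def is_compliant(path: PathState, policy: Policy) -> bool:
    """Analytics step: does this path currently satisfy the policy?"""
    return (
        path.available
        and path.latency_ms <= policy.max_latency_ms
        and path.utilization <= policy.max_utilization
    )


def closed_loop_step(current: str, paths: list[PathState], policy: Policy) -> str:
    """One loop iteration: assess the paths, then decide which one to use."""
    by_name = {p.name: p for p in paths}
    # Keep the current path while it is compliant, to avoid needless churn.
    if current in by_name and is_compliant(by_name[current], policy):
        return current
    candidates = [p for p in paths if is_compliant(p, policy)]
    if not candidates:
        # Nothing meets the policy; keep the current path so a person can step in.
        return current
    # Orchestration would then apply this choice to the network.
    return min(candidates, key=lambda p: p.latency_ms).name


paths = [
    PathState("path-a", latency_ms=25.0, utilization=0.50, available=True),
    PathState("path-b", latency_ms=8.0, utilization=0.60, available=True),
    PathState("path-c", latency_ms=5.0, utilization=0.95, available=True),
]
policy = Policy(max_latency_ms=10.0, max_utilization=0.8)
print(closed_loop_step("path-a", paths, policy))
```

In a real deployment this step would run continuously against live telemetry, and the policy itself could be tuned by what the system learns over time.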
What are the challenges?
- With the introduction of 5G, edge computing and other technologies like SD-WAN, networks are becoming much more dynamic, and workloads are more distributed. This necessitates changes in how these new services and technologies are supported.
- CSPs are managing more continuous software releases in their networks and operations. As a result, interoperability issues are surfacing between different suppliers, application providers, and the underlying infrastructure.
- New technology is being added rapidly into the network. Technologies such as 5G and edge cloud increase the time it takes to resolve problems compared with existing technology, because of the new protocols and architectural changes they bring. It becomes impossible for operators to address issues using traditional, manually intensive processes.
- CSPs are facing a shortage of software skills, and existing staff must adapt to all the changes that are occurring. Operators normally build excess capacity into their networks for peak loads, but cloud fundamentally changes this, requiring the adoption of agile DevOps practices and rapid introduction of new services.
These changes are leading to an explosion in operational costs. To address this, CSPs need to shift from traditional methods of assurance, such as root-cause analysis and event-based monitoring, to automated assurance, which uses AI to harness large data sets within the operator’s environment and then applies machine learning algorithms for continuous learning. This can help CSPs identify problems before they affect customers.
Patrick highlighted the added value these AI and machine learning tools can provide in a chart he presented during our VIF session.
Taking steps toward automation
It is important to note that many core assurance requirements are not changing. CSPs still need to collect, correlate, and visualize network events – performance metrics, faults, and alarms, for example – and fix problems. But now they must do it at massive scale, which requires an additional level of intelligent automation that can be provided only by AI and machine learning.
Incorporating AI and machine learning into service assurance won’t happen instantly, however. CSPs will gradually introduce the new capabilities.
The first step is providing analytics insights to operations teams to help them understand the data. The teams can then verify that the results are accurate and implement responses manually. Over time, as the intelligence is validated, it can be built into automated workflows.
Machine learning tools will be key to the automation step. They work by taking information from the billions of performance metrics, alarms, and user actions that telco networks generate daily and observing patterns over time. Based on the continuous learning, closed-loop management can be implemented to automate the processing of new events as they occur.
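As a rough illustration of what "observing patterns over time" can mean, the Python sketch below learns a baseline for a single metric and flags values that deviate sharply from it. It uses a simple rolling z-score as a stand-in for the richer models a real assurance platform would apply, and the class name, window size, and threshold are assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, stdev


class BaselineDetector:
    """Learns a rolling baseline for one metric and flags outliers."""

    def __init__(self, window: int = 20, threshold: float = 3.0, warmup: int = 5):
        self.history = deque(maxlen=window)  # recent normal samples
        self.threshold = threshold           # allowed deviation, in standard deviations
        self.warmup = warmup                 # samples needed before judging

    def observe(self, value: float) -> bool:
        """Return True if the value is anomalous; otherwise learn from it."""
        anomalous = False
        if len(self.history) >= self.warmup:
            mu = mean(self.history)
            sigma = stdev(self.history)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                anomalous = True
        if not anomalous:
            # Only normal samples update the baseline, so one spike does not skew it.
            self.history.append(value)
        return anomalous


detector = BaselineDetector(window=10)
latency_samples_ms = [10.0, 10.5, 9.8, 10.2, 10.1, 9.9, 10.3, 25.0, 10.0]
flags = [detector.observe(v) for v in latency_samples_ms]
print(flags)
```

In a closed loop, a flagged sample would become a new event for the automated workflow to process, once operations teams have validated that the detector's findings can be trusted.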
Network slicing as a use case
Automated assurance becomes especially important as CSPs virtualize their core networks and introduce 5G network slicing, which allows operators to deliver dynamic services that have different requirements for throughput, latency and availability over the same underlying infrastructure.
For example, as CSPs deploy IP/MPLS in their backhaul networks, changes in routers at the infrastructure layer can have unintended consequences on network slices and, as a result, on end-user experiences. If there is a hard network outage, that’s usually easy to understand and route around. But if networks are being disrupted and degraded without an easy-to-identify cause, then the question becomes how to make changes to the network and routing in real time and resolve issues effectively, while ensuring that different types of users, services, and their SLAs are not impacted.
Machine learning tools can help efficiently troubleshoot and resolve these types of issues. They can help CSPs identify network configuration changes that might be risky, and they can recommend which services to shift to a preferred path and how to implement the change. They can also speed up problem resolution by correlating a problem seen in the network with similar issues that have occurred in the past and showing how those issues were resolved.
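The last point, matching a new problem against past ones, can be sketched in a few lines of Python. This is a deliberately simple example under stated assumptions: incidents are reduced to sets of symptom labels, similarity is measured with the Jaccard index rather than a trained model, and the incident records and resolutions are invented for illustration.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two symptom sets, from 0.0 (none) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_incidents(symptoms, history, top_n=3, min_score=0.3):
    """Return (score, resolution) for the past incidents most like this one."""
    scored = [(jaccard(set(symptoms), set(inc["symptoms"])), inc) for inc in history]
    # Drop weak matches, then rank the rest from most to least similar.
    scored = [(score, inc) for score, inc in scored if score >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(round(score, 2), inc["resolution"]) for score, inc in scored[:top_n]]


# Hypothetical incident history; a real system would draw this from ticketing data.
history = [
    {"symptoms": {"packet_loss", "high_latency", "router_config_change"},
     "resolution": "roll back router config"},
    {"symptoms": {"link_down", "loss_of_signal"},
     "resolution": "reroute traffic and dispatch field team"},
    {"symptoms": {"high_latency", "queue_congestion"},
     "resolution": "shift slice to alternate path"},
]

new_problem = {"high_latency", "packet_loss", "router_config_change", "jitter"}
print(similar_incidents(new_problem, history))
```

Here the new problem shares three of its four symptoms with the first past incident, so that incident's resolution is surfaced as the suggested starting point, while the weaker matches fall below the cutoff.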
At Blue Planet, we are helping CSPs work toward this automated assurance nirvana by combining domain orchestration with automated network analytics through our Unified Assurance and Analytics platform. Bringing the two together can help operators build a unified view of what the network looks like – from the application layer to the virtual infrastructure and servers that support them, down to the transport network that interconnects all the devices – and push any changes that are required into the network. The result is unprecedented end-to-end visibility and control.