MARS: Fault Localization in Programmable Networking Systems with Low-cost In-Band Network Telemetry

Abstract

In this paper, we present MARS, a lightweight system for anomaly detection with dynamic thresholds and automatic root cause localization in programmable networking systems. MARS collects aggregated packet-level telemetry on demand and generates a ranked list of fine-grained fault culprits at multiple levels, including the port, switch, and flow levels. Experimental evaluations show that MARS is cost-effective in terms of both network bandwidth and switch memory usage. Moreover, MARS achieves an F1 score of 0.97 in anomaly detection, and a Recall at Top-2 of 0.95 and an overall Exam Score of 0.3 in root cause localization.
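The abstract reports root cause localization quality as Recall at Top-2 and Exam Score but does not define them here. The sketch below uses the definitions common in the fault localization literature, which is an assumption about how MARS computes them. Recall at Top-k is the fraction of fault cases whose true culprit appears among the first k entries of the ranked list. Exam Score is the average fraction of candidates that must be inspected, in rank order, before the true culprit is reached, so lower is better. The candidate names (ports, switches, flows) in the example are hypothetical.

```python
from typing import Sequence


def recall_at_k(rankings: Sequence[Sequence[str]], culprits: Sequence[str], k: int) -> float:
    """Fraction of fault cases whose true culprit appears in the top-k ranked candidates."""
    hits = sum(1 for ranked, culprit in zip(rankings, culprits) if culprit in ranked[:k])
    return hits / len(culprits)


def exam_score(rankings: Sequence[Sequence[str]], culprits: Sequence[str]) -> float:
    """Average fraction of candidates inspected, in rank order, until the true culprit is found.

    Assumption: a culprit missing from the ranked list counts as inspecting every
    candidate, i.e. a per-case score of 1.0.
    """
    scores = []
    for ranked, culprit in zip(rankings, culprits):
        if culprit in ranked:
            scores.append((list(ranked).index(culprit) + 1) / len(ranked))
        else:
            scores.append(1.0)
    return sum(scores) / len(scores)


# Hypothetical ranked culprit lists mixing port-, switch-, and flow-level candidates.
rankings = [
    ["s1:p2", "s3", "flow-7", "s2"],
    ["s2", "s1:p1", "s4", "flow-3"],
    ["flow-9", "s4", "s1", "s3:p0"],
]
culprits = ["s1:p2", "s1:p1", "s3:p0"]

print(recall_at_k(rankings, culprits, k=2))  # 2 of 3 cases hit within the top 2
print(exam_score(rankings, culprits))        # mean of 1/4, 2/4, and 4/4
```

Reporting both metrics is useful because they capture different things: Recall at Top-k only asks whether the culprit is near the top, while Exam Score also penalizes how deep an operator must dig when it is not.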

Publication
In the 32nd International Conference on Parallel Processing (CCF B)

The figure below shows the framework of MARS.

MARS Framework

Guangba Yu
Ph.D. candidate focusing on cloud-native systems

My research interests include cloud computing, microservices, serverless computing, and AIOps.