In this paper, we present MARS, a lightweight system for anomaly detection with dynamic thresholds and automatic root cause localization in programmable networking systems. MARS collects aggregated packet-level telemetry on demand and generates a ranked list of fine-grained fault culprits at multiple levels: port, switch, and flow. Experimental evaluations show that MARS is cost-effective in terms of both network bandwidth and switch memory usage. Moreover, MARS achieves an F1 score of 0.97 in anomaly detection and, in root cause localization, a Recall at Top-2 of 0.95 and an overall Exam Score of 0.3.
The figure below shows the framework of GIED.