With the rapid growth of microservice systems in cloud-native environments, end-to-end traces have become essential data for diagnosing performance issues. However, existing trace-based anomaly detection and root cause analysis (RCA) approaches still suffer from practical problems caused by either the massive volume of traces or frequent system changes. In this study, we propose MicroSketch, a lightweight and adaptive trace-based anomaly detection and RCA approach that leverages sketch-based features and Robust Random Cut Forest (RRCForest) to make trace analysis more effective and efficient. In addition, MicroSketch is an unsupervised approach that adapts to changes in microservice systems without any human intervention. We evaluated MicroSketch on a widely used open-source system and on a production system. The results demonstrate the efficiency and effectiveness of MicroSketch: it significantly outperforms state-of-the-art approaches, with an average improvement of 40.9% in F1 score for anomaly detection and 25.0% in Top-1 recall for RCA. In particular, MicroSketch is at least 60x faster than the other methods in diagnosis time.
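The abstract does not say which sketch MicroSketch uses or exactly which features it extracts. As a minimal illustration of what "sketch-based features" can mean, the code below assumes a DDSketch-style relative-error quantile sketch: span latencies are placed in logarithmic buckets, so a fixed, mergeable structure can answer quantile queries (p50, p90, p99) within a relative error `alpha`. The names `LatencySketch` and `window_features`, the per-operation grouping, and the chosen quantiles are hypothetical and are not MicroSketch's actual code. Scoring the resulting feature vectors with RRCForest is not shown.

```python
import math
from collections import defaultdict


class LatencySketch:
    """Hypothetical DDSketch-style quantile sketch (illustration only).

    Bucket i holds values in (gamma**(i-1), gamma**i]; any quantile
    estimate is within relative error alpha of a true sample value.
    """

    def __init__(self, alpha=0.01):
        if not 0 < alpha < 1:
            raise ValueError("alpha must be in (0, 1)")
        self.alpha = alpha
        self.gamma = (1 + alpha) / (1 - alpha)
        self.log_gamma = math.log(self.gamma)
        self.buckets = defaultdict(int)  # bucket index -> count
        self.count = 0

    def add(self, value):
        # Latencies are assumed to be strictly positive.
        if value <= 0:
            raise ValueError("value must be positive")
        key = math.ceil(math.log(value) / self.log_gamma)
        self.buckets[key] += 1
        self.count += 1

    def merge(self, other):
        # Sketches with the same alpha merge by adding bucket counts,
        # so per-instance sketches can be combined cheaply.
        if other.alpha != self.alpha:
            raise ValueError("cannot merge sketches with different alpha")
        for key, c in other.buckets.items():
            self.buckets[key] += c
        self.count += other.count

    def quantile(self, q):
        if self.count == 0:
            raise ValueError("empty sketch")
        rank = q * (self.count - 1)
        seen = 0
        for key in sorted(self.buckets):
            seen += self.buckets[key]
            if seen > rank:
                # Midpoint estimate of the bucket, relative error <= alpha.
                return 2 * self.gamma ** key / (self.gamma + 1)
        return 2 * self.gamma ** max(self.buckets) / (self.gamma + 1)


def window_features(latencies_by_op, quantiles=(0.5, 0.9, 0.99), alpha=0.01):
    """Hypothetical feature extraction: one quantile vector per operation
    for a single time window of span latencies."""
    features = {}
    for op, latencies in sorted(latencies_by_op.items()):
        sketch = LatencySketch(alpha)
        for x in latencies:
            sketch.add(x)
        features[op] = [sketch.quantile(q) for q in quantiles]
    return features
```

Because the sketch size depends on the latency range rather than on the number of spans, this kind of summary stays small under heavy trace volume, which is the property the abstract attributes to a lightweight approach.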
The figure below shows the framework of MicroSketch.