Welcome to Guangba's HomePage
Microservice
ChangeRCA: Finding Root Causes from Software Changes in Large Online Systems
To address the limitations of ACD, we propose a novel concept called root cause change analysis (RCCA) to identify the root causes of change-inducing incidents. To apply RCCA in practice, we devise an intelligent RCCA framework named ChangeRCA, which localizes the defective change behind a change-inducing incident among multiple changes.
Guangba Yu, Pengfei Chen, Zilong He, Qiuyu Yan, Yu Luo, Fangyan Li, Zibin Zheng
TraStrainer: Adaptive Sampling for Distributed Traces with System Runtime State
In this study, we introduce TraStrainer, an online sampler that takes into account both system runtime state and trace diversity. TraStrainer employs an interpretable and automated encoding method to represent traces as vectors. Simultaneously, it adaptively determines sampling preferences by analyzing system runtime metrics. When sampling, it combines the results of system-bias and diversity-bias through a dynamic voting mechanism.
Haiyu Huang, Xiaoyu Zhang, Pengfei Chen, Zilong He, Zhiming Chen, Guangba Yu, Hongyang Chen, Chen Sun
Nezha: Interpretable Fine-Grained Root Causes Analysis for Microservices on Multi-Modal Observability Data
In this study, we present Nezha, an interpretable and fine-grained RCA approach that pinpoints root causes at the code region and resource type level by jointly analyzing multi-modal data. Nezha transforms heterogeneous multi-modal data into a homogeneous event representation and extracts event patterns by constructing and mining event graphs. The core idea of Nezha is to compare event patterns in the fault-free phase with those in the fault-suffering phase to localize root causes in an interpretable way.
Guangba Yu, Pengfei Chen, Yufeng Li, Hongyang Chen, Xiaoyun Li, Zibin Zheng
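As a rough illustration of the comparison idea in the Nezha abstract above, the sketch below counts simple event patterns in a fault-free phase and a fault-suffering phase and reports the patterns whose support changed. It is a minimal sketch under stated assumptions, not Nezha's method: adjacent event pairs stand in for mined event-graph patterns, and the function names and event names are hypothetical.

```python
from collections import Counter


def mine_patterns(event_sequences):
    """Count adjacent event pairs (a simplified stand-in for pattern mining)."""
    counts = Counter()
    for seq in event_sequences:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    return counts


def diff_patterns(normal, faulty):
    """Return (pattern, support change) pairs, largest absolute change first."""
    keys = set(normal) | set(faulty)
    changes = {k: faulty.get(k, 0) - normal.get(k, 0) for k in keys}
    changed = [(k, d) for k, d in changes.items() if d != 0]
    # Ties are broken by pattern name so the output order is deterministic.
    return sorted(changed, key=lambda kv: (-abs(kv[1]), kv[0]))


# Hypothetical event sequences for the two phases.
normal_phase = [["recv", "query_db", "reply"], ["recv", "query_db", "reply"]]
faulty_phase = [["recv", "query_db", "timeout"], ["recv", "query_db", "reply"]]

suspects = diff_patterns(mine_patterns(normal_phase), mine_patterns(faulty_phase))
for pattern, change in suspects:
    print(pattern, change)
```

Patterns that appear only in the fault-suffering phase, or that lose support there, are the candidates a human would inspect first, which is what makes this style of comparison interpretable.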
Graph based Incident Extraction and Diagnosis in Large-Scale Online Systems
This paper proposes a novel system named GIED, which automatically analyzes the cascading effect of availability issues in online systems. GIED extracts graph-based issue representations that include both the issue symptoms and the affected service attributes. A neural network is used to perform incident detection. Finally, the PageRank algorithm is used to locate the root cause of the incident.
Zilong He, Pengfei Chen, Yu Luo, Qiuyu Yan, Hongyang Chen, Guangba Yu, Fangyuan Li
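The GIED abstract above mentions PageRank for root cause localization. The sketch below shows plain PageRank by power iteration on a small service graph. It is illustrative only: the graph, the service names, and the choice to point edges from a service to the service it depends on are assumptions, and the abstract does not say how GIED builds or weights its graph.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Plain PageRank by power iteration over an adjacency-list graph."""
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the teleport share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: an edge points from a service to a service it depends on.
dependencies = {
    "frontend": ["cart", "catalog"],
    "cart": ["db"],
    "catalog": ["db"],
    "db": [],
}

scores = pagerank(dependencies)
top_suspect = max(scores, key=scores.get)
print(top_suspect)
```

With edges pointing toward dependencies, rank accumulates on the service that many affected services ultimately rely on, which is why it surfaces as the top suspect here.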
TS-InvarNet: Anomaly Detection and Localization based on Tempo-spatial KPI Invariants in Distributed Services
In this paper, we design and implement TS-InvarNet, an interpretable end-to-end anomaly detection and diagnosis framework based on tempo-spatial KPI invariants.
Zijun Hu, Pengfei Chen, Guangba Yu, Zilong He, Xiaoyun Li
TraceRank: Abnormal Service Localization with Dis-Aggregated End-to-End Tracing Data in Cloud Native Systems
This paper proposes a novel system named TraceRank to identify and locate abnormal services causing performance problems with dis-aggregated end-to-end traces.
Guangba Yu, Zicheng Huang, Pengfei Chen
Sieve: Attention-based Sampling of End-to-End Trace Data in Distributed Microservice Systems
In this paper, we design and implement Sieve, an online sampler that aims to bias sampling towards uncommon traces by taking advantage of the attention mechanism.
Zicheng Huang, Pengfei Chen, Guangba Yu, Hongyang Chen, Zibin Zheng
T-Rank: A Lightweight Spectrum based Fault Localization Approach for Microservice Systems
This paper proposes a novel system, named T-Rank, which analyzes clues provided by normal and abnormal traces to locate root causes of latency issues.
Zihao Ye, Pengfei Chen, Guangba Yu
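To make the spectrum-based idea in the T-Rank abstract above concrete, the sketch below ranks services by how strongly their presence in a trace correlates with that trace being abnormal. It uses the Ochiai formula, a common choice in spectrum-based fault localization. This is an assumption for illustration: the abstract does not state which formula T-Rank uses, and the function name, trace format, and service names are hypothetical.

```python
import math


def ochiai_ranking(normal_traces, abnormal_traces):
    """Rank services by Ochiai suspiciousness, most suspicious first.

    Each trace is the set of services it passed through.
    """
    total_failed = len(abnormal_traces)
    services = set().union(*normal_traces, *abnormal_traces)
    scores = {}
    for service in services:
        ef = sum(1 for trace in abnormal_traces if service in trace)  # abnormal and covered
        ep = sum(1 for trace in normal_traces if service in trace)    # normal and covered
        denominator = math.sqrt(total_failed * (ef + ep))
        scores[service] = ef / denominator if denominator > 0 else 0.0
    # Ties are broken by service name so the order is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical traces.
normal = [{"gateway", "cart"}, {"gateway", "catalog"}]
abnormal = [{"gateway", "payment"}, {"gateway", "cart", "payment"}]

ranking = ochiai_ranking(normal, abnormal)
for service, score in ranking:
    print(f"{service}: {score:.3f}")
```

A service that appears in every abnormal trace and in no normal trace gets the maximum score of 1.0, while a service that appears in all traces, such as a gateway, is discounted because it also appears in normal ones.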
Kmon: An In-kernel Transparent Monitoring System for Microservice Systems with eBPF
This paper proposes a novel system named Kmon, an in-kernel transparent monitoring system for microservice systems built on the extended Berkeley Packet Filter (eBPF).
Tianjun Weng, Wanqi Yang, Guangba Yu, Pengfei Chen, Jieqi Cui, Chuangfu Zhang
MicroRank: End-to-End Latency Issue Localization with Extended Spectrum Analysis in Microservice Environments
This paper proposes a novel system, named MicroRank, which analyzes clues provided by normal and abnormal traces to locate root causes of latency issues.
Guangba Yu, Pengfei Chen, Hongyang Chen, Zijie Guan, Zicheng Huang, Linxiao Jing, Tianjun Weng, Xinmeng Sun, Xiaoyun Li