Welcome to Guangba's HomePage
Reliability
MARS: Fault Localization in Programmable Networking Systems with Low-cost In-Band Network Telemetry
In this paper, we present MARS, a lightweight system for anomaly detection with dynamic thresholds and automatic root cause localization in programmable networking systems.
Benran Wang, Hongyang Chen, Pengfei Chen, Zilong He, Guangba Yu
Going through the Life Cycle of Faults in Clouds: Guidelines on Fault Handling
When a cloud service experiences a failure, it is typical to conduct a “post-mortem” analysis after recovery to understand what went wrong, what went right, and how the team could do better in the future. When those failures are public-facing, it is common for some portion of those post-mortem analyses to be made publicly available. The paper analyzes 354 publicly visible post-mortem analyses from three popular large-scale clouds. Based on these findings, the authors suggest guidelines on fault handling that draw on chaos engineering, observability, and intelligent operations.
Xiaoyun Li, Guangba Yu, Pengfei Chen, Hongyang Chen, Zhekang Chen
Graph based Incident Extraction and Diagnosis in Large-Scale Online Systems
This paper proposes GIED, a system that automatically analyzes the cascading effect of availability issues in online systems. GIED extracts graph-based issue representations that include both the issue symptoms and the affected service attributes. A neural network then performs incident detection, and the PageRank algorithm locates the root cause of the incident.
Zilong He, Pengfei Chen, Yu Luo, Qiuyu Yan, Hongyang Chen, Guangba Yu, Fangyuan Li
SwissLog: Robust Anomaly Detection and Localization for Interleaved Unstructured Logs
In this paper, we propose SwissLog, a robust and unified deep learning based anomaly detection model for detecting diverse faults based on logs.
Xiaoyun Li, Pengfei Chen, Linxiao Jing, Zilong He, Guangba Yu
TraceRank: Abnormal Service Localization with Dis-Aggregated End-to-End Tracing Data in Cloud Native Systems
This paper proposes a novel system named TraceRank to identify and locate abnormal services causing performance problems with dis-aggregated end-to-end traces.
Guangba Yu, Zicheng Huang, Pengfei Chen
T-Rank: A Lightweight Spectrum based Fault Localization Approach for Microservice Systems
This paper proposes a novel system, named T-Rank, which analyzes clues provided by normal and abnormal traces to locate root causes of latency issues.
Zihao Ye, Pengfei Chen, Guangba Yu
MicroRank: End-to-End Latency Issue Localization with Extended Spectrum Analysis in Microservice Environments
This paper proposes a novel system, named MicroRank, which analyzes clues provided by normal and abnormal traces to locate root causes of latency issues.
Guangba Yu, Pengfei Chen, Hongyang Chen, Zijie Guan, Zicheng Huang, Linxiao Jing, Tianjun Weng, Xinmeng Sun, Xiaoyun Li
SwissLog: Robust and Unified Deep Learning Based Log Anomaly Detection for Diverse Faults
In this paper, we propose SwissLog, a robust and unified deep learning based anomaly detection model for detecting diverse faults based on logs.
Xiaoyun Li, Pengfei Chen, Linxiao Jing, Zilong He, Guangba Yu
A Spatiotemporal Deep Learning Approach for Unsupervised Anomaly Detection in Cloud Systems
In this article, we propose TopoMAD, a stochastic seq2seq model that can robustly model spatial and temporal dependence among contaminated data.
Zilong He, Pengfei Chen, Xiaoyun Li, Yongfeng Wang, Guangba Yu, Cailin Chen, Xinrui Li, Zibin Zheng