Guangba Yu

Postdoc Focusing on Reliability of Distributed Systems

DDS Lab, Sun Yat-Sen University


I am Guangba Yu (余广坝 in Chinese). I will be a postdoctoral researcher in the Automated Reliable Intelligent Software Engineering (ARISE) Lab at The Chinese University of Hong Kong, mentored by Prof. Michael R. Lyu.

I received my Ph.D. and M.Eng. from Sun Yat-Sen University, advised by Professor Pengfei Chen. I am interested in cloud-native systems, microservices, AIOps, and MLOps. My research focuses on performance diagnosis and optimization in distributed systems. I also have a strong curiosity about telemetry of cloud systems.

I was awarded the Tencent Rhino-Bird Research Elite Program and the Tencent Special Scholarship in 2022. I was a Ph.D. software engineering student researcher at WeChat in 2022, hosted by Yuetang Deng.

I maintain a GitHub project, Awesome cloud paper, and a Chinese-language WeChat public account, WeeklyCloudPaper. You are welcome to follow my updates.

Interests

  • Cloud Computing
  • Microservice
  • MLOps
  • AIOps
  • Chaos Engineering

Education

  • Ph.D. in Computer Science and Technology, 2024

    Sun Yat-Sen University

  • M.Eng in Computer Technology, 2020

    Sun Yat-Sen University

Recent News

[26/06/24] Trace Sampling Framework TraStrainer was awarded a Distinguished Paper Award at FSE 2024.

[23/02/24] Request-level Fault Injection Framework MicroFI is accepted by TDSC. Congratulations to Hongyang Chen!

[23/01/24] Root Cause Change Analysis Framework ChangeRCA and Trace Sampling Framework TraStrainer are accepted by FSE 2024.

[28/07/23] Multi-modal Observability Data RCA Framework Nezha and Configuration Optimization Framework DiagConfig are accepted by FSE 2023.

[16/06/23] Automatic Power Management Framework DeepPower and Automatic Network Root Cause Analysis Framework MARS are accepted by ICPP 2023.

Recent Publications

(2024). A Survey on Failure Analysis and Fault Injection in AI Systems. In submission.

PDF Cite Dataset

(2024). CTuner: Automatic NoSQL Database Tuning with Causal Reinforcement Learning. In Internetware'24 (CCF C).

Cite Code

(2024). MicroFI: Non-Intrusive and Prioritized Request-Level Fault Injection for Microservice Applications. In TDSC (CCF A).

PDF Cite

(2024). ChangeRCA: Finding Root Causes from Software Changes in Large Online Systems. In FSE'24 (CCF A).

PDF Cite Code Dataset

(2024). TraStrainer: Adaptive Sampling for Distributed Traces with System Runtime State. In FSE'24 (CCF A).

PDF Cite Code Dataset

(2023). Network shortcut in data plane of service mesh with eBPF. In JNCA (CCF C).

PDF Cite

(2023). Nezha: Interpretable Fine-Grained Root Causes Analysis for Microservices on Multi-Modal Observability Data. In FSE'23 (CCF A).

PDF Cite Code Dataset DOI

(2023). DiagConfig: Configuration Diagnosis of Performance Violations in Configurable Software Systems. In FSE'23 (CCF A).

PDF Cite Code

(2023). DeepPower: Deep Reinforcement Learning based Power Management for Latency Critical Applications in Multi-core Systems. In ICPP'23 (CCF B).

PDF Cite

(2023). MARS: Fault Localization in Programmable Networking Systems with Low-cost In-Band Network Telemetry. In ICPP'23 (CCF B).

PDF Cite

(2023). FaaSDeliver: Cost-efficient and QoS-aware Function Delivery in Computing Continuum. In TSC (CCF A).

PDF Cite Code

(2022). LogReducer: Identify and Reduce Log Hotspots in Kernel on the Fly. In ICSE'23 (CCF A).

PDF Cite Code Slides

(2022). Going through the Life Cycle of Faults in Clouds: Guidelines on Fault Handling. In ISSRE'22 (CCF B).

PDF Cite Code

(2022). Graph based Incident Extraction and Diagnosis in Large-Scale Online Systems. In ASE'22 (CCF A).

PDF Cite Code

(2022). TS-InvarNet: Anomaly Detection and Localization based on Tempo-spatial KPI Invariants in Distributed Services. In ICWS'22 (CCF B).

PDF Cite

(2022). SwissLog: Robust Anomaly Detection and Localization for Interleaved Unstructured Logs. In TDSC (CCF A).

PDF Cite Code

(2021). Sieve: Attention-based Sampling of End-to-End Trace Data in Distributed Microservice Systems. In ICWS'21 (CCF B).

PDF Cite Code

(2021). T-Rank: A Lightweight Spectrum based Fault Localization Approach for Microservice Systems. In CCGrid'21 (CCF C, CORE A).

PDF Cite

(2021). Kmon: An In-kernel Transparent Monitoring System for Microservice Systems with eBPF. In CloudIntelligence'21.

PDF Cite Code

(2021). MicroRank: End-to-End Latency Issue Localization with Extended Spectrum Analysis in Microservice Environments. In WWW'21 (CCF A).

PDF Cite Code Video

(2020). A Learning-based Dynamic Load Balancing Approach for Microservice Systems in Multi-cloud Environment. In ICPADS'20 (CCF C, CORE B).

PDF Cite

(2020). SwissLog: Robust and Unified Deep Learning Based Log Anomaly Detection for Diverse Faults. In ISSRE'20 (CCF B).

PDF Cite Code

(2020). A Spatiotemporal Deep Learning Approach for Unsupervised Anomaly Detection in Cloud Systems. In TNNLS (Impact Factor 10.451, CCF B).

PDF Cite Dataset

(2020). A Framework of Virtual War Room and Matrix Sketch-Based Streaming Anomaly Detection for Microservice Systems. In IEEE Access (Impact Factor 3.367).

PDF Cite

(2019). Microscaler: Automatic Scaling for Microservices with an Online Learning Approach. In ICWS'19 (CCF B).

PDF Cite

Recent Blogs

