Guangba Yu

Postdoctoral Researcher, The Chinese University of Hong Kong (CUHK)

I am a postdoctoral researcher at ARISE, The Chinese University of Hong Kong, mentored by Prof. Michael R. Lyu. My journey in tech was forged at Sun Yat-sen University, where I earned my Ph.D. and M.Eng under the guidance of Prof. Pengfei Chen. I've also had the privilege of working in industry as a Ph.D. researcher at WeChat.

My path began not with code, but with traditional Chinese medicine. I started my academic career at Guangzhou University of Chinese Medicine, often called the "MIT of Traditional Chinese Medicine", with the dream of becoming a physician who diagnosed the "bugs" of the human body. Today, I've channeled that diagnostic spirit from human systems to software systems, dedicating myself to identifying and healing the faults that affect software reliability and keeping our digital world healthy.

😃 Passionate about prescribing solutions for system reliability. If you're interested in building more robust software, let's connect!


💡 Research Interests
  • Reliability in Cloud and AI Systems
    • Performance Diagnosis
    • Performance Optimization
    • Chaos Engineering
    • Observability
📖 Education
  • Sun Yat-sen University

    Ph.D. in Computer Science and Technology 2020 - 2024

  • Sun Yat-sen University

    M.Eng in Computer Technology 2018 - 2020

🎖 Honors & Awards
  • ACM China (Zhuhai) Doctoral Dissertation Award 2024
  • Distinguished Paper Award at FSE'24 2024
  • Outstanding PhD Graduate, SYSU 2024
  • National Scholarship, SYSU 2022-2023
  • Tencent Rhino-Bird Research Elite Program & Special Scholarship, Tencent 2022-2023
💼 Service
  • PC: APSEC'25, EuroSys'25 (Shadow), OSDI'22 (Artifact), ATC'22 (Artifact)
  • Reviewer: WWW'24, TPDS, TC, TSE, TOSEM, TMC, TSC, TCC, J Supercomput
  • Sub-reviewer: TOSEM, ICSOC'22, ISSTA'25
News
2025
🎉 Our alert management and cloud ops RAG papers have been accepted by ASE'25.
14 Aug
🎉 Our network telemetry RCA paper has been accepted by ToN. Congrats to Hongyang!
18 Jul
🎉 Our cloud incident management paper has been accepted by TSE. Congrats to Zilong!
14 Jul
🎉 Our survey on failure analysis and fault injection in AI systems has been accepted by TOSEM.
26 Mar
🎉 Log-based LLM training failure diagnosis framework L4 has been accepted by FSE'25. Congrats to Zhihan!
26 Mar
🎉 Code-knowledge-enhanced root cause analysis framework CoCA has been accepted by ICSE'25. Congrats to Yichen!
20 Jan
2024
🎉 Microservice tracing framework Mint has been accepted by ASPLOS'25. Congrats to Haiyu!
30 Oct
🎉 FaaS configuration optimization paper FaaSConf has been accepted by ASE'24 (118/587). Congrats to Yilun!
07 Aug
🏆 Trace sampling paper TraStrainer has won the Best Paper Award at FSE'24.
18 Jul
🎉 Request-level fault injection paper MicroFI has been accepted by TDSC. Congrats to Hongyang!
23 Feb
🎉 Root cause change analysis paper ChangeRCA and trace sampling paper TraStrainer have been accepted by FSE'24 (56/483).
22 Jan
2023
🎉 Multi-modal observability data RCA paper Nezha and database configuration optimization paper DiagConfig have been accepted by FSE'23.
27 Jul
Selected Publications († represents corresponding author)
iKnow: an Intent-Guided Chatbot for Cloud Operations with Retrieval-Augmented Generation

Junjie Huang, Yuedong Zhong, Guangba Yu†, Zihan Jiang, Minzhi Yan, Wenfei Luan, Tianyu Yang, Rui Ren, Michael Lyu

ASE'25 (CCF A) In 40th IEEE/ACM International Conference on Automated Software Engineering

AlertGuardian: Intelligent Alert Life-Cycle Management for Large-scale Cloud Systems

Guangba Yu, Genting Mai, Rui Wang, Ruipeng Li, Pengfei Chen†, Long Pan, Ruijie Xu

ASE'25 (CCF A) In 40th IEEE/ACM International Conference on Automated Software Engineering

NetScope: Fault Localization in Programmable Networking Systems With Low-Cost In-Band Network Telemetry and In-Network Detection

Hongyang Chen, Benran Wang, Guangba Yu, Zilong He, Pengfei Chen†, Chen Sun, Zibin Zheng

ToN (CCF A) In IEEE Transactions on Networking

Subgraphs as First-Class Citizens in Incident Management for Large-Scale Online Systems: An Evolution-Aware Framework

Zilong He, Pengfei Chen†, Yu Luo, Qiuyu Yan, Hongyang Chen, Guangba Yu, Fangyuan Lo, Xiaoyun Li, Zibin Zheng

TSE (CCF A) In IEEE Transactions on Software Engineering

A Survey on Failure Analysis and Fault Injection in AI Systems

Guangba Yu, Gou Tan, Haojia Huang, Zhenyu Zhang, Pengfei Chen†, Roberto Natella, Zibin Zheng, Michael R. Lyu

TOSEM (CCF A) In ACM Transactions on Software Engineering and Methodology

L4: Diagnosing Large-scale LLM Training Failures via Automated Log Analysis

Zhihan Jiang, Junjie Huang, Guangba Yu†, Zhuangbin Chen, Yichen Li, Renyi Zhong, Cong Feng, Yongqiang Yang, Michael R. Lyu

FSE'25 (CCF A) In 33rd ACM International Conference on the Foundations of Software Engineering.

COCA: Generative Root Cause Analysis for Distributed Systems with Code Knowledge

Yichen Li, Yulun Wu, Jinyang Liu, Zhihan Jiang, Zhuangbin Chen, Guangba Yu†, Michael R. Lyu

ICSE'25 (CCF A) In 47th IEEE/ACM International Conference on Software Engineering

Mint: Cost-Efficient Tracing with All Requests Collection via Commonality and Variability Analysis

Haiyu Huang, Cheng Chen, Kunyi Chen, Pengfei Chen†, Guangba Yu, Zilong He, Yilun Wang, Huxing Zhang, Qi Zhou

ASPLOS'25 (CCF A) In 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems.

FaaSConf: QoS-aware Hybrid Resources Configuration for Serverless Workflows

Yilun Wang, Pengfei Chen, Hui Dou†, Yiwen Zhang, Guangba Yu, Zilong He, Haiyu Huang

ASE'24 (CCF A) In 39th IEEE/ACM International Conference on Automated Software Engineering.

MicroFI: Non-Intrusive and Prioritized Request-Level Fault Injection for Microservice Applications

Hongyang Chen, Pengfei Chen†, Guangba Yu, Xiaoyun Li, Zilong He, Huxing Zhang

TDSC (CCF A) In IEEE Transactions on Dependable and Secure Computing

TraStrainer: Adaptive Sampling for Distributed Traces with System Runtime State

Haiyu Huang, Xiaoyu Zhang, Pengfei Chen†, Zilong He, Zhiming Chen, Guangba Yu, Hongyang Chen, Chen Sun

FSE'24 (CCF A) In 32nd ACM International Conference on the Foundations of Software Engineering. 🏆 Distinguished Paper Award

ChangeRCA: Finding Root Causes from Software Changes in Large Online Systems

Guangba Yu, Pengfei Chen†, Zilong He, Qiuyu Yan, Yu Luo, Fangyan Li, Zibin Zheng

FSE'24 (CCF A) In 32nd ACM International Conference on the Foundations of Software Engineering.

Nezha: Interpretable Fine-Grained Root Causes Analysis for Microservices on Multi-Modal Observability Data

Guangba Yu, Pengfei Chen†, Yufeng Li, Hongyang Chen, Xiaoyun Li, Zibin Zheng

FSE'23 (CCF A) In 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

LogReducer: Identify and Reduce Log Hotspots in Kernel on the Fly

Guangba Yu, Pengfei Chen†, Pairui Li, Tianjun Weng, Haibing Zheng, Yuetang Deng, Zibin Zheng

ICSE'23 (CCF A) In 45th IEEE/ACM International Conference on Software Engineering

FaaSDeliver: Cost-efficient and QoS-aware Function Delivery in Computing Continuum

Guangba Yu, Pengfei Chen†, Zibin Zheng, Jingrun Zhang, Xiaoyun Li, Zilong He

TSC (CCF A) In IEEE Transactions on Services Computing
