Guangba Yu

Postdoctoral Researcher, The Chinese University of Hong Kong (CUHK)

I am a postdoctoral researcher at ARISE, The Chinese University of Hong Kong, mentored by Prof. Michael R. Lyu. My journey in tech was forged at Sun Yat-sen University, where I earned my Ph.D. and M.Eng under the guidance of Prof. Pengfei Chen. I've also had the privilege of working in industry as a Ph.D. researcher at WeChat, mentored by Yuetang Deng.

But my path began not with code, but with traditional Chinese medicine. I started my academic career at Guangzhou University of Chinese Medicine—often called the "MIT of Traditional Chinese Medicine"—with the dream of becoming a physician who diagnosed the "bugs" and imbalances of the human body. Today, I've channeled that diagnostic spirit from human systems to software systems, dedicating myself to identifying and "curing" the ailments that affect software reliability and keeping our digital world healthy.

😃 Passionate about prescribing solutions for system reliability. If you're interested in building more robust software, let's connect!


💡 Research Interest
  • Reliability in Cloud and AI Systems
    • Performance Diagnosis
    • Performance Optimization
    • Chaos Engineering
    • Observability
📖 Education
  • Sun Yat-sen University

    Ph.D. in Computer Science and Technology 2020 - 2024

  • Sun Yat-sen University

    M.Eng in Computer Technology 2018 - 2020

🎖 Honors & Awards
  • ACM China (Zhuhai) Doctoral Dissertation Award 2024
  • Distinguished Paper Award, FSE'24 2024
  • Outstanding PhD Graduate, SYSU 2024
  • National Scholarship, SYSU 2022-2023
  • Tencent Rhino-Bird Research Elite Program & Special Scholarship, Tencent 2022-2023
💼 Service
  • PC: APSEC'25, EuroSys'25 (Shadow), OSDI'22 (Artifact), ATC'22 (Artifact)
  • Reviewer: WWW'24, TPDS, TC, TSE, TOSEM, TMC, TSC, TCC, J Supercomput
  • Sub-reviewer: TOSEM, ICSOC'22, ISSTA'25
News
2025
🎉 Our alert management and cloud ops RAG papers have been accepted by ASE'25.
14 Aug
🎉 Our Network Telemetry RCA Paper has been accepted by ToN. Congrats to Hongyang!
18 Jul
🎉 Our Cloud Incident Management Paper has been accepted by TSE. Congrats to Zilong!
14 Jul
🎉 Our Survey on Failure Analysis and Fault Injection in AI Systems has been accepted by TOSEM.
26 Mar
🎉 Log-based LLM training failure diagnosis framework L4 has been accepted by FSE'25. Congrats to Zhihan!
26 Mar
🎉 Code-knowledge-enhanced root cause analysis framework COCA has been accepted by ICSE'25. Congrats to Yichen!
20 Jan
2024
🎉 Microservice tracing framework Mint has been accepted by ASPLOS'25. Congrats to Haiyu!
30 Oct
🎉 FaaS configuration optimization paper FaaSConf has been accepted by ASE'24 (118/587). Congrats to Yilun!
07 Aug
🏆 Trace sampling paper TraStrainer has won the Distinguished Paper Award at FSE'24.
18 Jul
🎉 Request-level fault injection paper MicroFI has been accepted by TDSC. Congrats to Hongyang!
23 Feb
🎉 Root cause change analysis paper ChangeRCA and trace sampling paper TraStrainer have been accepted by FSE'24 (56/483).
22 Jan
2023
🎉 Multi-modal observability data RCA paper Nezha and database configuration optimization paper Diagconfig have been accepted by FSE'23.
27 Jul
Selected Publications († represents corresponding author)
iKnow: an Intent-Guided Chatbot for Cloud Operations with Retrieval-Augmented Generation

Junjie Huang, Yuedong Zhong, Guangba Yu, Zihan Jiang, Minzhi Yan, Wenfei Luan, Tianyu Yang, Rui Ren, Michael Lyu

ASE'25 (CCF A) In 40th IEEE/ACM International Conference on Automated Software Engineering

AlertGuardian: Intelligent Alert Life-Cycle Management for Large-scale Cloud Systems

Guangba Yu, Genting Mai, Rui Wang, Ruipeng Li, Pengfei Chen, Long Pan, Ruijie Xu

ASE'25 (CCF A) In 40th IEEE/ACM International Conference on Automated Software Engineering

NetScope: Fault Localization in Programmable Networking Systems With Low-Cost In-Band Network Telemetry and In-Network Detection

Hongyang Chen, Benran Wang, Guangba Yu, Zilong He, Pengfei Chen, Chen Sun, Zibin Zheng

ToN (CCF A) In IEEE Transactions on Networking

Subgraphs as First-Class Citizens in Incident Management for Large-Scale Online Systems: An Evolution-Aware Framework

Zilong He, Pengfei Chen, Yu Luo, Qiuyu Yan, Hongyang Chen, Guangba Yu, Fangyuan Lo, Xiaoyun Li, Zibin Zheng

TSE (CCF A) In IEEE Transactions on Software Engineering

A Survey on Failure Analysis and Fault Injection in AI Systems

Guangba Yu, Gou Tan, Haojia Huang, Zhenyu Zhang, Pengfei Chen, Roberto Natella, Zibin Zheng, Michael R. Lyu

TOSEM (CCF A) In ACM Transactions on Software Engineering and Methodology

L4: Diagnosing Large-scale LLM Training Failures via Automated Log Analysis

Zhihan Jiang, Junjie Huang, Guangba Yu, Zhuangbin Chen, Yichen Li, Renyi Zhong, Cong Feng, Yongqiang Yang, Michael R. Lyu

FSE'25 (CCF A) In 33rd ACM International Conference on the Foundations of Software Engineering.

COCA: Generative Root Cause Analysis for Distributed Systems with Code Knowledge

Yichen Li, Yulun Wu, Jinyang Liu, Zhihan Jiang, Zhuangbin Chen, Guangba Yu, Michael R. Lyu

ICSE'25 (CCF A) In 47th IEEE/ACM International Conference on Software Engineering

Mint: Cost-Efficient Tracing with All Requests Collection via Commonality and Variability Analysis

Haiyu Huang, Cheng Chen, Kunyi Chen, Pengfei Chen, Guangba Yu, Zilong He, Yilun Wang, Huxing Zhang, Qi Zhou

ASPLOS'25 (CCF A) In 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems.

FaaSConf: QoS-aware Hybrid Resources Configuration for Serverless Workflows

Yilun Wang, Pengfei Chen, Hui Dou, Yiwen Zhang, Guangba Yu, Zilong He, Haiyu Huang

ASE'24 (CCF A) In 39th IEEE/ACM International Conference on Automated Software Engineering.

MicroFI: Non-Intrusive and Prioritized Request-Level Fault Injection for Microservice Applications

Hongyang Chen, Pengfei Chen, Guangba Yu, Xiaoyun Li, Zilong He, Huxing Zhang

TDSC (CCF A) In IEEE Transactions on Dependable and Secure Computing

TraStrainer: Adaptive Sampling for Distributed Traces with System Runtime State

Haiyu Huang, Xiaoyu Zhang, Pengfei Chen, Zilong He, Zhiming Chen, Guangba Yu, Hongyang Chen, Chen Sun

FSE'24 (CCF A) In 32nd ACM International Conference on the Foundations of Software Engineering. 🏆 Distinguished Paper Award

ChangeRCA: Finding Root Causes from Software Changes in Large Online Systems

Guangba Yu, Pengfei Chen, Zilong He, Qiuyu Yan, Yu Luo, Fangyan Li, Zibin Zheng

FSE'24 (CCF A) In 32nd ACM International Conference on the Foundations of Software Engineering.

Nezha: Interpretable Fine-Grained Root Causes Analysis for Microservices on Multi-Modal Observability Data

Guangba Yu, Pengfei Chen, Yufeng Li, Hongyang Chen, Xiaoyun Li, Zibin Zheng

FSE'23 (CCF A) In 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

LogReducer: Identify and Reduce Log Hotspots in Kernel on the Fly

Guangba Yu, Pengfei Chen, Pairui Li, Tianjun Weng, Haibing Zheng, Yuetang Deng, Zibin Zheng

ICSE'23 (CCF A) In 45th IEEE/ACM International Conference on Software Engineering

FaaSDeliver: Cost-efficient and QoS-aware Function Delivery in Computing Continuum

Guangba Yu, Pengfei Chen, Zibin Zheng, Jingrun Zhang, Xiaoyun Li, Zilong He

TSC (CCF A) In IEEE Transactions on Services Computing
