ChangeRCA: Finding Root Causes from Software Changes in Large Online Systems

Abstract

In large-scale online service systems, the occurrence of software changes is inevitable and frequent. Despite rigorous pre-deployment testing practices, the presence of defective software changes in the online environment cannot be completely eliminated. Consequently, there is a pressing need for automated techniques that can effectively identify these defective changes However, the current abnormal change detection (ACD) approaches fall short in accurately pinpointing defective changes, primarily due to their disregard for the propagation of faults. To address the limitations of ACD, we propose a novel concept called root cause change analysis (RCCA) to identify the underlying root causes of change-inducing incidents. In order to apply the RCCA concept to practical scenarios, we have devised an intelligent RCCA framework named ChangeRCA. This framework aims to localize the defective change associated with change-inducing incidents among multiple changes.

Publication
In 32nd ACM International Conference on the Foundations of Software Engineering

The blow figure shows the framework of ChangeRCA.

ChangeRCA Framework

Guangba Yu
Guangba Yu
Ph.D. Candidate Focus on Cloud Native

My research interests include cloud computing, microservices, Serverless, AIOps