
(2024). A Survey on Failure Analysis and Fault Injection in AI Systems. In Submission.

PDF Cite Dataset

(2024). CTuner: Automatic NoSQL Database Tuning with Causal Reinforcement Learning. In Internetware'24 (CCF C).

Cite Code

(2024). MicroFI: Non-Intrusive and Prioritized Request-Level Fault Injection for Microservice Applications. In TDSC (CCF A).

PDF Cite

(2024). ChangeRCA: Finding Root Causes from Software Changes in Large Online Systems. In FSE'24 (CCF A).

PDF Cite Code Dataset

(2024). TraStrainer: Adaptive Sampling for Distributed Traces with System Runtime State. In FSE'24 (CCF A).

PDF Cite Code Dataset

(2023). Network shortcut in data plane of service mesh with eBPF. In JNCA (CCF C).

PDF Cite

(2023). Nezha: Interpretable Fine-Grained Root Causes Analysis for Microservices on Multi-Modal Observability Data. In FSE'23 (CCF A).

PDF Cite Code Dataset DOI

(2023). DiagConfig: Configuration Diagnosis of Performance Violations in Configurable Software Systems. In FSE'23 (CCF A).

PDF Cite Code

(2023). MARS: Fault Localization in Programmable Networking Systems with Low-cost In-Band Network Telemetry. In ICPP'23 (CCF B).

PDF Cite

(2023). DeepPower: Deep Reinforcement Learning based Power Management for Latency Critical Applications in Multi-core Systems. In ICPP'23 (CCF B).

PDF Cite

(2023). FaaSDeliver: Cost-efficient and QoS-aware Function Delivery in Computing Continuum. In TSC (CCF A).

PDF Cite Code

(2022). LogReducer: Identify and Reduce Log Hotspots in Kernel on the Fly. In ICSE'23 (CCF A).

PDF Cite Code Slides

(2022). Graph based Incident Extraction and Diagnosis in Large-Scale Online Systems. In ASE'22 (CCF A).

PDF Cite Code

(2022). Going through the Life Cycle of Faults in Clouds: Guidelines on Fault Handling. In ISSRE'22 (CCF B).

PDF Cite Code

(2022). TS-InvarNet: Anomaly Detection and Localization based on Tempo-spatial KPI Invariants in Distributed Services. In ICWS'22 (CCF B).

PDF Cite

(2022). SwissLog: Robust Anomaly Detection and Localization for Interleaved Unstructured Logs. In TDSC (CCF A).

PDF Cite Code

(2021). Sieve: Attention-based Sampling of End-to-End Trace Data in Distributed Microservice Systems. In ICWS'21 (CCF B).

PDF Cite Code

(2021). T-Rank: A Lightweight Spectrum based Fault Localization Approach for Microservice Systems. In CCGrid'21 (CCF C, CORE A).

PDF Cite

(2021). Kmon: An In-kernel Transparent Monitoring System for Microservice Systems with eBPF. In CloudIntelligence'21.

PDF Cite Code

(2021). MicroRank: End-to-End Latency Issue Localization with Extended Spectrum Analysis in Microservice Environments. In WWW'21 (CCF A).

PDF Cite Code Video

(2020). A Learning-based Dynamic Load Balancing Approach for Microservice Systems in Multi-cloud Environment. In ICPADS'20 (CCF C, CORE B).

PDF Cite

(2020). SwissLog: Robust and Unified Deep Learning Based Log Anomaly Detection for Diverse Faults. In ISSRE'20 (CCF B).

PDF Cite Code

(2020). A Spatiotemporal Deep Learning Approach for Unsupervised Anomaly Detection in Cloud Systems. In TNNLS (Impact Factor 10.451, CCF B).

PDF Cite Dataset

(2020). A Framework of Virtual War Room and Matrix Sketch-Based Streaming Anomaly Detection for Microservice Systems. In Access (Impact Factor 3.367).

PDF Cite

(2019). Microscaler: Automatic Scaling for Microservices with an Online Learning Approach. In ICWS'19 (CCF B).

PDF Cite