Understanding the Impact of Data Noise in Federated Learning: Experiments and Analysis

Abstract

Federated learning (FL) has emerged as a popular paradigm for distributed machine learning over decentralized data. Data generated by FL clients is prone to noise. While the impact of data noise on centralized learning (CL) is well understood, there is a lack of a systematic study for FL. We fill this gap by presenting an empirical investigation that provides a deeper understanding of the impact of data noise on FL. Our study is enabled by NoiseMaker, an open-source and extensible toolkit for injecting controlled data noise across five diverse data modalities. Our experimental results reveal that FL is significantly more vulnerable to data noise than CL.
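The abstract does not show NoiseMaker's actual API, so as an illustration of what "injecting controlled data noise" means in this setting, here is a minimal, hypothetical sketch of symmetric label-noise injection, one common noise model for classification data; the function name and parameters are assumptions, not the toolkit's real interface.

```python
import random

def inject_label_noise(labels, num_classes, noise_rate, seed=0):
    """Hypothetical sketch of controlled label-noise injection.

    With probability `noise_rate`, each label is flipped to a
    different class chosen uniformly at random (symmetric noise).
    The `seed` makes the corruption reproducible, which is what
    makes the injected noise "controlled".
    """
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            # Pick a wrong class uniformly among the other classes.
            y = rng.choice([c for c in range(num_classes) if c != y])
        noisy.append(y)
    return noisy

# Example: corrupt 30% of labels in a balanced 5-class dataset.
clean = [0, 1, 2, 3, 4] * 200
noisy = inject_label_noise(clean, num_classes=5, noise_rate=0.3)
observed_rate = sum(a != b for a, b in zip(clean, noisy)) / len(clean)
```

In an FL experiment, a routine like this would be applied per client before local training, so that the fraction of noisy clients and the per-client noise rate can be varied independently.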

Publication
To appear in Proceedings of the ACM on Management of Data (SIGMOD 2026)
Jinming Hu
Founder and Chief Scientist

My research interests include machine learning, data mining, deep learning, computer vision, operating systems, and databases.
