Understanding the Impact of Data Noise in Federated Learning: Experiments and Analysis

Abstract

Federated learning (FL) has emerged as a popular paradigm for distributed machine learning over decentralized data. Data generated by FL clients is prone to noise. While the impact of data noise on centralized learning (CL) is well understood, there is a lack of a systematic study for FL. We fill this gap by presenting an empirical investigation that provides a deeper understanding of the impact of data noise on FL. Our study is enabled by NoiseMaker, an open-source and extensible toolkit for injecting controlled data noise across five diverse data modalities. Our experimental results reveal that FL is significantly more vulnerable to data noise than CL.
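The abstract does not show NoiseMaker's actual API, so as an illustration of what "injecting controlled data noise" means in this setting, here is a minimal, hypothetical sketch of symmetric label-noise injection, one common noise model for classification data; the function name and parameters are assumptions, not the toolkit's real interface.

```python
import random

def inject_label_noise(labels, num_classes, noise_rate, seed=0):
    """Hypothetical sketch of controlled label-noise injection.

    With probability `noise_rate`, each label is flipped to a
    different class chosen uniformly at random (symmetric noise).
    The `seed` makes the corruption reproducible, which is what
    makes the injected noise "controlled".
    """
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            # Pick a wrong class uniformly among the other classes.
            y = rng.choice([c for c in range(num_classes) if c != y])
        noisy.append(y)
    return noisy

# Example: corrupt 30% of labels in a balanced 5-class dataset.
clean = [0, 1, 2, 3, 4] * 200
noisy = inject_label_noise(clean, num_classes=5, noise_rate=0.3)
observed_rate = sum(a != b for a, b in zip(clean, noisy)) / len(clean)
```

In an FL experiment, a routine like this would be applied per client before local training, so that the fraction of noisy clients and the per-client noise rate can be varied independently.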

Publication
To appear in Proceedings of the ACM on Management of Data (SIGMOD 2026)
Jinming Hu
Founder and Chief Scientist

My research interests include machine learning, data mining, deep learning, computer vision, operating systems, and databases.
