I am now the founder and Chief Scientist of Sea-Land.ai.
I was a PhD student in the Department of Computer Science at the University of Toronto. Before that, I received my master's degree in artificial intelligence from the ZJULearning Group at the Zhejiang University CAD&CG National Key Lab, where I was very fortunate to be advised by Prof. Deng Cai and Prof. Xiaofei He. I was also a research intern at the University of Toronto, where I was very unfortunate to work with Qizhen Zhang, so I eventually decided to quit. My research interests include machine learning, data mining, deep learning, computer vision, operating systems, systems programming, and databases. I have worked as a system developer at Optiver Shanghai and interned as a machine learning engineer at FABU in Hangzhou, where I was fortunate to work with many colleagues, and at Google, where I was fortunate to work with Jingtao Wang. I was also a software engineer at DolphinDB Inc., where I was fortunate to work with Davis, Xinjing Zhou, and many colleagues.
MEng in Artificial Intelligence, 2020
Zhejiang University
BSc in Aerospace Engineering, 2017
Northwestern Polytechnical University
Quickly discover relevant content by filtering publications.
Fortunate to work with Davis, Xinjing Zhou, and many colleagues.
Designing and building the storage engine for the time-series database, making it highly efficient for analytics, data ingestion, and point queries.
Leading System/DB for AI: adding TextDB (text search in DolphinDB), VectorDB, and more.
Maintaining and extending the existing computing engine.
Responsibilities include:
Developing rule-based autotraders.
Improving the machine learning pipeline.
Improving the testing environment for binaries.
Exploring extended applications of Tesseract, along with supporting development.
Fortunate to work with many colleagues.
Responsibilities include:
This paper introduces FEDDE, a general and efficient framework that addresses data redundancy across clients to facilitate effective federated learning (FL). At its core, FEDDE adopts a hierarchical deduplication architecture where clients first perform local, centralized deduplication and then send minimal records that are only meaningful for redundancy detection to the server for global deduplication. To enable flexible trade-offs between FL training efficiency and the accuracy of the training outcomes, FEDDE proposes two-round approximate deduplication protocols. A set of system optimizations is further applied to reduce deduplication overhead.
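The hierarchical architecture described above can be sketched in a few lines; this is a minimal illustrative Python sketch, assuming string records, SHA-256 fingerprints as the "minimal records" sent to the server, and a first-reporter ownership rule — these details are assumptions, not FEDDE's actual implementation:

```python
import hashlib

def record_fingerprint(record: str) -> str:
    """Compact fingerprint; only these are shipped to the server."""
    return hashlib.sha256(record.encode()).hexdigest()

def local_dedup(records):
    """Round 1: each client removes its own duplicates locally."""
    seen, kept = set(), []
    for r in records:
        fp = record_fingerprint(r)
        if fp not in seen:
            seen.add(fp)
            kept.append(r)
    return kept

def global_dedup(client_records):
    """Round 2: the server sees only fingerprints and assigns each
    cross-client duplicate to the first client that reported it."""
    owner = {}
    for cid, records in client_records.items():
        for r in records:
            owner.setdefault(record_fingerprint(r), cid)
    # Each client then trains only on the records it owns globally.
    return {
        cid: [r for r in records if owner[record_fingerprint(r)] == cid]
        for cid, records in client_records.items()
    }

clients = {
    "c0": local_dedup(["a", "a", "b"]),  # local duplicate "a" removed
    "c1": local_dedup(["b", "c"]),       # "b" is also held by c0
}
deduped = global_dedup(clients)
```

Shipping fingerprints rather than raw records keeps the server-side round cheap and avoids exposing client data, which is the trade-off the hierarchical design targets.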
Federated learning (FL) has emerged as a popular paradigm for distributed machine learning over decentralized data. Data generated by FL clients is prone to noise. While the impact of data noise on centralized learning (CL) is well understood, there is a lack of systematic study for FL. We fill this gap with an empirical investigation that provides a deeper understanding of the impact of data noise on FL. Our study is enabled by NoiseMaker, an open-source and extensible toolkit for injecting controlled data noise across five diverse data modalities. Our experimental results reveal that FL is significantly more vulnerable to data noise than CL.
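Controlled noise injection of the kind the toolkit enables can be illustrated with symmetric label noise; the function name and flipping scheme below are assumptions for illustration, not NoiseMaker's actual API:

```python
import random

def inject_label_noise(labels, num_classes, noise_rate, seed=0):
    """Flip each label to a different, uniformly chosen class with
    probability `noise_rate` (symmetric label noise). A fixed seed
    makes the injection reproducible across experiment runs."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            noisy.append(rng.choice([c for c in range(num_classes) if c != y]))
        else:
            noisy.append(y)
    return noisy

clean = [0, 1, 2, 1, 0, 2, 1, 0]
noisy = inject_label_noise(clean, num_classes=3, noise_rate=0.5)
```

Parameterizing the noise rate is what makes the injection "controlled": the same clean dataset can be degraded at several rates to measure how FL and CL accuracy fall off.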
This paper establishes that General Relativity and Quantum Mechanics are necessary logical consequences of the Axiom of Finite Information. We introduce a new fundamental constant, i, representing the Information Maximum Transfer Speed, and posit that i > c, where c is the speed of light in a vacuum. By substituting i into the relativistic framework, we demonstrate that the finite nature of i is the primary mechanism preventing infinite information density and logical singularities. Furthermore, we prove that a ‘Theory of Everything’ is precluded by the computational cost of self-reference, and propose the observation of Computational Redshift as a definitive empirical test for the gap between c and i.
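The substitution mentioned above can be made concrete; a minimal sketch, assuming it means replacing $c$ with $i$ in the Lorentz factor:

$$\gamma_i = \frac{1}{\sqrt{1 - v^2/i^2}}, \qquad i > c.$$

Under this assumption, $\gamma_i$ remains finite for all physically attainable speeds $v \le c$, since the divergence is pushed out to $v = i > c$ — consistent with the claim that the finiteness of $i$ bounds information density.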
We propose a conceptual framework to resolve the dichotomy of the Millennium Prize Problems by categorizing mathematical systems based on their capacity for logical simulation. We distinguish between Class I (Structural) problems (e.g., Poincaré, Hodge, Yang-Mills), which rely on symmetries, conservation laws, and coercivity estimates that constrain degrees of freedom effectively, and Class II (Simulational) problems (e.g., P vs NP, Navier-Stokes), which theoretically possess the fidelity to simulate Universal Turing Machines. While not a formal proof of independence, we argue that Class II problems face obstructions isomorphic to the Halting Problem, inhibiting standard analytic techniques. We posit that the ‘intractability’ of these problems arises because they inhabit a complexity class where asymptotic behavior is determined by generalized computation rather than geometric structure.