FLHetBench is the first FL benchmark targeted toward understanding: what happens to different FL algorithms when they are deployed in real-world FL environments with varying degrees of device and state heterogeneity?
Federated learning (FL) is a powerful technology that enables collaborative training of machine learning models without sharing private data among clients. The fundamental challenge in FL lies in learning over extremely heterogeneous data distributions, device capacities, and device state availabilities, all of which adversely impact performance and communication efficiency. While data heterogeneity has been well-studied in the literature, this paper introduces FLHetBench, the first FL benchmark targeted toward understanding device and state heterogeneity. FLHetBench comprises two new sampling methods to generate real-world device and state databases with varying heterogeneity and new metrics for quantifying the success of FL methods under these real-world constraints. Using FLHetBench, we conduct a comprehensive evaluation of existing methods and find that they struggle under these settings, which inspires us to propose BiasPrompt+, a new method employing staleness-aware aggregation and fast weights to tackle these new heterogeneity challenges. Experiments on various FL tasks and datasets validate the effectiveness of our BiasPrompt+ method and highlight the value of FLHetBench in fostering the development of more efficient and robust FL solutions under real-world device and state constraints.
FLHetBench consists of 1) two sampling methods, DPGMM for continuous device databases and DPCSM for discrete state databases, to sample real-world device and state datasets with varying heterogeneity; and 2) a set of metrics (DevMC-R/T for device heterogeneity, StatMC-R/T for state heterogeneity, and InterMC-R/T for joint device and state heterogeneity) to assess device/state heterogeneity in FL.
DPGMM and DPCSM allow us to sample datasets with varying degrees of device and state heterogeneity. We show the device distributions (device heterogeneity) and client states (state heterogeneity) of datasets sampled at different heterogeneity levels (mild, middle, and severe).
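The snippet below is a minimal, illustrative sketch (not the official FLHetBench API) of how a Dirichlet-process Gaussian mixture can generate a synthetic device-latency database whose heterogeneity is controlled by a single knob. The parameter names (`alpha`, `base_mean`, `cluster_std`) and the use of `alpha` as the heterogeneity control are assumptions made for this sketch only.

```python
"""Illustrative sketch: sampling a device-latency database from a truncated
Dirichlet-process Gaussian mixture (stick-breaking construction).
Larger `alpha` spreads probability mass over more latency clusters,
which is used here as a rough proxy for stronger device heterogeneity."""
import numpy as np

def sample_dpgmm_latencies(n_clients, alpha=1.0, base_mean=100.0,
                           base_std=40.0, cluster_std=10.0, seed=0):
    rng = np.random.default_rng(seed)
    max_clusters = 50
    # Stick-breaking weights of the (truncated) Dirichlet-process mixture.
    betas = rng.beta(1.0, alpha, size=max_clusters)
    sticks = betas * np.cumprod(np.concatenate(([1.0], 1.0 - betas[:-1])))
    sticks /= sticks.sum()
    # Each cluster has its own mean per-round latency, drawn from a base measure.
    cluster_means = rng.normal(base_mean, base_std, size=max_clusters)
    # Assign clients to clusters, then draw each client's latency around its cluster mean.
    assignments = rng.choice(max_clusters, size=n_clients, p=sticks)
    latencies = rng.normal(cluster_means[assignments], cluster_std)
    return np.clip(latencies, 1.0, None)  # keep latencies positive

# Example: mild vs. severe device heterogeneity (purely illustrative).
mild = sample_dpgmm_latencies(100, alpha=0.1)
severe = sample_dpgmm_latencies(100, alpha=10.0)
```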
FLHetBench assesses device and state heterogeneity by mimicking the real FL training process and accounting for various confounding factors.
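As a rough illustration of such simulation-based assessment, the sketch below estimates round- and time-based training costs under per-client latencies (device heterogeneity) and availability windows (state heterogeneity). It is written in the spirit of DevMC-R/T and StatMC-R/T but is not their exact definition; `deadline`, `clients_per_round`, and `rounds_needed` are assumed parameters for this sketch.

```python
"""Illustrative sketch: Monte Carlo-style simulation of an FL training process
under device latencies and client availability, returning the number of
attempted rounds and the total wall-clock time needed to complete a target
number of successful rounds."""
import random

def simulate_fl_cost(latencies, availability, rounds_needed=100,
                     clients_per_round=10, deadline=300.0, seed=0):
    """latencies: per-client round time in seconds;
    availability: per-client callables mapping wall-clock time -> online or not."""
    rng = random.Random(seed)
    n = len(latencies)
    wall_clock, completed, attempted_rounds = 0.0, 0, 0
    while completed < rounds_needed:
        attempted_rounds += 1
        selected = rng.sample(range(n), clients_per_round)
        # A client contributes only if it is online and finishes before the deadline.
        finishers = [c for c in selected
                     if availability[c](wall_clock) and latencies[c] <= deadline]
        # Round length is bounded by the deadline (stragglers are dropped).
        round_time = min(max((latencies[c] for c in finishers), default=deadline),
                         deadline)
        wall_clock += round_time
        if finishers:  # a round succeeds only if at least one client reports back
            completed += 1
    return attempted_rounds, wall_clock

# Example usage with random availability windows (purely illustrative):
lat = [random.uniform(30, 600) for _ in range(100)]
avail = [lambda t, p=random.random(): (t % 600) < 600 * p for _ in range(100)]
rounds, seconds = simulate_fl_cost(lat, avail)
```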
We evaluate the performance of existing FL methods under varying degrees of device and state heterogeneity using FLHetBench.
Observation: Most methods perform well under mild device and state heterogeneity, but struggle with increased heterogeneity.
Two contributing factors: the increased wall-clock time of clients and the low resource utilization of participating clients.
BiasPrompt+ is a heterogeneity-aware algorithm that comprises two modules: a gradient surgery-based staleness-aware aggregation strategy and BiasPrompt, a communication-efficient module based on fast weights.
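To make the first module concrete, here is a conceptual sketch (not the authors' reference implementation) of staleness-aware aggregation with gradient surgery: stale client updates are down-weighted by staleness, and a stale update that conflicts with the mean of fresh updates has its conflicting component projected out, in a PCGrad-like manner. The weighting form `1 / (1 + staleness)` and the `fresh_threshold` parameter are assumptions of this sketch.

```python
"""Conceptual sketch: staleness-aware aggregation with gradient surgery on
flattened client updates."""
import torch

def staleness_aware_aggregate(updates, staleness, fresh_threshold=0):
    """updates: list of 1-D tensors (flattened client model deltas);
    staleness: list of ints (rounds since the client's model snapshot)."""
    fresh = [u for u, s in zip(updates, staleness) if s <= fresh_threshold]
    fresh_mean = torch.stack(fresh).mean(0) if fresh else None
    agg, total_w = torch.zeros_like(updates[0]), 0.0
    for u, s in zip(updates, staleness):
        if fresh_mean is not None and s > fresh_threshold:
            dot = torch.dot(u, fresh_mean)
            if dot < 0:  # conflicting direction: remove the component along fresh_mean
                u = u - dot / fresh_mean.norm().pow(2) * fresh_mean
        w = 1.0 / (1.0 + s)  # simple staleness discount (assumed form)
        agg += w * u
        total_w += w
    return agg / total_w
```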
@InProceedings{Zhang_2024_CVPR,
author = {Zhang, Junyuan and Zeng, Shuang and Zhang, Miao and Wang, Runxi and Wang, Feifei and Zhou, Yuyin and Liang, Paul Pu and Qu, Liangqiong},
title = {FLHetBench: Benchmarking Device and State Heterogeneity in Federated Learning},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {12098-12108}
}