Scalable Collaborative Learning via Representation Sharing
Decentralization and Trustworthy ML Workshop @ NeurIPS 2022
EPFL, Switzerland; MIT, USA
Overview

Privacy-preserving machine learning has become a key challenge for multi-party AI systems. Federated learning (FL) and split learning (SL) enable collaborative training while keeping data on device, but they face limitations in communication efficiency and scalability. This paper presents a novel approach where clients collaborate via online knowledge distillation using a contrastive loss on feature representations.
How it works
The framework has three key components:
- Prototype sharing: Each client computes per-class averaged representations from its model’s last hidden layer and shares them with a relay server. The server aggregates prototypes across clients and broadcasts the global averages back.
- Feature-based knowledge distillation: A KD loss minimizes the L2 distance between each client’s local representations and the global prototypes for the same class, encouraging clients to learn similar features.
- Contrastive discrimination: A discriminator loss distinguishes whether two feature vectors come from the same or different classes, maximizing mutual information between client representations.
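The first two components can be sketched in a few lines. The snippet below is an illustrative NumPy sketch, not the paper's implementation: the function names (`compute_prototypes`, `aggregate`, `kd_loss`) and the count-weighted aggregation rule are assumptions made for clarity.

```python
import numpy as np

def compute_prototypes(features, labels, num_classes):
    """Client side: average last-hidden-layer features per class.

    features: (n, d') array of representations; labels: length-n class ids.
    Returns per-class prototypes (C, d') and per-class sample counts (C,).
    """
    protos = np.zeros((num_classes, features.shape[1]))
    counts = np.zeros(num_classes)
    for f, y in zip(features, labels):
        protos[y] += f
        counts[y] += 1
    seen = counts > 0
    protos[seen] /= counts[seen][:, None]   # average only over observed classes
    return protos, counts

def aggregate(client_protos, client_counts):
    """Relay server: count-weighted average of client prototypes per class."""
    totals = np.sum(client_counts, axis=0)                      # samples per class
    weighted = np.sum([p * c[:, None] for p, c in
                       zip(client_protos, client_counts)], axis=0)
    return weighted / np.maximum(totals, 1)[:, None]

def kd_loss(features, labels, global_protos):
    """Feature-KD term: mean squared L2 distance to the same-class global prototype."""
    diffs = features - global_protos[np.asarray(labels)]
    return float(np.mean(np.sum(diffs ** 2, axis=1)))
```

After each broadcast, every client adds `kd_loss` (and the discriminator term, not sketched here) to its usual task loss, pulling its local feature space toward the globally agreed per-class prototypes.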
A theoretical analysis shows that this objective maximizes a lower bound on the mutual information between student and teacher representations.
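The paper's exact bound is not reproduced here; for intuition, contrastive objectives of this kind typically admit an InfoNCE-style lower bound (as in van den Oord et al.'s representation learning work), where $N$ is the number of negative pairs drawn by the discriminator:

```latex
% Generic InfoNCE-style bound (illustrative; not the paper's exact statement).
% S, T: student (local client) and teacher (global prototype) representations.
% Minimizing the contrastive loss L_NCE tightens a lower bound on I(S; T):
I(S; T) \;\ge\; \log N \;-\; \mathcal{L}_{\mathrm{NCE}}(S, T)
```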
Communication efficiency
Communication scales as O(M · C · d′) per round, where M is the number of clients, C the number of classes, and d′ the feature dimension. Critically, this cost is independent of the model size D, making the approach orders of magnitude more communication-efficient than federated learning for large models.
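A back-of-envelope comparison makes the gap concrete. The sizes below (10 classes, 128-dim features, an ~11M-parameter model) are illustrative assumptions, not numbers from the paper:

```python
# Per-round upload cost in floats per client, following the O(M * C * d') analysis.
def prototype_cost(num_classes, feat_dim):
    return num_classes * feat_dim   # one d'-dimensional prototype per class

def fl_cost(num_params):
    return num_params               # FedAvg-style: full model update per round

C, d_prime = 10, 128                # assumed: 10 classes, 128-dim features
D = 11_000_000                      # assumed: ~11M-parameter model

print(prototype_cost(C, d_prime))                # 1280 floats per round
print(fl_cost(D) / prototype_cost(C, d_prime))   # 8593.75x smaller than FL
```

The ratio grows linearly with model size D while the prototype payload stays fixed, which is why the savings are largest for big models.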
Results

Key numbers
- MNIST with 10 clients (LeNet-5): 82.07% accuracy vs. FL 70.06%, independent learning 72.86%, and federated distillation 77.90%.
- Fashion-MNIST (ResNet-9): Outperforms independent learning and federated distillation, with competitive performance against FL.
- Communication: Orders of magnitude less than FL, since only feature prototypes (not model parameters) are exchanged.
- Regularization effect: With 2 clients on MNIST, the method even outperforms centralized training, suggesting the prototype exchange acts as a form of regularization.
Citation
@misc{berdoz2022scalable,
  author = {Berdoz, F. and Singh, A. and Jaggi, M. and Raskar, R.},
  title  = {{Scalable Collaborative Learning via Representation Sharing}},
  note   = {Best Paper Runner-up at NeurIPS Workshop on Decentralization and Trustworthy ML in Web3. arXiv:2211.10943},
  year   = {2022}
}