# Using FLamby with Fed-BioMed
This document highlights how to interface FLamby with Fed-BioMed, the federated learning framework developed at Inria Sophia Antipolis.
## Fed-BioMed installation
Before running the examples below, you need Fed-BioMed installed on your machine. An easy-to-follow guide is available here to help you with the process. Please use the poc/flamby branch:

```
git checkout poc/flamby
```
In addition, you also need a local copy of the FLamby package already installed in editable mode, and to have downloaded the required dataset(s) (see the Getting started section).
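For reference, an editable install is typically obtained by cloning the FLamby repository and using pip's -e flag (a minimal sketch, assuming the upstream repository location; see the Getting started section for the authoritative steps):

```
git clone https://github.com/owkin/FLamby.git
cd FLamby
pip install -e .
```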
Once Fed-BioMed is installed alongside FLamby, we first need to update all of fedbiomed's conda environment yaml config files. Go to the fedbiomed directory and cd into the conda environments folder:
```
cd /path/where/is/installed/fedbiomed/envs/development/conda/
```
Open both fedbiomed-node(-macosx).yaml and fedbiomed-researcher(-macosx).yaml in a text editor or IDE and add the following line under pip:
```
- -e /path/towards/your/editable/installation/of/FLamby
```
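After the edit, the pip section of each yaml file should look roughly like this (a sketch; the other pip entries will differ in your files):

```
  - pip:
    - ...
    - -e /path/towards/your/editable/installation/of/FLamby
```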
Then run:

```
${FEDBIOMED_DIR}/scripts/fedbiomed_environment clean
${FEDBIOMED_DIR}/scripts/configure_conda
```
At the end you should have access to 3 new conda environments. You can test your installation by running:
```
conda activate fedbiomed-researcher
ipython
import flamby
exit()
```

If the import goes through, the installation is working.

## Launching Fed-BioMed components (FLamby datasets configuration)
The procedure to configure a FLamby dataset in Fed-BioMed is the same regardless of the dataset: the same components are set up each time (the network, a certain number of nodes, and the researcher).
First, make sure you have downloaded and installed the FLamby dataset you want to use: there should be a dataset_location.yaml file inside the FLamby repository, in the dataset_creation_scripts folder of the corresponding dataset. For example, running:
```
cat /path/towards/flamby/as/given/to/conda/FLamby/flamby/datasets/fed_ixi/dataset_creation_scripts/dataset_location.yaml
```
should display:
```
dataset_path: /path/towards/downloaded/ixi
download_complete: true
preprocessing_complete: true
```
The first step is to launch the network by entering the following command in your terminal:
```
${FEDBIOMED_DIR}/scripts/fedbiomed_run network
```
Once the network is deployed, you will have to create as many nodes as there are centers in the FLamby dataset you want to use. In this example we will use IXI, which has 3 centers, meaning that 3 nodes have to be created.
### First node
Enter this command to create the first node:
```
${FEDBIOMED_DIR}/scripts/fedbiomed_run node add
```
Select FLamby option (6) and the IXI dataset (5) through the CLI menu.
Then enter a name and a description. You are free to write anything that comes to mind, except for the tag: all nodes need to share the same tag, and this tag needs to be passed to the experiment. In this example we will choose ixi for the tag, so write ixi in the CLI and press enter. The last step is to input the center id of the center corresponding to the node; as this is the first node, we write 0 (FLamby uses Python indexing), so write 0 and press enter. Once everything is validated, you can launch this node with the following command:
```
${FEDBIOMED_DIR}/scripts/fedbiomed_run node start
```
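At any point you can check which datasets a node shares; Fed-BioMed's CLI provides a list subcommand for this (usage here is an assumption, verify it against your Fed-BioMed version):

```
${FEDBIOMED_DIR}/scripts/fedbiomed_run node list
```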
We will now do the same for the second and third nodes; however, it is of the utmost importance to use a different shell for each node.

### Second node
Open a new terminal and run:
```
${FEDBIOMED_DIR}/scripts/fedbiomed_run node config config2.ini add
```
As with the first node, select FLamby and the IXI dataset, then enter a name and a description. Enter ixi for the tag and select 1 for the center id to deploy the second node. Then similarly run:
```
${FEDBIOMED_DIR}/scripts/fedbiomed_run node config config2.ini start
```
### Third node
Open a new terminal and run:
```
${FEDBIOMED_DIR}/scripts/fedbiomed_run node config config3.ini add
```
As with the first node, select FLamby and the IXI dataset, then enter a name and a description. Enter ixi for the tag and select 2 for the center id to deploy the third and final node. Then similarly run:
```
${FEDBIOMED_DIR}/scripts/fedbiomed_run node config config3.ini start
```
## Notebook (researcher side)
To send commands to the nodes that are now up, we will use the researcher environment. Activate the researcher's conda environment by running:
```
conda activate fedbiomed-researcher
```
Then the simplest way is to use ipython or jupyter and copy-paste the commands below.
The first step is to create a class inheriting from Fed-BioMed's TorchTrainingPlan. This class should define three attributes: a model, a loss and an optimizer, which is very similar in spirit to FLamby's strategies. We then implement the forward pass of the model in the training_step method. For this, we just have to fill in the class using the arguments of one of FLamby's datasets. The only difficulty is to also take care of the serialization process (what happens when you pickle a class, send it to a distant computer and reload it into a different runtime). To handle it, we just have to copy FLamby's import and give it to the add_dependency method from the parent class.
```
# We import the TorchTrainingPlan class from fedbiomed
from fedbiomed.common.training_plans import TorchTrainingPlan
# We import all flamby utilities: the model, loss, batch size and optimizer details
from flamby.datasets.fed_ixi import (Baseline, BaselineLoss, Optimizer,
                                     BATCH_SIZE, LR, get_nb_max_rounds)

# We create a class inheriting from TorchTrainingPlan, reimplementing
# the training_step method and filling the necessary class attributes
# with FLamby information
class FLambyTrainingPlan(TorchTrainingPlan):
    def __init__(self, model_args: dict = {}):
        super().__init__(model_args)
        self.model = Baseline()  # UNet model
        self.loss = BaselineLoss()  # Dice loss
        # As we will be serializing the instance, the distant machine
        # needs to have made the same imports as we did in order to
        # deserialize the object without error.
        # This is already the case for TorchTrainingPlan but not for
        # FLamby, so we add the FLamby import below.
        deps = ['from flamby.datasets.fed_ixi import (Baseline, BaselineLoss, Optimizer)']
        self.add_dependency(deps)
        self.optimizer = Optimizer(self.parameters())

    def forward(self, x):
        return self.model(x)

    def training_step(self, img, target):
        # We implement the forward pass and return the loss
        output = self.forward(img)
        loss = self.loss(output, target)
        return loss
```
You can copy-paste the code above into ipython.
Fed-BioMed requires passing the batch size, learning rate and number of local updates directly to the experiment abstraction, whereas in FLamby the strategy abstraction fuses the training plan and the experiment. Copy-paste the code below into your notebook or interactive shell:
```
# We create a dictionary of Fed-BioMed kwargs with the batch size,
# learning rate and number of updates.
# Fed-BioMed also counts local training steps in batch-updates (a change
# currently made in the following branch of Fed-BioMed, to be integrated
# into master very soon):
# https://gitlab.inria.fr/fedbiomed/fedbiomed/-/tree/poc/flamby
# In this example we set this parameter to 100, as in the FLamby benchmarks
training_args = {
    'batch_size': BATCH_SIZE,
    'lr': LR,
    'num_updates': 100,
}
```
We are now ready to launch an experiment doing 5 rounds of FederatedAveraging. We will instantiate the experiment abstraction and give it the class defined above, the training_args and the FedAverage aggregator. Copy-paste and execute the code below:
```
from fedbiomed.researcher.experiment import Experiment
from fedbiomed.researcher.aggregators.fedavg import FedAverage

# The tag is just a handle to access the available nodes that we tagged with ixi
tags = ['ixi']
num_rounds = 5

exp = Experiment(tags=tags,
                 model_class=FLambyTrainingPlan,
                 training_args=training_args,
                 round_limit=num_rounds,
                 aggregator=FedAverage(),
                 )
```
The last step to launch the simulated federated learning between the different nodes is to run the experiment:
```
exp.run()
```
Once the nodes have finished training the model, we can load it in the researcher runtime and evaluate its performance:
```
# We retrieve the model architecture from the experiment
fed_model_ixi = exp.model_instance()
# We load the weights corresponding to the last round
fed_model_ixi.load_state_dict(exp.aggregated_params()[num_rounds - 1]['params'])
```
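aggregated_params() holds one entry per completed round (as used above, where we index it by round number); if you are unsure which rounds are available, you can inspect its keys first:

```
# List the rounds for which aggregated weights were stored
print(list(exp.aggregated_params().keys()))
```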
We can then evaluate it using FLamby's traditional evaluation scripts:
```
from torch.utils.data import DataLoader
from flamby.utils import evaluate_model_on_tests
from flamby.datasets.fed_ixi import metric, BATCH_SIZE, FedIXITiny

# We load the test datasets
test_dataloader_ixi_client0 = DataLoader(dataset=FedIXITiny(center=0, train=False), batch_size=BATCH_SIZE)
test_dataloader_ixi_client1 = DataLoader(dataset=FedIXITiny(center=1, train=False), batch_size=BATCH_SIZE)
test_dataloader_ixi_client2 = DataLoader(dataset=FedIXITiny(center=2, train=False), batch_size=BATCH_SIZE)

# We evaluate the model on them
evaluate_model_on_tests(fed_model_ixi,
                        [test_dataloader_ixi_client0,
                         test_dataloader_ixi_client1,
                         test_dataloader_ixi_client2],
                        metric)
```
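evaluate_model_on_tests returns a dictionary of per-client metric values; as a sketch (assuming keys of the form client_test_<i>, following FLamby's utilities), you can capture it and average the metric across centers:

```
import numpy as np

# Capture the per-client results instead of only displaying them
results = evaluate_model_on_tests(fed_model_ixi,
                                  [test_dataloader_ixi_client0,
                                   test_dataloader_ixi_client1,
                                   test_dataloader_ixi_client2],
                                  metric)
print(results)
# Average the metric over the three centers
print(np.mean(list(results.values())))
```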
One can change the number of rounds or the number of updates and continue launching jobs. One can also clean the environment and get ready to use other datasets.