Containerized execution

A good step towards float-perfect reproducibility in your future benchmarks is to use Docker. We provide a base Docker image and examples covering dataset download and benchmarking. For Fed-Heart Disease, cd to the flamby dockers folder, replace myusername and mypassword with your git credentials (OAuth token) in the command below, and run:

docker build -t flamby-heart -f Dockerfile.base --build-arg DATASET_PREFIX="heart" --build-arg GIT_USER="myusername" --build-arg GIT_PWD="mypassword" .
docker build -t flamby-heart-benchmark -f Dockerfile.heart .
docker run -it flamby-heart-benchmark

If you expect to use many datasets with Docker, build the base image using the all_extra option for flamby’s install. You can then reuse it for all datasets with a multi-stage build:

docker build -t flamby-all -f Dockerfile.base --build-arg DATASET_PREFIX="all_extra" --build-arg GIT_USER="myusername" --build-arg GIT_PWD="mypassword" .
# Modify line 1 of Dockerfile.* to "FROM flamby-all", replacing * with the name of the dataset you are interested in
# Then run the following commands, replacing * similarly
#docker build -t flamby-*-benchmark -f Dockerfile.* .
#docker run -it flamby-*-benchmark
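
The manual edit of line 1 described in the comments above could also be scripted. The following is only a minimal sketch: the helper name retarget_base, the file Dockerfile.demo and its contents are hypothetical and chosen for illustration, and the sketch assumes that line 1 of the dataset Dockerfile is its FROM instruction. It runs on a throwaway file so that no real Dockerfile is touched.

```shell
#!/bin/sh
# Hypothetical helper (not part of flamby): point a dataset Dockerfile
# at the shared flamby-all base image by rewriting its first line.
# Assumption: line 1 of the Dockerfile is its FROM instruction.
set -eu

retarget_base() {
    # $1: path to a Dockerfile, $2: name of the base image to use.
    # Write to a temporary file, then move it into place, which avoids
    # relying on the non-portable in-place option of sed.
    tmp="$1.tmp"
    sed "1s|^FROM .*|FROM $2|" "$1" > "$tmp" && mv "$tmp" "$1"
}

# Demonstration on a throwaway file with made-up contents.
demo_dir=$(mktemp -d)
printf 'FROM flamby-heart\nRUN echo "benchmark steps"\n' > "$demo_dir/Dockerfile.demo"

retarget_base "$demo_dir/Dockerfile.demo" flamby-all

# Show the rewritten first line, then clean up.
result=$(head -n 1 "$demo_dir/Dockerfile.demo")
echo "$result"
rm -rf "$demo_dir"
```

On a real checkout, the same function would presumably be called on the dataset Dockerfile of interest (for instance Dockerfile.tcga) before running the docker build command for that dataset.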

Check out Dockerfile.tcga. Similar Dockerfiles can, in principle, be built for the other datasets by replicating the instructions found in each dataset folder, following the model of Dockerfile.heart. Note that for larger datasets, execution can be prohibitively slow and Docker can run out of time or memory.