Containerized execution
A good step towards float-perfect reproducibility in your future
benchmarks is to use docker. We give a base docker image and examples
containing dataset download and benchmarking. For
Fed-Heart Disease,
cd
to the flamby dockers folder, replace myusername
and
mypassword
with your git credentials (OAuth token) in the command
below and run:
docker build -t flamby-heart -f Dockerfile.base --build-arg DATASET_PREFIX="heart" --build-arg GIT_USER="myusername" --build-arg GIT_PWD="mypassword" .
docker build -t flamby-heart-benchmark -f Dockerfile.heart .
docker run -it flamby-heart-benchmark
If you are convinced you will use many datasets with docker, build the
base image using all_extra
option for flamby’s install, you will be
able to reuse it for all datasets with multi-stage build:
docker build -t flamby-all -f Dockerfile.base --build-arg DATASET_PREFIX="all_extra" --build-arg GIT_USER="myusername" --build-arg GIT_PWD="mypassword" .
# modify Dockerfile.* line 1 to FROM flamby-all by replacing * with the dataset name of the dataset you are interested in
# Then run the following command replacing * similarly
#docker build -t flamby-* -f Dockerfile.* .
#docker run -it flamby-*-benchmark
Checkout Dockerfile.tcga
. Similar dockerfiles can be theoretically
easily built for the other datasets as well by replicating instructions
found in each dataset folder following the model of
Dockerfile.heart
. Note that for bigger datasets execution can be
prohibitively slow and docker can run out of time/memory.