Snellius Setup

  • Now for Snellius
  • As always, first SSH into Snellius: ssh user@snellius.surf.nl
  • This one is pretty involved, so make sure you have the time for it.

Steps

  • First, log in to GitHub. If you haven't done so already, run git config --global credential.helper cache and enter your details the next time Git prompts for them. Note that the password is your personal access token, not your account password. You can use another authentication method if you prefer.
    • I prefer the GitHub command-line tool gh, but we cannot install it here.
  • Once that is done, go to your home directory and clone these repositories:
    • git clone https://github.com/openml-labs/OpenML-auto-automl
    • git clone https://github.com/SubhadityaMukherjee/automlbenchmark
  • In your home directory, run mkdir automl_data . This directory should be created automatically on execution, but just in case :P
  • Now load Python and its required modules so we can do the rest of the setup. You can do this using cd /home/user/OpenML-auto-automl && ./snellius_env_loader.sh . If you already built a conda environment, feel free to skip recreating it.
  • Now we need to build the Singularity images for all the frameworks we want to run. (Singularity is a Docker alternative. Snellius only supports open source software, so we use it.)
    • Since all of this is on the server, I prefer nvim as my editor. If you are using VS Code or something similar, feel free to open the files however you want.
    • cd to automlbenchmark/scripts/
    • nohup python build_images.py -m singularity -f autosklearn,flaml,gama > build_log.txt 2>&1 &
    • Of course, add whatever other frameworks you want to the -f list.
    • If this does not work for some reason, you can also use: cd /home/user/OpenML-auto-automl && nohup ./build_images.sh user singularity > build_log.txt 2>&1 &
    • This will take A LONG TIME, which is why it runs in the background with nohup and writes its logs to build_log.txt
    • Note that NOTHING else will work until these images are built.
  • Alright! Now that all of this is done (hopefully), you should be good to go.
  • Do you want to set up the cron job? Move on to the test server docs.
  • Do you want to manually check if it is working? Run: cd /home/user/OpenML-auto-automl/src && python snellius_generate_sbatch.py -c -a apikey
  • Results are stored in /home/user/automl_data
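
Before starting the long image build, it can help to confirm that the steps above left the expected directories in place. The following is a minimal sketch, not part of either repository: check_setup is a hypothetical helper, and it only assumes the three directory names used in the steps (OpenML-auto-automl, automlbenchmark, automl_data) sitting under one base directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that the directories created in the setup
# steps exist under a base directory (defaults to $HOME).
check_setup() {
    local base="${1:-$HOME}"
    local missing=0
    local dir
    for dir in OpenML-auto-automl automlbenchmark automl_data; do
        if [ -d "$base/$dir" ]; then
            echo "ok       $dir"
        else
            echo "missing  $dir"
            missing=1
        fi
    done
    # Exit status is 0 when everything is present, 1 otherwise.
    return "$missing"
}
```

After sourcing the file, calling check_setup with no argument checks your home directory, and check_setup /home/user checks an explicit location. It does not check whether the Singularity images have finished building; for that, keep an eye on build_log.txt.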

FAQs

  • How do I check if new datasets are being downloaded?
    • Go to /home/user/automl_data and check the file dataset_list_for_cronjob.csv. If the first few rows do not correspond to new data, something is wrong.
  • How do I know if the pipeline is working?
    • A new dataset on OpenML should lead to a new task and a new run on OpenML.
    • There should be new Slurm files in /home/user/automl_data.
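
Both FAQ checks can be done by hand, but a rough sketch of them as shell functions might look like the following. The function names show_latest_datasets and recent_files are hypothetical, and nothing is assumed about how the Slurm files are named, so the second helper simply lists every file in the data directory that changed recently.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the two FAQ checks above.

# Print the first N lines (default 5) of the cron job dataset list.
show_latest_datasets() {
    local data_dir="${1:-$HOME/automl_data}"
    local rows="${2:-5}"
    head -n "$rows" "$data_dir/dataset_list_for_cronjob.csv"
}

# List files directly inside the data directory that were modified
# within the last N minutes (default 1440, i.e. one day).
recent_files() {
    local data_dir="${1:-$HOME/automl_data}"
    local minutes="${2:-1440}"
    find "$data_dir" -maxdepth 1 -type f -mmin "-$minutes"
}
```

For example, show_latest_datasets /home/user/automl_data 10 prints the top ten lines of the CSV, and recent_files /home/user/automl_data 60 lists anything written in the last hour. An empty result from the second call after a cron run is a hint that no new jobs were generated.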