Usage: Training on clusters

This document describes how to train the emulator on clusters. The pipelines always use the esub-epipe package to submit jobs to the cluster.

Detection classifier

To test different classifiers using cross validation, run the following pipeline

esub repos/edelweiss/src/edelweiss/apps/run_classifier.py --input_directory=data/training_data --output_directory=data/test_emulator_01 --train_val_test_split=80,10,10 --config_path=config/test_clf_config.yaml --n_samples=100000 --cv=2 --cv_scoring=f1 --save_test_data --calibrate_probabilities --function=all --system=slurm --source_file=src/activate.sh --test_mode --mode=jobarray
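The --train_val_test_split=80,10,10 argument divides the samples into training, validation and test sets by percentage. A minimal sketch of such a split (illustrative only, not the edelweiss implementation):

```python
import random

def train_val_test_split(samples, fractions=(80, 10, 10), seed=0):
    """Shuffle the samples and split them according to the given percentages."""
    rng = random.Random(seed)
    samples = samples[:]          # copy so the caller's list is untouched
    rng.shuffle(samples)
    n = len(samples)
    n_train = n * fractions[0] // 100
    n_val = n * fractions[1] // 100
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])

train, val, test = train_val_test_split(list(range(1000)))
print(len(train), len(val), len(test))  # 800 100 100
```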

The config file of the classifier should look like this:

classifiers:
  - RandomForest
  - XGB
  - MLP

scalers:
  - standard
  - minmax
  - quantile

The number of jobs is automatically determined by the number of classifiers and scalers. To speed up the cross-validation, one can run it on many cores (e.g. 64); this is set with --main_n_cores=64. Check the CPU usage of your job to see whether all the cores are actually being used.
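With the config above, the job array presumably gets one task per (classifier, scaler) combination, i.e. 3 x 3 = 9 tasks. A sketch of how that grid could be enumerated (illustrative, not edelweiss code):

```python
from itertools import product

# Lists as given in the example classifier config.
classifiers = ["RandomForest", "XGB", "MLP"]
scalers = ["standard", "minmax", "quantile"]

# One job per (classifier, scaler) pair.
jobs = list(product(classifiers, scalers))
print(len(jobs))  # 9
```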

If you do not want cross-validation, but still want to specify the classifiers, scalers and other parameters, you can run the following pipeline

esub repos/edelweiss/src/edelweiss/apps/run_classifier.py --cv=0 --classifier=NeuralNetwork --scaler=robust --input_directory=data/training_data --output_directory=data/test_emulator_01 --train_val_test_split=80,10,10 --config_path=config/test_clf_config.yaml --n_samples=100000 --calibrate_probabilities --function=all --system=slurm --source_file=src/activate.sh --test_mode --mode=jobarray

and add the arguments to the config file

classifier_args:
  NeuralNetwork:
    hidden_units: [32, 64, 32]
    learning_rate: 0.0001
    batch_size: 10000
    epochs: 1000
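The entries under classifier_args are presumably forwarded as keyword arguments to the chosen classifier. A minimal illustration of that pattern (the NeuralNetwork class below is a stand-in for illustration, not the edelweiss implementation):

```python
# Parsed form of the classifier_args section above.
config = {
    "classifier_args": {
        "NeuralNetwork": {
            "hidden_units": [32, 64, 32],
            "learning_rate": 0.0001,
            "batch_size": 10000,
            "epochs": 1000,
        }
    }
}

class NeuralNetwork:
    """Stand-in classifier; only stores the hyperparameters it receives."""
    def __init__(self, hidden_units, learning_rate, batch_size, epochs):
        self.hidden_units = hidden_units
        self.learning_rate = learning_rate
        self.batch_size = batch_size
        self.epochs = epochs

name = "NeuralNetwork"
# The config section for the selected classifier is unpacked as kwargs.
clf = NeuralNetwork(**config["classifier_args"][name])
print(clf.epochs)  # 1000
```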

For an overview of all the arguments, run the following command

esub repos/edelweiss/src/edelweiss/apps/run_classifier.py --help

Normalizing flow

To train the normalizing flow, the parameters have to be specified in the config file. There are three types of parameters:

  • input_band_dep: input parameters that are different for each band

  • input_band_indep: input parameters that are the same for each band

  • output: output parameters

input_band_dep:
  - mag
  - psf_fwhm

input_band_indep:
  - r50
  - e_abs

output:
  - MAG_APER
  - FLUX_RADIUS
  - FLUX_APER
  - absolute_ellipticity
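For a given band, the flow's input vector is presumably assembled from the band-dependent parameters of that band plus the shared band-independent parameters. A sketch of building the input column names (the band-suffix naming convention is an assumption):

```python
# Parameter lists as given in the example config above.
input_band_dep = ["mag", "psf_fwhm"]
input_band_indep = ["r50", "e_abs"]

def input_columns(band):
    # Band-dependent columns get a band suffix (assumed convention);
    # band-independent columns are shared across all bands.
    return [f"{p}_{band}" for p in input_band_dep] + input_band_indep

print(input_columns("r"))  # ['mag_r', 'psf_fwhm_r', 'r50', 'e_abs']
```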

If there should be one normalizing flow for each band, the pipeline should be run with --function=main and with the number of tasks equal to the number of bands. This submits a job array in which each task trains the normalizing flow for one band. An example pipeline would look like this:

esub repos/edelweiss/src/edelweiss/apps/run_nflow.py --bands=["g", "r", "i", "z", "y"] --tasks="0>5" --function=main --input_directory=data/training_data/ --output_directory=data/test_emulator_01/ --config_path=config/config_emu_params.yaml --train_split=0.8 --epochs=100 --n_samples=10000 --scaler=quantile --batch_size=1000 --source_file=src/activate_gpu.sh --system=slurm --mode=jobarray
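Here --tasks="0>5" submits task indices 0 through 4, and each index presumably selects one entry of the --bands list. An illustrative mapping (assumed one-to-one correspondence):

```python
bands = ["g", "r", "i", "z", "y"]

def band_for_task(task_index):
    # esub task indices 0..4 are assumed to map one-to-one onto the bands list.
    return bands[task_index]

print([band_for_task(i) for i in range(5)])  # ['g', 'r', 'i', 'z', 'y']
```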

If there should be one normalizing flow for all bands, the pipeline should be run with --function=merge such that only one task is submitted. An example pipeline would look like this:

esub repos/edelweiss/src/edelweiss/apps/run_nflow.py --bands=["g", "r", "i", "z", "y"] --tasks="0>5" --function=merge --input_directory=data/training_data/ --output_directory=data/test_emulator_01/ --config_path=config/config_emu_params.yaml --train_split=0.8 --epochs=100 --n_samples=10000 --scaler=quantile --batch_size=1000 --source_file=src/activate_gpu.sh --system=slurm --mode=jobarray

For an overview of all the arguments, run the following command

esub repos/edelweiss/src/edelweiss/apps/run_nflow.py --help