This document describes how to train the emulator on clusters. The pipelines always use the esub-epipe package to submit jobs to the cluster.
To test different classifiers using cross-validation, run the following pipeline:
esub repos/edelweiss/src/edelweiss/apps/run_classifier.py --input_directory=data/training_data --output_directory=data/test_emulator_01 --train_val_test_split=80,10,10 --config_path=config/test_clf_config.yaml --n_samples=100000 --cv=2 --cv_scoring=f1 --save_test_data --calibrate_probabilities --function=all --system=slurm --source_file=src/activate.sh --test_mode --mode=jobarray
The config file of the classifier should look like this:
classifiers:
- RandomForest
- XGB
- MLP
scalers:
- standard
- minmax
- quantile
The number of jobs is determined automatically from the number of classifiers and scalers. To speed up the cross-validation, you can run it on many cores (e.g. 64) by passing --main_n_cores=64. Check the CPU usage of your job to confirm that all the cores are actually being used.
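As a rough illustration of how the job count follows from the config, the sketch below assumes one job per classifier/scaler pair, i.e. the Cartesian product of the two lists. The pairing rule and the variable names are assumptions made for illustration and are not taken from the edelweiss source.

```python
from itertools import product

# Values copied from the example config above.
config = {
    "classifiers": ["RandomForest", "XGB", "MLP"],
    "scalers": ["standard", "minmax", "quantile"],
}

# Assumption: one job per (classifier, scaler) pair.
jobs = list(product(config["classifiers"], config["scalers"]))

for index, (classifier, scaler) in enumerate(jobs):
    print(f"job {index}: classifier={classifier}, scaler={scaler}")

print(f"total jobs: {len(jobs)}")
```

Under this assumption the example config yields nine jobs, so adding a classifier or scaler to the config grows the job array accordingly.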
If you do not want cross-validation, but still want to specify the classifier, scaler and other parameters, you can run the following pipeline:
esub repos/edelweiss/src/edelweiss/apps/run_classifier.py --cv=0 --classifier=NeuralNetwork --scaler=robust --input_directory=data/training_data --output_directory=data/test_emulator_01 --train_val_test_split=80,10,10 --config_path=config/test_clf_config.yaml --n_samples=100000 --calibrate_probabilities --function=all --system=slurm --source_file=src/activate.sh --test_mode --mode=jobarray
and add the classifier arguments to the config file:
classifier_args:
  NeuralNetwork:
    hidden_units: [32, 64, 32]
    learning_rate: 0.0001
    batch_size: 10000
    epochs: 1000
For an overview of all the arguments, run the following command:
esub repos/edelweiss/src/edelweiss/apps/run_classifier.py --help
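To make the --train_val_test_split=80,10,10 argument in the classifier commands above concrete, here is a minimal sketch of how a percentage string could translate into set sizes for --n_samples=100000. The helper split_sizes is hypothetical, and the rounding rule (remainder goes to the test set) is an assumption; edelweiss may split the data differently.

```python
def split_sizes(n_samples, split="80,10,10"):
    """Return (train, val, test) sizes for a percentage split string."""
    percentages = [int(p) for p in split.split(",")]
    if len(percentages) != 3 or sum(percentages) != 100:
        raise ValueError("split must be three percentages summing to 100")
    n_train = n_samples * percentages[0] // 100
    n_val = n_samples * percentages[1] // 100
    # Assumption: the remainder goes to the test set so the sizes add up.
    n_test = n_samples - n_train - n_val
    return n_train, n_val, n_test


sizes = split_sizes(100000, "80,10,10")
print(sizes)  # (80000, 10000, 10000)
```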
To train the normalizing flow, the parameters have to be specified in the config file. There are three types of parameters:
input_band_dep: input parameters that are different for each band
input_band_indep: input parameters that are the same for each band
output: output parameters
input_band_dep:
- mag
- psf_fwhm
input_band_indep:
- r50
- e_abs
output:
- MAG_APER
- FLUX_RADIUS
- FLUX_APER
- absolute_ellipticity
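The difference between band-dependent and band-independent inputs can be sketched as follows. The helper columns_for_band and the "_<band>" suffix convention are hypothetical and only illustrate the idea; the actual column names used by edelweiss may differ.

```python
# Parameter lists copied from the example config above.
config = {
    "input_band_dep": ["mag", "psf_fwhm"],
    "input_band_indep": ["r50", "e_abs"],
}


def columns_for_band(config, band):
    """Build the list of input columns for a single band."""
    # Hypothetical naming scheme: band-dependent inputs get a "_<band>" suffix.
    columns = [f"{name}_{band}" for name in config["input_band_dep"]]
    # Band-independent inputs are shared by every band and keep their name.
    columns += config["input_band_indep"]
    return columns


for band in ["g", "r"]:
    print(band, columns_for_band(config, band))
```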
If there should be one normalizing flow for each band, run the pipeline with --function=main and set --tasks so that the number of tasks matches the number of bands. This submits a job array with one task per band, and each task trains a normalizing flow for one band. An example pipeline would look like this:
esub repos/edelweiss/src/edelweiss/apps/run_nflow.py --bands=["g", "r", "i", "z", "y"] --tasks="0>5" --function=main --input_directory=data/training_data/ --output_directory=data/test_emulator_01/ --config_path=config/config_emu_params.yaml --train_split=0.8 --epochs=100 --n_samples=10000 --scaler=quantile --batch_size=1000 --source_file=src/activate_gpu.sh --system=slurm --mode=jobarray
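The relationship between --tasks="0>5" and the five bands can be sketched as below. This assumes that the range excludes its upper bound (so "0>5" gives tasks 0 to 4) and that task i trains the band at position i in --bands; both points are assumptions for illustration, and parse_tasks is a hypothetical helper, not part of esub.

```python
def parse_tasks(tasks):
    """Expand a range string such as "0>5" into task indices 0..4."""
    start, stop = (int(part) for part in tasks.split(">"))
    # Assumption: the upper bound is exclusive.
    return list(range(start, stop))


bands = ["g", "r", "i", "z", "y"]

for index in parse_tasks("0>5"):
    # Assumption: task i handles the band at position i.
    print(f"task {index} -> band {bands[index]}")
```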
If there should be one normalizing flow for all bands, run the pipeline with --function=merge so that only one task is submitted. An example pipeline would look like this:
esub repos/edelweiss/src/edelweiss/apps/run_nflow.py --bands=["g", "r", "i", "z", "y"] --tasks="0>5" --function=merge --input_directory=data/training_data/ --output_directory=data/test_emulator_01/ --config_path=config/config_emu_params.yaml --train_split=0.8 --epochs=100 --n_samples=10000 --scaler=quantile --batch_size=1000 --source_file=src/activate_gpu.sh --system=slurm --mode=jobarray
For an overview of all the arguments, run the following command:
esub repos/edelweiss/src/edelweiss/apps/run_nflow.py --help