edelweiss package
Submodules
edelweiss.classifier module
- class edelweiss.classifier.Classifier(scaler='standard', clf='XGB', calibrate=True, cv=0, cv_scoring='f1', params=None, **clf_kwargs)[source]
Bases:
object
The detection classifier class that wraps a sklearn classifier.
- Parameters:
scaler – the scaler to use for the classifier, options: standard, minmax, maxabs, robust, quantile
clf – the classifier to use, options are: XGB, MLP, RandomForest, NeuralNetwork, LogisticRegression, LinearSVC, DecisionTree, AdaBoost, GaussianNB, QDA, KNN
calibrate – whether to calibrate the probabilities
cv – number of cross validation folds, if 0 no cross validation is performed
cv_scoring – the scoring method to use for cross validation
params – the names of the parameters
clf_kwargs – additional keyword arguments for the classifier
- fit(X, y, param_grid=None, **args)
Train the classifier.
- Parameters:
X – the features to train on (array or recarray)
y – the labels to train on
param_grid – the hyperparameter grid to search over
args – additional arguments for the classifier
- predict(X, prob_multiplier=1.0)[source]
Predict the labels for a given set of features.
- Parameters:
X – the features to predict on (array or recarray)
- Returns:
the predicted labels
- predict_non_proba(X)[source]
Predict the probabilities for a given set of features.
- Parameters:
X – the features to predict on (array or recarray)
- Returns:
the predicted probabilities
- predict_proba(X)[source]
Predict the probabilities for a given set of features.
- Parameters:
X – the features to predict on (array or recarray)
- Returns:
the predicted probabilities
- class edelweiss.classifier.MultiClassClassifier(scaler='standard', clf='XGB', calibrate=True, cv=0, cv_scoring='f1', params=None, **clf_kwargs)[source]
Bases:
Classifier
The detection classifier class that wraps a sklearn classifier for multiple classes.
- Parameters:
scaler – the scaler to use for the classifier, options: standard, minmax, maxabs, robust, quantile
clf – the classifier to use, options are: XGB, MLP, RandomForest, NeuralNetwork, LogisticRegression, LinearSVC, DecisionTree, AdaBoost, GaussianNB, QDA, KNN
calibrate – whether to calibrate the probabilities
cv – number of cross validation folds, if 0 no cross validation is performed
cv_scoring – the scoring method to use for cross validation
params – the names of the parameters
clf_kwargs – additional keyword arguments for the classifier
- predict(X)[source]
Predict the labels for a given set of features.
- Parameters:
X – the features to predict on (array or recarray)
- Returns:
the predicted labels
- predict_non_proba(X)[source]
Predict the class non-probabilistically for a given set of features.
- Parameters:
X – the features to predict on (array or recarray)
- Returns:
the predicted classes
- class edelweiss.classifier.MultiClassifier(split_label='galaxy_type', labels=None, scaler='standard', clf='XGB', calibrate=True, cv=0, cv_scoring='f1', params=None, **clf_kwargs)[source]
Bases:
object
A classifier class that trains multiple classifiers for a specific label. This label could e.g. be the galaxy type (star, red galaxy, blue galaxy).
- Parameters:
split_label – the label used to split the data among the different classifiers
labels – the different labels of the split label
scaler – the scaler to use for the classifier
clf – the classifier to use
calibrate – whether to calibrate the probabilities
cv – number of cross validation folds, if 0 no cross validation is performed
cv_scoring – the scoring method to use for cross validation
params – the names of the parameters
clf_kwargs – additional keyword arguments for the classifier
- fit(X, y)
Train the classifier.
- save(path, subfolder=None)[source]
Save the classifier to a given path.
- Parameters:
path – path to the folder where the emulator is saved
subfolder – subfolder of the emulator folder where the classifier is stored
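The idea behind MultiClassifier, one classifier per value of the split label, can be sketched in plain Python. This is only an illustration under stated assumptions: `MajorityModel` and `fit_per_label` are hypothetical names, the toy model stands in for a real sklearn pipeline, and the rows are plain dicts rather than a recarray.

```python
from collections import defaultdict


class MajorityModel:
    """Toy stand-in for a real classifier: predicts the most common training label."""

    def fit(self, X, y):
        labels = list(y)
        self.label_ = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def fit_per_label(rows, y, split_label):
    """Train one model per distinct value of rows[i][split_label] (hypothetical helper)."""
    groups = defaultdict(lambda: ([], []))
    for row, target in zip(rows, y):
        features, targets = groups[row[split_label]]
        features.append(row)
        targets.append(target)
    return {label: MajorityModel().fit(X, t) for label, (X, t) in groups.items()}


rows = [
    {"galaxy_type": "star", "mag_i": 18.0},
    {"galaxy_type": "star", "mag_i": 19.0},
    {"galaxy_type": "star", "mag_i": 24.0},
    {"galaxy_type": "blue", "mag_i": 25.0},
    {"galaxy_type": "blue", "mag_i": 26.0},
]
y = [True, True, False, False, False]  # detected or not
models = fit_per_label(rows, y, split_label="galaxy_type")
```

Each entry of `models` is trained only on the rows of its own galaxy type, so prediction would first route each row to the model matching its split label.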
edelweiss.clf_diagnostics module
- edelweiss.clf_diagnostics.add_range_to_name(field_names, ranges)[source]
Add the range to the name of the variable such that the range is visible in the spider plot.
- Parameters:
field_names – list with the names of the variables
ranges – dictionary with the ranges for each variable
- edelweiss.clf_diagnostics.get_all_scores(test_arr, y_test, y_pred, y_prob)[source]
Calculate all the scores and append them to the test_arr dict.
- Parameters:
test_arr – dict where the test scores will be saved
y_test – test labels
y_pred – predicted labels
y_prob – probability of being detected
- edelweiss.clf_diagnostics.get_all_scores_multiclass(test_arr, y_test, y_pred, y_prob)[source]
Calculate all the scores and append them to the test_arr dict for a multiclass classifier.
- Parameters:
test_arr – dict where the test scores will be saved
y_test – test labels
y_pred – predicted labels
y_prob – probability of being detected
- edelweiss.clf_diagnostics.get_confusion_matrix(y_true, y_pred)[source]
Get the confusion matrix for the classifier.
- Parameters:
y_true – true labels
y_pred – predicted labels
- Returns:
True Positives, True Negatives, False Positives, False Negatives
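As a rough illustration of the four counts listed above, here is a minimal sketch in plain Python. `confusion_counts` is a hypothetical helper, not the edelweiss implementation, and it assumes boolean labels where True means detected.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for boolean labels (hypothetical helper)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1
        elif not truth and not pred:
            tn += 1
        elif pred:  # predicted detected, truly undetected
            fp += 1
        else:  # predicted undetected, truly detected
            fn += 1
    return tp, tn, fp, fn


y_true = [True, True, False, False, True]
y_pred = [True, False, False, True, True]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)  # prints "2 1 1 1"
```

The return order follows the one documented above: true positives, true negatives, false positives, false negatives.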
- edelweiss.clf_diagnostics.get_default_ranges_for_spider()[source]
Get the default ranges for the spider plot.
- Returns:
dictionary with the ranges for each variable
- edelweiss.clf_diagnostics.get_name(clf, final=False)[source]
Get the name to add to the classifier
- Parameters:
clf – classifier object (from sklearn) or name of the classifier
final – if True, the classifier was tested on the test data.
- Returns:
name
- edelweiss.clf_diagnostics.plot_all_scores(scores, path_labels=None)[source]
Plot all scores for the classifiers. Input can either be directly a recarray with the scores or the path to the scores or a list of paths to the scores. If a list is given, the scores of the different paths are combined and plotted with different colors.
- Parameters:
scores – recarray with the scores or path to the scores or list of paths
path_labels – list of labels for the different paths
- edelweiss.clf_diagnostics.plot_calibration_curve(y_true, y_prob, output_directory='.', clf='classifier', final=False, save_plot=False, fig=None)[source]
Plot the calibration curve for the classifier.
- Parameters:
y_true – true labels
y_prob – predicted probabilities
output_directory – directory to save the plot
clf – classifier object or name of the classifier
final – if True, the plot is for the final classifier
save_plot – if True, save the plot
fig – figure object, if None, create a new figure
- edelweiss.clf_diagnostics.plot_classifier_comparison(clfs, conf, path, spider_ranges=None, labels=None, print_scores=False, special_param='mag_i')[source]
Plot the diagnostics for the chosen classifiers. If the classifiers are not all from the same path, the conf and path parameters should be lists of the same length as clfs.
- Parameters:
clfs – list of classifier names
conf – configuration dictionary or list of dictionaries
path – path to the data or list of paths
spider_ranges – dictionary with the ranges for the spider plot
labels – list of labels for the different paths
print_scores – if True, print the scores for the different classifiers
special_param – param to plot the histogram for
- edelweiss.clf_diagnostics.plot_diagnostics(clf, X_test, y_test, output_directory='.', final=False, save_plot=False, special_param='mag_i')[source]
Plot the diagnostics for the classifier.
- Parameters:
clf – classifier object
X_test – test data
y_test – true labels
output_directory – directory to save the plots
final – if True, the classifier was tested on the test data.
save_plot – if True, save the plots
special_param – param to plot the histogram for
- edelweiss.clf_diagnostics.plot_feature_importances(clf, clf_name='classifier', summed=False)[source]
Plots the feature importances for the classifier.
- Parameters:
clf – classifier object
names – names of the features
clf_name – name of the classifier
summed – if True, the summed feature importances are plotted
- edelweiss.clf_diagnostics.plot_hist_fp_fn_tp_tn(param, y_true, y_pred, output_directory='.', clf='classifier', final=False, save_plot=False)[source]
Plot the stacked histogram of one parameter (e.g. i-band magnitude) for the different confusion matrix elements.
- Parameters:
param – parameter to plot
y_true – true labels
y_pred – predicted labels
output_directory – directory to save the plot
clf – classifier object or name of the classifier
final – if True, the plot is for the final classifier
save_plot – if True, save the plot
- edelweiss.clf_diagnostics.plot_hist_n_gal(param, y_true, y_pred, output_directory='.', clf='classifier', final=False, save_plot=False, fig=None)[source]
Plot the histogram of the galaxies detected by the classifier alongside the truly detected galaxies for one parameter (e.g. i-band magnitude).
- Parameters:
param – parameter to plot
y_true – true labels
y_pred – predicted labels
output_directory – directory to save the plot
clf – classifier object or name of the classifier
final – if True, the plot is for the final classifier
save_plot – if True, save the plot
fig – figure object, if None, create a new figure
- edelweiss.clf_diagnostics.plot_pr_curve(y_true, y_prob, output_directory='.', clf='classifier', final=False, save_plot=False, fig=None)[source]
Plot the precision-recall curve for the classifier.
- Parameters:
y_true – true labels
y_prob – predicted probabilities
output_directory – directory to save the plot
clf – classifier object or name of the classifier
final – if True, the plot is for the final classifier
save_plot – if True, save the plot
fig – figure object, if None, create a new figure
- Returns:
figure object
- edelweiss.clf_diagnostics.plot_roc_curve(y_true, y_prob, output_directory='.', clf='classifier', final=False, save_plot=False, fig=None)[source]
Plot the ROC curve for the classifier.
- Parameters:
y_true – true labels
y_prob – predicted probabilities
output_directory – directory to save the plot
clf – classifier object or name of the classifier
final – if True, the plot is for the final classifier
save_plot – if True, save the plot
fig – figure object, if None, create a new figure
- edelweiss.clf_diagnostics.plot_spider_scores(y_true, y_pred, y_prob, output_directory='.', clf='classifier', final=False, save_plot=False, fig=None, ranges=None, print_scores=False)[source]
Plot the spider scores for the classifier.
- Parameters:
y_true – true labels
y_pred – predicted labels
y_prob – predicted probabilities
output_directory – directory to save the plot
clf – classifier object or name of the classifier
final – if True, the plot is for the final classifier
save_plot – if True, save the plot
fig – figure object, if None, create a new figure
ranges – dictionary of ranges for each score
print_scores – if True, print the scores
- Returns:
figure object
- edelweiss.clf_diagnostics.scale_data_for_spider(data, ranges=None)[source]
Scale the data for the spider plot such that the chosen range corresponds to the 0-1 range of the spider plot.
If the lower value of the range is higher than the upper value, the data is inverted.
- Parameters:
data – data to scale
ranges – dictionary with the ranges for each variable; if a parameter is not in the dictionary, the default range is (0, 1)
- Returns:
scaled data
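The scaling described for scale_data_for_spider can be written as a single linear map. The sketch below is an assumption about the arithmetic, not the library's code: `scale_for_spider` is a hypothetical name, the data is a plain dict of score name to value, and values outside the range are not clipped.

```python
def scale_for_spider(data, ranges=None):
    """Map each value linearly so that its (lower, upper) range becomes 0-1.

    A range with lower > upper inverts the axis, because the same formula
    then maps the larger bound to 0 and the smaller bound to 1.
    """
    ranges = ranges or {}
    scaled = {}
    for name, value in data.items():
        lower, upper = ranges.get(name, (0, 1))  # default range is (0, 1)
        scaled[name] = (value - lower) / (upper - lower)
    return scaled


scores = {"accuracy": 0.9, "brier": 0.05}
ranges = {"accuracy": (0.8, 1.0), "brier": (0.25, 0.0)}  # second range is inverted
scaled = scale_for_spider(scores, ranges)
```

With these inputs, accuracy lands in the middle of its axis (about 0.5) and the low Brier score lands near the outer edge (about 0.8), which is the point of inverting ranges for scores where smaller is better.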
edelweiss.clf_utils module
- edelweiss.clf_utils.custom_roc_auc_score(y_true, y_prob)[source]
Scorer for the ROC AUC score using y_prob
- Parameters:
y_true – true labels (detected or not)
y_prob – predicted probabilities (2D array)
- Returns:
score
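To make the "2D array" input concrete, the sketch below computes a ROC AUC from a two-column probability array in plain Python. It is a hypothetical illustration: `roc_auc_from_proba` is not the edelweiss function, and taking column 1 as the probability of the positive (detected) class is an assumption borrowed from the usual sklearn predict_proba layout.

```python
def roc_auc_from_proba(y_true, y_prob):
    """ROC AUC from a 2D probability array, using column 1 as the positive class."""
    positive = [row[1] for row, truth in zip(y_prob, y_true) if truth]
    negative = [row[1] for row, truth in zip(y_prob, y_true) if not truth]
    # AUC equals the probability that a random positive outranks a random
    # negative, counting ties as one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in positive for n in negative)
    return wins / (len(positive) * len(negative))


y_true = [1, 0, 1, 0]
y_prob = [[0.2, 0.8], [0.7, 0.3], [0.6, 0.4], [0.5, 0.5]]
auc = roc_auc_from_proba(y_true, y_prob)
print(auc)  # prints 0.75
```

The pairwise formulation is quadratic in the sample size, so it is only meant to show what the score measures; a production scorer would use a sorted or library-based computation.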
- edelweiss.clf_utils.get_classifier(classifier, scaler=None, **kwargs)[source]
Returns the classifier object
- Parameters:
classifier – name of the classifier
scaler – scaler object
kwargs – additional arguments for the classifier
- Returns:
classifier object (sklearn pipeline)
- Raises:
ValueError if classifier is not known
- edelweiss.clf_utils.get_classifier_args(clf, conf)[source]
Returns the arguments for the classifier defined in the config file
- Parameters:
clf – classifier name
conf – config file
- Returns:
arguments for the classifier
- edelweiss.clf_utils.get_clf_name(index=None)[source]
Returns the name of the classifier file.
- Parameters:
index – index of the classifier
- Returns:
name of the classifier file
- edelweiss.clf_utils.get_detection_label(clf, bands, n_detected_bands=None)[source]
Get the detection label for the classifier.
- Parameters:
clf – classification data (rec array)
bands – which bands the data has
n_detected_bands – how many bands have to be detected such that the event is classified as detected; if None, the detection label is already given in clf
- Returns:
detection label (bool array) and the names of the detection labels
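The role of n_detected_bands can be illustrated with a small sketch. Everything specific here is an assumption made for the example: `detection_label` is a hypothetical helper, the per-band field names `detected_<band>` are invented, the rows are dicts instead of a recarray, and "detected" is read as "detected in at least n_detected_bands bands".

```python
def detection_label(rows, bands, n_detected_bands):
    """Flag an object as detected if enough bands detect it (hypothetical sketch)."""
    # Assumed naming scheme for the per-band detection flags.
    names = [f"detected_{band}" for band in bands]
    labels = [
        sum(bool(row[name]) for name in names) >= n_detected_bands
        for row in rows
    ]
    return labels, names


rows = [
    {"detected_g": True, "detected_r": True, "detected_i": False},
    {"detected_g": False, "detected_r": True, "detected_i": False},
]
labels, names = detection_label(rows, bands=("g", "r", "i"), n_detected_bands=2)
print(labels)  # prints [True, False]
```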
- edelweiss.clf_utils.get_scaler(scaler)[source]
Returns the scaler object
- Parameters:
scaler – name of the scaler
- Returns:
scaler object
- Raises:
ValueError if scaler is not known
- edelweiss.clf_utils.get_scorer(score, **kwargs)[source]
Returns the scorer object for the given input string. If the name is not one of the known custom scorers, the input string is returned unchanged, on the assumption that it names a sklearn scorer.
- Parameters:
score – name of the scorer
kwargs – additional arguments for the scorer
- Returns:
scorer object
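The lookup-with-fallback behaviour described for get_scorer can be sketched as a dictionary lookup. The registry, the example scorer, and the name `resolve_scorer` are all hypothetical; the real function also accepts extra keyword arguments, which this sketch leaves out.

```python
def count_errors(y_true, y_pred):
    """Example custom scorer: number of misclassified samples."""
    return sum(t != p for t, p in zip(y_true, y_pred))


# Hypothetical registry of custom scorers, keyed by name.
CUSTOM_SCORERS = {"n_errors": count_errors}


def resolve_scorer(score, custom_scorers=CUSTOM_SCORERS):
    """Return a custom scorer if `score` names one, else pass the string through."""
    if score in custom_scorers:
        return custom_scorers[score]
    return score  # assumed to be a name that sklearn understands, e.g. "f1"


custom = resolve_scorer("n_errors")  # the count_errors function
builtin = resolve_scorer("f1")       # the unchanged string "f1"
```

Passing the string through unchanged lets the caller hand the result directly to a sklearn search object, which accepts both callables and scorer names for its scoring argument.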
- edelweiss.clf_utils.load_hyperparams(clf)[source]
Loads the hyperparameters for the classifier for the CV search.
- Parameters:
clf – classifier object
- Returns:
hyperparameter grid
- edelweiss.clf_utils.ngal_hist_scorer(y_true, y_pred, mag, bins=100, range=(15, 30))[source]
Scorer accounting for the number of galaxies in the sample on a histogram level. score = (N_pred - N_true)**2
- Parameters:
y_true – true labels (detected or not)
y_pred – predicted labels (detected or not)
mag – magnitude of the galaxies
- Returns:
score
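A possible reading of the histogram-level score is sketched below in plain Python. It assumes the squared differences are summed over the bins and that only objects flagged as detected are counted; both points, and the names `ngal_hist_score` and `value_range`, are assumptions rather than the library's definition.

```python
def ngal_hist_score(y_true, y_pred, mag, bins=100, value_range=(15, 30)):
    """Sum over bins of (N_pred - N_true)**2 for detected objects (sketch)."""
    low, high = value_range
    width = (high - low) / bins
    n_true = [0] * bins
    n_pred = [0] * bins
    for truth, pred, m in zip(y_true, y_pred, mag):
        if not (low <= m <= high):
            continue  # outside the histogram range
        # The last bin includes the upper edge.
        index = min(int((m - low) / width), bins - 1)
        n_true[index] += bool(truth)
        n_pred[index] += bool(pred)
    return sum((p - t) ** 2 for p, t in zip(n_pred, n_true))


mag = [16, 17, 22, 27, 28]
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 0, 1, 1, 1]
score = ngal_hist_score(y_true, y_pred, mag, bins=3)
print(score)  # prints 5
```

With three bins of width 5, the first bin contributes (1 - 2)**2 = 1, the second 0, and the third (2 - 0)**2 = 4. A score of this kind rewards matching the number counts per magnitude bin rather than getting each individual object right.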
edelweiss.custom_clfs module
- class edelweiss.custom_clfs.NeuralNetworkClassifier(hidden_units=(64, 32), learning_rate=0.001, epochs=10, batch_size=32, loss='auto', activation='relu', activation_output='auto')[source]
Bases:
BaseEstimator, ClassifierMixin
Neural network classifier based on Keras Sequential model
- Parameters:
hidden_units – tuple/list, optional (default=(64, 32)) The number of units per hidden layer
learning_rate – float, optional (default=0.001) The learning rate for the Adam optimizer
epochs – int, optional (default=10) The number of epochs to train the model
batch_size – int, optional (default=32) The batch size for training the model
loss – str, optional (default="auto") The loss function to use, defaults to binary_crossentropy if binary and sparse_categorical_crossentropy if multiclass
activation – str, optional (default="relu") The activation function to use for the hidden layers
activation_output – str, optional (default="auto") The activation function to use for the output layer, defaults to sigmoid for single class and softmax for multiclass
sample_weight_col – int, optional (default=None)
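The "auto" defaults for loss and activation_output amount to a choice based on the number of classes. A minimal sketch follows; `resolve_auto` is a hypothetical helper, and deciding "binary" by checking for at most two classes is an assumption about how the distinction is made.

```python
def resolve_auto(n_classes, loss="auto", activation_output="auto"):
    """Pick loss and output activation from the number of classes (sketch)."""
    binary = n_classes <= 2  # assumed criterion for the binary case
    if loss == "auto":
        loss = "binary_crossentropy" if binary else "sparse_categorical_crossentropy"
    if activation_output == "auto":
        activation_output = "sigmoid" if binary else "softmax"
    return loss, activation_output


binary_choice = resolve_auto(n_classes=2)
multiclass_choice = resolve_auto(n_classes=3)
explicit_choice = resolve_auto(n_classes=3, loss="categorical_crossentropy")
```

Explicit values are passed through untouched, so only arguments left at "auto" are filled in.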
- fit(X, y, sample_weight=None, early_stopping_patience=10)[source]
Fit the neural network model
- Parameters:
X – array-like, shape (n_samples, n_features) The training input samples
y – array-like, shape (n_samples,) The target values
sample_weight – array-like, shape (n_samples,), optional (default=None) Sample weights
early_stopping_patience – int, optional (default=10) The number of epochs with no improvement after which training will be stopped
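The meaning of early_stopping_patience can be shown with a short loop over a loss history. This is a sketch of the general patience rule, not the Keras callback or the edelweiss code; `stopping_epoch` is a hypothetical name, and which loss is monitored (training or validation) is not stated in this reference.

```python
def stopping_epoch(losses, patience=10):
    """Return the index of the epoch at which training would stop."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(losses) - 1  # ran through every epoch


history = [1.0, 0.8, 0.9, 0.85, 0.95, 0.7]
stopped_at = stopping_epoch(history, patience=3)
print(stopped_at)  # prints 4
```

In this history the best loss (0.8) is reached at epoch 1, three epochs pass without improvement, and training stops at epoch 4 before the later improvement to 0.7 is ever seen. That is the trade-off a small patience value makes.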
- predict(X)[source]
Predict the class labels for the provided data
- Parameters:
X – array-like, shape (n_samples, n_features) The input samples
- Returns:
array-like, shape (n_samples,) The predicted class labels
- predict_proba(X)[source]
Predict the class probabilities for the provided data
- Parameters:
X – array-like, shape (n_samples, n_features) The input samples
- Returns:
array-like, shape (n_samples, n_classes) The predicted class probabilities
- set_fit_request(*, early_stopping_patience: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') → NeuralNetworkClassifier
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
early_stopping_patience – str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED. Metadata routing for the early_stopping_patience parameter in fit.
sample_weight – str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED. Metadata routing for the sample_weight parameter in fit.
- Returns:
self (object) – the updated object.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → NeuralNetworkClassifier
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
sample_weight – str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED. Metadata routing for the sample_weight parameter in score.
- Returns:
self (object) – the updated object.
edelweiss.custom_regs module
- class edelweiss.custom_regs.NeuralNetworkRegressor(hidden_units=(64, 64), learning_rate=0.001, epochs=10, batch_size=32, loss='mse', activation='relu', activation_output='linear', dropout_prob=0.0)[source]
Bases:
BaseEstimator
Neural network regressor based on Keras Sequential model
- Parameters:
hidden_units – tuple/list, optional (default=(64, 64)) The number of units per hidden layer
learning_rate – float, optional (default=0.001) The learning rate for the Adam optimizer
epochs – int, optional (default=10) The number of epochs to train the model
batch_size – int, optional (default=32) The batch size for training the model
loss – str, optional (default="mse") The loss function to use
activation – str, optional (default="relu") The activation function to use for the hidden layers
activation_output – str, optional (default="linear") The activation function to use for the output layer
- fit(X, y, sample_weight=None, early_stopping_patience=10)[source]
Fit the neural network model
- Parameters:
X – array-like, shape (n_samples, n_features) The training input samples
y – array-like, shape (n_samples, n_outputs) The target values
sample_weight – array-like, shape (n_samples,), optional (default=None) Sample weights
early_stopping_patience – int, optional (default=10) The number of epochs with no improvement after which training will be stopped
- predict(X)[source]
Predict the output from the input.
- Parameters:
X – the input data
- Returns:
the predicted output
- set_fit_request(*, early_stopping_patience: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') → NeuralNetworkRegressor
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
early_stopping_patience – str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED. Metadata routing for the early_stopping_patience parameter in fit.
sample_weight – str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED. Metadata routing for the sample_weight parameter in fit.
- Returns:
self (object) – the updated object.
edelweiss.emulator module
- edelweiss.emulator.load_emulator(path, bands=('g', 'r', 'i', 'z', 'y'), multiclassifier=False, subfolder_clf=None, subfolder_nflow=None)[source]
Load an emulator from a given path. If bands is None, returns the classifier and normalizing flow. If bands is not None, returns the classifier and a dictionary of normalizing flows for each band.
- Parameters:
path – path to the folder containing the emulator
bands – the bands to load (if None, assumes that there is only one nflow)
multiclassifier – whether to load a multiclassifier or not
subfolder_clf – subfolder of the emulator folder where the classifier is stored
subfolder_nflow – subfolder of the emulator folder where the normalizing flow is stored
- Returns:
the loaded classifier and normalizing flow
edelweiss.nflow module
- class edelweiss.nflow.Nflow(output=None, input=None, scaler='standard')[source]
Bases:
object
The normalizing flow class that wraps a pzflow normalizing flow.
- Parameters:
output – the names of the output parameters
input – the names of the input parameters (=conditional parameters)
scaler – the scaler to use for the normalizing flow
- fit(X, epochs=100, batch_size=1024, progress_bar=True, verbose=True, min_loss=5)
Train the normalizing flow.
- Parameters:
X – the features to train on (recarray)
epochs – number of epochs
batch_size – batch size
progress_bar – whether to show a progress bar
verbose – whether to print the losses
min_loss – minimum loss that is allowed for convergence
- sample(X=None, n_samples=1)[source]
Sample from the normalizing flow.
- Parameters:
X – the features to sample from (recarray or None for non-conditional sampling)
n_samples – number of samples to draw, number of total samples is n_samples * len(X)
- Returns:
the sampled features (including the conditional parameters)
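The relation between n_samples, len(X), and the size of the output can be shown with a toy sampler. This is only a sketch: `sample_conditional` is a hypothetical stand-in that adds Gaussian noise instead of sampling a trained flow, the field names `mag_true` and `mag_measured` are invented, and the ordering of the repeated rows in the real output is not specified here.

```python
import random


def sample_conditional(X, n_samples=1, seed=0):
    """Draw `n_samples` outputs per conditional row (toy stand-in for a flow)."""
    rng = random.Random(seed)
    samples = []
    for row in X:
        for _ in range(n_samples):
            # A trained flow would sample from p(output | row); here we add noise.
            drawn = {"mag_measured": row["mag_true"] + rng.gauss(0.0, 0.1)}
            samples.append({**row, **drawn})  # keep the conditional parameters
    return samples


X = [{"mag_true": 22.0}, {"mag_true": 24.5}]
samples = sample_conditional(X, n_samples=3)
print(len(samples))  # prints 6, i.e. n_samples * len(X)
```

Each returned sample carries both the drawn output and the conditional input it was drawn for, matching the note above that the sampled features include the conditional parameters.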
- save(path, band=None, subfolder=None)[source]
Save the normalizing flow to a given path.
- Parameters:
path – path to the folder where the emulator is saved
subfolder – subfolder of the emulator folder where the normalizing flow is stored
- train(X, epochs=100, batch_size=1024, progress_bar=True, verbose=True, min_loss=5)[source]
Train the normalizing flow.
- Parameters:
X – the features to train on (recarray)
epochs – number of epochs
batch_size – batch size
progress_bar – whether to show a progress bar
verbose – whether to print the losses
min_loss – minimum loss that is allowed for convergence
- edelweiss.nflow.load_nflow(path, band=None, subfolder=None)[source]
Load a normalizing flow from a given path.
- Parameters:
path – path to the folder containing the emulator
band – the band to load (if None, assumes that there is only one nflow)
subfolder – subfolder of the emulator folder where the normalizing flow is stored
- Returns:
the loaded normalizing flow
edelweiss.nflow_utils module
- exception edelweiss.nflow_utils.ModelNotConvergedError(model_name, reason=None)[source]
Bases:
Exception
Custom error class for when a model has not converged.
- edelweiss.nflow_utils.check_convergence(losses, min_loss=5)[source]
Check if the model has converged.
- Parameters:
losses – list of losses
min_loss – minimum loss; if the loss is higher than this, the model has not converged
- Raises:
ModelNotConvergedError – if the model has not converged
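A convergence check of this kind might look like the sketch below. The details are assumptions: the reference does not say which loss value is compared against min_loss, so the sketch uses the final one and also rejects non-finite losses. `NotConvergedError` and `check_losses` are hypothetical stand-ins for the edelweiss names.

```python
import math


class NotConvergedError(Exception):
    """Hypothetical stand-in for ModelNotConvergedError."""


def check_losses(losses, min_loss=5):
    """Raise if the final loss is not finite or is above `min_loss` (sketch)."""
    final = losses[-1]
    if not math.isfinite(final):
        raise NotConvergedError("final loss is not finite")
    if final > min_loss:
        raise NotConvergedError(f"final loss {final} exceeds {min_loss}")


check_losses([12.0, 6.5, 3.1])  # passes: the final loss is below 5

try:
    check_losses([12.0, 9.0, 7.5])
    converged = True
except NotConvergedError:
    converged = False
```

Raising an exception rather than returning a flag lets a training script catch the failure and retry, for example with a different seed or more epochs.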
- edelweiss.nflow_utils.get_scalers(scaler)[source]
Get the scalers from the name.
- Parameters:
scaler – name of the scaler (str)
- Returns:
scaler
- Raises:
ValueError – if the scaler is not implemented
edelweiss.reg_utils module
- edelweiss.reg_utils.get_regressor(regressor, scaler, **kwargs)[source]
Returns the regressor object
- Parameters:
regressor – name of the regressor
scaler – scaler object
kwargs – additional arguments for the regressor
- Returns:
regressor object (sklearn pipeline)
- Raises:
ValueError if regressor is not known
edelweiss.regressor module
- class edelweiss.regressor.Regressor(scaler='standard', reg='linear', cv=0, cv_scoring='neg_mean_squared_error', input_params=None, output_params=None, **reg_kwargs)[source]
Bases:
object
Wrapper class for several regression models.
- Parameters:
scaler – the scaler to use for the regressor
reg – the regressor to use
cv – number of cross validation folds, if 0 no cross validation is performed
cv_scoring – the scoring method to use for cross validation
input_params – the names of the input parameters
output_params – the names of the output parameters
reg_kwargs – additional keyword arguments for the regressor
- fit(X, y, flat_param=None, **args)
Train the regressor.
- Parameters:
X – the training data
y – the training labels
- predict(X)[source]
Predict the output from the input.
- Parameters:
X – the input data
- Returns:
the predicted output as a recarray
edelweiss.tf_utils module
- class edelweiss.tf_utils.EpochProgressCallback(total_epochs)[source]
Bases:
Callback
Class to implement a tqdm progress bar over epochs, written by ChatGPT, provided by Arne Thomsen
- on_epoch_end(epoch, logs=None)[source]
Called at the end of an epoch.
Subclasses should override for any actions to run. This function should only be called during TRAIN mode.
- Parameters:
epoch – Integer, index of epoch.
logs – Dict, metric results for this training epoch, and for the validation epoch if validation is performed. Validation result keys are prefixed with val_. For training epoch, the values of the Model's metrics are returned. Example: {'loss': 0.2, 'accuracy': 0.7}.