Bases: object
The detection classifier class that wraps a sklearn classifier.
scaler – the scaler to use for the classifier, options: standard, minmax, maxabs, robust, quantile
clf – the classifier to use, options: XGB, MLP, RandomForest, NeuralNetwork, LogisticRegression, LinearSVC, DecisionTree, AdaBoost, GaussianNB, QDA, KNN
calibrate – whether to calibrate the probabilities
cv – number of cross validation folds, if 0 no cross validation is performed
cv_scoring – the scoring method to use for cross validation
params – the names of the parameters
clf_kwargs – additional keyword arguments for the classifier
Train the classifier.
X – the features to train on (array or recarray)
y – the labels to train on
args – additional arguments for the classifier
Predict the labels for a given set of features.
X – the features to predict on (array or recarray)
the predicted labels
Predict the probabilities for a given set of features.
X – the features to predict on (array or recarray)
the predicted probabilities
Predict the probabilities for a given set of features.
X – the features to predict on (array or recarray)
the predicted probabilities
Save the classifier to a given path.
path – path to the folder where the emulator is saved
subfolder – subfolder of the emulator folder where the classifier is stored
Bases: Classifier
The detection classifier class that wraps a sklearn classifier for multiple classes.
scaler – the scaler to use for the classifier, options: standard, minmax, maxabs, robust, quantile
clf – the classifier to use, options: XGB, MLP, RandomForest, NeuralNetwork, LogisticRegression, LinearSVC, DecisionTree, AdaBoost, GaussianNB, QDA, KNN
calibrate – whether to calibrate the probabilities
cv – number of cross validation folds, if 0 no cross validation is performed
cv_scoring – the scoring method to use for cross validation
params – the names of the parameters
clf_kwargs – additional keyword arguments for the classifier
Predict the labels for a given set of features.
X – the features to predict on (array or recarray)
the predicted labels
Predict the class non-probabilistically for a given set of features.
X – the features to predict on (array or recarray)
the predicted class labels
Bases: object
A classifier class that trains multiple classifiers for a specific label. This label could e.g. be the galaxy type (star, red galaxy, blue galaxy).
split_label – the label used to split the data between the different classifiers
labels – the different labels of the split label
scaler – the scaler to use for the classifier
clf – the classifier to use
calibrate – whether to calibrate the probabilities
cv – number of cross validation folds, if 0 no cross validation is performed
cv_scoring – the scoring method to use for cross validation
params – the names of the parameters
clf_kwargs – additional keyword arguments for the classifier
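The splitting step described above can be illustrated with a minimal sketch. It assumes the data is a list of dict-like rows and only shows how rows would be grouped per value of the split label before one classifier is trained on each group. The helper name split_rows_by_label and the field name galaxy_type are hypothetical and not taken from the library.

```python
def split_rows_by_label(rows, split_label, labels):
    """Group rows by the value of split_label, one group per requested label."""
    groups = {label: [] for label in labels}
    for row in rows:
        value = row[split_label]
        if value in groups:  # rows with an unlisted label are ignored here
            groups[value].append(row)
    return groups


rows = [
    {"galaxy_type": "star", "mag": 19.2},
    {"galaxy_type": "red", "mag": 22.1},
    {"galaxy_type": "blue", "mag": 23.4},
    {"galaxy_type": "red", "mag": 21.7},
]
groups = split_rows_by_label(rows, "galaxy_type", ["star", "red", "blue"])
for label, members in groups.items():
    print(label, len(members))
```

Each group would then be passed to its own classifier, which is constructed with the shared scaler, clf, and cv settings.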
Train the classifier.
Save the classifier to a given path.
path – path to the folder where the emulator is saved
subfolder – subfolder of the emulator folder where the classifier is stored
Add the range to the name of the variable such that the range is visible in the spider plot.
field_names – list with the names of the variables
ranges – dictionary with the ranges for each variable
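As an illustration of the idea, the sketch below appends the range to each variable name. The function name add_range_to_name, the bracketed label format, and the fallback range (0, 1) for variables missing from the dictionary are assumptions made for this example. The actual label format used by the library may differ.

```python
def add_range_to_name(field_names, ranges, default=(0, 1)):
    """Append the plotted range to each variable name (hypothetical format)."""
    labelled = []
    for name in field_names:
        # Assumption: variables without an explicit range fall back to (0, 1).
        low, high = ranges.get(name, default)
        labelled.append(f"{name} [{low}, {high}]")
    return labelled


names = add_range_to_name(["accuracy", "brier_score"], {"brier_score": (0.25, 0)})
print(names)
```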
Calculates all the scores and appends them to the test_arr dict.
test_arr – dict where the test scores will be saved
y_test – test labels
y_pred – predicted labels
y_prob – probability of being detected
Calculates all the scores and appends them to the test_arr dict for a multiclass classifier.
test_arr – dict where the test scores will be saved
y_test – test labels
y_pred – predicted labels
y_prob – probability of being detected
Get the confusion matrix for the classifier.
y_true – true labels
y_pred – predicted labels
True Positives, True Negatives, False Positives, False Negatives
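For a binary detection label, the four returned counts can be computed as in the following sketch. It is a plain-Python illustration under the assumption that a truthy label means "detected". The function name confusion_counts is hypothetical, and the library itself may rely on sklearn for this step.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, where a truthy value means detected."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1  # detected and predicted as detected
        elif not truth and not pred:
            tn += 1  # undetected and predicted as undetected
        elif pred:
            fp += 1  # predicted as detected, but truly undetected
        else:
            fn += 1  # predicted as undetected, but truly detected
    return tp, tn, fp, fn


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))  # (2, 2, 1, 1)
```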
Get the default ranges for the spider plot.
dictionary with the ranges for each variable
Get the name to add to the classifier
clf – classifier object (from sklearn) or name of the classifier
final – if True, the classifier was tested on the test data.
name
Plot all scores for the classifiers. Input can either be directly a recarray with the scores or the path to the scores or a list of paths to the scores. If a list is given, the scores of the different paths are combined and plotted with different colors.
scores – recarray with the scores or path to the scores or list of paths
path_labels – list of labels for the different paths
Plot the calibration curve for the classifier.
y_true – true labels
y_prob – predicted probabilities
output_directory – directory to save the plot
clf – classifier object or name of the classifier
final – if True, the plot is for the final classifier
save_plot – if True, save the plot
fig – figure object, if None, create a new figure
Plot the diagnostics for the chosen classifiers. If the classifiers are not all from the same path, the conf and path parameters should be lists of the same length as clfs.
clfs – list of classifier names
conf – configuration dictionary or list of dictionaries
path – path to the data or list of paths
spider_ranges – dictionary with the ranges for the spider plot
labels – list of labels for the different paths
print_scores – if True, print the scores for the different classifiers
special_param – param to plot the histogram for
Plot the diagnostics for the classifier.
clf – classifier object
X_test – test data
y_test – true labels
output_directory – directory to save the plots
final – if True, the classifier was tested on the test data.
save_plot – if True, save the plots
special_param – param to plot the histogram for
Plots the feature importances for the classifier.
clf – classifier object
names – names of the features
clf_name – name of the classifier
summed – if True, the summed feature importances are plotted
Plot the stacked histogram of one parameter (e.g. i-band magnitude) for the different confusion matrix elements.
param – parameter to plot
y_true – true labels
y_pred – predicted labels
output_directory – directory to save the plot
clf – classifier object or name of the classifier
final – if True, the plot is for the final classifier
save_plot – if True, save the plot
Plot the histogram of the galaxies detected by the classifier and of the truly detected galaxies for one parameter (e.g. i-band magnitude).
param – parameter to plot
y_true – true labels
y_pred – predicted labels
output_directory – directory to save the plot
clf – classifier object or name of the classifier
final – if True, the plot is for the final classifier
save_plot – if True, save the plot
fig – figure object, if None, create a new figure
Plot the precision-recall curve for the classifier.
y_true – true labels
y_prob – predicted probabilities
output_directory – directory to save the plot
clf – classifier object or name of the classifier
final – if True, the plot is for the final classifier
save_plot – if True, save the plot
fig – figure object, if None, create a new figure
figure object
Plot the ROC curve for the classifier.
y_true – true labels
y_prob – predicted probabilities
output_directory – directory to save the plot
clf – classifier object or name of the classifier
final – if True, the plot is for the final classifier
save_plot – if True, save the plot
fig – figure object, if None, create a new figure
Plot the spider scores for the classifier.
y_true – true labels
y_pred – predicted labels
y_prob – predicted probabilities
output_directory – directory to save the plot
clf – classifier object or name of the classifier
final – if True, the plot is for the final classifier
save_plot – if True, save the plot
fig – figure object, if None, create a new figure
ranges – dictionary of ranges for each score
print_scores – if True, print the scores
figure object
Scale the data for the spider plot such that the chosen range corresponds to the 0-1 range of the spider plot.
If the lower value of the range is higher than the upper value, the data is inverted.
data – data to scale
dictionary with the ranges for each variable; if a parameter is not in the dictionary, the default range is (0, 1)
scaled data
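The scaling rule above amounts to a linear map of each (low, high) range onto (0, 1). The sketch below assumes the data is given as a dictionary of score name to value. The function name scale_for_spider and this data layout are assumptions for the example, not the library's signature.

```python
def scale_for_spider(data, ranges=None, default=(0, 1)):
    """Map each value linearly so that its (low, high) range becomes (0, 1)."""
    ranges = ranges or {}
    scaled = {}
    for name, value in data.items():
        low, high = ranges.get(name, default)
        # If low > high, the same formula inverts the axis: values close to
        # `high` (the "good" end) land near 1 and values close to `low` near 0.
        scaled[name] = (value - low) / (high - low)
    return scaled


scores = {"accuracy": 0.9, "brier_score": 0.05, "f1": 0.7}
ranges = {"accuracy": (0.5, 1), "brier_score": (0.25, 0)}
print(scale_for_spider(scores, ranges))
```

Here brier_score uses an inverted range, so a low (good) Brier score is drawn far from the centre of the spider plot, like a high accuracy. The f1 entry has no range and is left on the default (0, 1) scale.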
Scorer for the ROC AUC score using y_prob
y_true – true labels (detected or not)
y_prob – predicted probabilities (2D array)
score
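To make the role of the 2D probability array explicit, the sketch below computes the ROC AUC from the second column of y_prob, assuming the sklearn predict_proba convention in which column 1 holds the probability of the positive (detected) class. It uses the pairwise-ranking definition of the AUC in plain Python. The function name roc_auc_from_proba is hypothetical. In practice, sklearn.metrics.roc_auc_score applied to that column computes the same quantity.

```python
def roc_auc_from_proba(y_true, y_prob):
    """ROC AUC from a 2D probability array, using column 1 as the positive-class score."""
    scores = [row[1] for row in y_prob]
    pos = [s for s, t in zip(scores, y_true) if t]
    neg = [s for s, t in zip(scores, y_true) if not t]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    # Fraction of (positive, negative) pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
y_prob = [[0.9, 0.1], [0.6, 0.4], [0.65, 0.35], [0.2, 0.8]]
print(roc_auc_from_proba(y_true, y_prob))  # 0.75
```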
Returns the classifier object
classifier – name of the classifier
scaler – scaler object
kwargs – additional arguments for the classifier
classifier object (sklearn pipeline)
ValueError if classifier is not known
Returns the arguments for the classifier defined in the config file
clf – classifier name
conf – config file
arguments for the classifier
Returns the name of the classifier file.
index – index of the classifier
name of the classifier file
Get the detection label for the classifier.
clf – classification data (rec array)
bands – which bands the data has
n_detected_bands – how many bands have to be detected such that the event is classified as detected; if None, the detection label is already given in clf
detection label (bool array) and the names of the detection labels
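A minimal sketch of this rule is given below. It assumes the classification data is a mapping of column name to a list of flags, that the per-band flags are stored as detected_<band>, and that a ready-made label is stored as detected. These column names and the function name detection_label are hypothetical.

```python
def detection_label(catalog, bands, n_detected_bands=None):
    """Return (labels, names): labels[i] is True if object i counts as detected."""
    if n_detected_bands is None:
        # Assumption: a ready-made label is stored under the key "detected".
        return [bool(v) for v in catalog["detected"]], ["detected"]
    names = [f"detected_{band}" for band in bands]
    columns = [catalog[name] for name in names]
    # An object counts as detected if enough of its per-band flags are set.
    labels = [
        sum(bool(flag) for flag in flags) >= n_detected_bands
        for flags in zip(*columns)
    ]
    return labels, names


catalog = {
    "detected_g": [1, 0, 1],
    "detected_r": [1, 0, 0],
    "detected_i": [1, 1, 0],
}
labels, names = detection_label(catalog, ["g", "r", "i"], n_detected_bands=2)
print(labels)  # [True, False, False]
```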
Returns the scaler object
scaler – name of the scaler
scaler object
ValueError if scaler is not known
Returns the scorer object given input string. If not one of the known self defined scorers, returns the input string assuming it is a sklearn scorer.
score – name of the scorer
additional arguments for the scorer
scorer object
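The fallback behaviour described above can be sketched as a dictionary lookup. The registry CUSTOM_SCORERS, the toy scorer, and the function name get_scorer are assumptions for illustration. The set of self-defined scorers in the library is not listed here.

```python
def toy_accuracy(y_true, y_pred):
    """Toy custom scorer: fraction of labels predicted correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical registry of self-defined scorers.
CUSTOM_SCORERS = {"toy_accuracy": toy_accuracy}


def get_scorer(score):
    """Return the custom scorer if the name is known, otherwise the string itself."""
    return CUSTOM_SCORERS.get(score, score)


print(get_scorer("toy_accuracy")([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
print(get_scorer("f1"))  # f1
```

Returning the unchanged string works because sklearn's cross-validation utilities accept the names of built-in scorers, such as "f1", directly as the scoring argument.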
Loads the hyperparameters for the classifier for the CV search.
clf – classifier object
hyperparameter grid
Scorer accounting for the number of galaxies in the sample on a histogram level. score = (N_pred - N_true)**2
y_true – true labels (detected or not)
y_pred – predicted labels (detected or not)
mag – magnitude of the galaxies
score
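One way to read "on a histogram level" is that the counts of detected galaxies are compared per magnitude bin and the squared differences are summed. The sketch below follows that reading. The explicit bin_edges argument, the summation over bins, and the function name histogram_count_score are assumptions. The library's binning and sign convention may differ.

```python
def histogram_count_score(y_true, y_pred, mag, bin_edges):
    """Sum over magnitude bins of (N_pred - N_true)**2 for detected objects."""
    n_bins = len(bin_edges) - 1
    n_true = [0] * n_bins
    n_pred = [0] * n_bins
    for truth, pred, m in zip(y_true, y_pred, mag):
        for i in range(n_bins):
            # Half-open bins [edge_i, edge_{i+1}); objects outside all bins are skipped.
            if bin_edges[i] <= m < bin_edges[i + 1]:
                n_true[i] += bool(truth)
                n_pred[i] += bool(pred)
                break
    return sum((p - t) ** 2 for p, t in zip(n_pred, n_true))


mag = [20.5, 21.2, 21.8, 22.5, 22.9]
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
print(histogram_count_score(y_true, y_pred, mag, [20, 21, 22, 23]))  # 2
```

In this example the total number of detections is the same for the true and predicted labels (three each), yet the score is nonzero because one detection is moved from the 21 to 22 bin into the 22 to 23 bin.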
Bases: BaseEstimator, ClassifierMixin
Neural network classifier based on Keras Sequential model
hidden_units – tuple/list, optional (default=(64, 32)) The number of units per hidden layer
learning_rate – float, optional (default=0.001) The learning rate for the Adam optimizer
epochs – int, optional (default=10) The number of epochs to train the model
batch_size – int, optional (default=32) The batch size for training the model
loss – str, optional (default="auto") The loss function to use, defaults to binary_crossentropy if binary and sparse_categorical_crossentropy if multiclass
activation – str, optional (default="relu") The activation function to use for the hidden layers
activation_output – str, optional (default="auto") The activation function to use for the output layer, defaults to sigmoid for single class and softmax for multiclass
sample_weight_col – int, optional (default=None)
Fit the neural network model
X – array-like, shape (n_samples, n_features) The training input samples
y – array-like, shape (n_samples,) The target values
sample_weight – array-like, shape (n_samples,), optional (default=None) Sample weights
early_stopping_patience – int, optional (default=10) The number of epochs with no improvement after which training will be stopped
Predict the class labels for the provided data
X – array-like, shape (n_samples, n_features) The input samples
array-like, shape (n_samples,) The predicted class labels
Predict the class probabilities for the provided data
X – array-like, shape (n_samples, n_features) The input samples
array-like, shape (n_samples, n_classes) The predicted class probabilities
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note: This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Metadata routing for early_stopping_patience parameter in fit.
Metadata routing for sample_weight parameter in fit.
The updated object.
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note: This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Metadata routing for sample_weight parameter in score.
The updated object.
Bases: BaseEstimator
Neural network regressor based on Keras Sequential model
hidden_units – tuple/list, optional (default=(64, 64)) The number of units per hidden layer
learning_rate – float, optional (default=0.001) The learning rate for the Adam optimizer
epochs – int, optional (default=10) The number of epochs to train the model
batch_size – int, optional (default=32) The batch size for training the model
loss – str, optional (default="mse") The loss function to use
activation – str, optional (default="relu") The activation function to use for the hidden layers
activation_output – str, optional (default="linear") The activation function to use for the output layer
Fit the neural network model
X – array-like, shape (n_samples, n_features) The training input samples
y – array-like, shape (n_samples, n_outputs) The target values
sample_weight – array-like, shape (n_samples,), optional (default=None)
early_stopping_patience – int, optional (default=10) The number of epochs with no improvement after which training will be stopped
Predict the output from the input.
X – the input data
the predicted output
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note: This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
Metadata routing for early_stopping_patience parameter in fit.
Metadata routing for sample_weight parameter in fit.
The updated object.
Load an emulator from a given path. If bands is None, returns the classifier and normalizing flow. If bands is not None, returns the classifier and a dictionary of normalizing flows for each band.
path – path to the folder containing the emulator
bands – the bands to load (if None, assumes that there is only one nflow)
multiclassifier – whether to load a multiclassifier or not
subfolder_clf – subfolder of the emulator folder where the classifier is stored
subfolder_nflow – subfolder of the emulator folder where the normalizing flow is stored
the loaded classifier and normalizing flow
Bases: object
The normalizing flow class that wraps a pzflow normalizing flow.
output – the names of the output parameters
input – the names of the input parameters (=conditional parameters)
scaler – the scaler to use for the normalizing flow
Train the normalizing flow.
X – the features to train on (recarray)
epochs – number of epochs
batch_size – batch size
progress_bar – whether to show a progress bar
verbose – whether to print the losses
min_loss – minimum loss that is allowed for convergence
Sample from the normalizing flow.
X – the features to sample from (recarray or None for non-conditional sampling)
n_samples – number of samples to draw, number of total samples is n_samples * len(X)
the sampled features (including the conditional parameters)
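The relation between n_samples, len(X), and the size of the output can be illustrated with a stand-in sampler. In the sketch below a Gaussian draw replaces the actual normalizing flow, the rows are plain dictionaries rather than a recarray, and the names sample_conditional, mag_in, and mag_out are hypothetical. Only the bookkeeping (each conditional row is repeated n_samples times and kept in the output) reflects the description above. The row ordering is an assumption.

```python
import random


def sample_conditional(X, n_samples=1, seed=0):
    """Draw n_samples outputs per conditional row; a Gaussian stands in for the flow."""
    rng = random.Random(seed)
    samples = []
    for row in X:
        for _ in range(n_samples):
            drawn = dict(row)  # keep the conditional parameters in the output
            # Placeholder draw; a trained flow would sample from p(output | input).
            drawn["mag_out"] = rng.gauss(row["mag_in"], 0.1)
            samples.append(drawn)
    return samples


X = [{"mag_in": 21.0}, {"mag_in": 23.5}, {"mag_in": 24.0}]
samples = sample_conditional(X, n_samples=4)
print(len(samples))  # 12, i.e. n_samples * len(X)
```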
Save the normalizing flow to a given path.
path – path to the folder where the emulator is saved
subfolder – subfolder of the emulator folder where the normalizing flow is stored
Train the normalizing flow.
X – the features to train on (recarray)
epochs – number of epochs
batch_size – batch size
progress_bar – whether to show a progress bar
verbose – whether to print the losses
min_loss – minimum loss that is allowed for convergence
Load a normalizing flow from a given path.
path – path to the folder containing the emulator
band – the band to load (if None, assumes that there is only one nflow)
subfolder – subfolder of the emulator folder where the normalizing flow is stored
the loaded normalizing flow
Bases: Exception
Custom error class for when a model has not converged.
Check if the model has converged.
losses – list of losses
min_loss – minimum loss; if the loss is higher than this, the model has not converged
ModelNotConvergedError – if the model has not converged
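A minimal sketch of such a check is shown below. It assumes that the final entry of the loss list is the one compared against min_loss and that a NaN loss also counts as non-convergence. Both are assumptions, as is the function name check_convergence. The exception class is redefined locally so that the example is self-contained.

```python
import math


class ModelNotConvergedError(Exception):
    """Raised when training ends without reaching an acceptable loss."""


def check_convergence(losses, min_loss):
    """Raise ModelNotConvergedError if the final loss is NaN or above min_loss."""
    final = losses[-1]
    if math.isnan(final) or final > min_loss:
        raise ModelNotConvergedError(
            f"final loss {final} is not below the allowed value {min_loss}"
        )


check_convergence([3.0, 2.0, 1.0], min_loss=1.5)  # passes silently

try:
    check_convergence([3.0, 2.0, 1.8], min_loss=1.5)
except ModelNotConvergedError as err:
    print("not converged:", err)
```

Raising a dedicated exception lets calling code catch this specific failure, for example to restart training with a different seed, without masking unrelated errors.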
Get the scalers from the name.
scaler – name of the scaler (str)
scaler
ValueError – if the scaler is not implemented
Returns the regressor object
regressor – name of the regressor
scaler – scaler object
kwargs – additional arguments for the regressor
regressor object (sklearn pipeline)
ValueError if regressor is not known
Bases: object
Wrapper class for several regression models.
scaler – the scaler to use for the regressor
reg – the regressor to use
cv – number of cross validation folds, if 0 no cross validation is performed
cv_scoring – the scoring method to use for cross validation
input_params – the names of the input parameters
output_params – the names of the output parameters
reg_kwargs – additional keyword arguments for the regressor
Train the regressor.
X – the training data
y – the training labels
Predict the output from the input.
X – the input data
the predicted output as a recarray
Bases: Callback
Class to implement a tqdm progress bar over epochs, written by ChatGPT, provided by Arne Thomsen
Called at the end of an epoch.
Subclasses should override for any actions to run. This function should only be called during TRAIN mode.
epoch – Integer, index of epoch.
logs – Dict, metric results for this training epoch, and for the validation epoch if validation is performed. Validation result keys are prefixed with val_. For training epoch, the values of the Model's metrics are returned. Example: {'loss': 0.2, 'accuracy': 0.7}.