cosmic_toolbox package
Subpackages
Submodules
cosmic_toolbox.MultiInterp module
Multi-method interpolation framework.
Provides flexible N-dimensional interpolation with multiple backend methods including nearest neighbors, radial basis functions, linear interpolation, and machine learning regressors.
author: Tomasz Kacprzak
- class cosmic_toolbox.MultiInterp.MultiInterp(X, y, method='nn', **kw)[source]
Bases: object
Multi-method N-dimensional interpolator.
Provides a unified interface to multiple interpolation methods with automatic coordinate scaling and bounds checking.
- Parameters:
X (numpy.ndarray) – Training points, shape (n_samples, n_dims).
y (numpy.ndarray) – Training values, shape (n_samples,).
method (str) – Interpolation method. Options:
- ‘nn’: Nearest neighbor (scipy NearestNDInterpolator)
- ‘linear’: Linear interpolation (scipy LinearNDInterpolator)
- ‘rbf’: Radial basis function (scipy Rbf)
- ‘rbft’: Radial basis function with bounds (Rbft class)
- ‘knn_regression’: K-neighbors regression (sklearn)
- ‘rnn_regression’: Radius neighbors regression (sklearn)
- ‘knn_linear’: Local linear interpolation
- ‘knn_balltree’: K-neighbors with BallTree
- ‘random_forest’: Random forest regressor (sklearn)
kw – Additional keyword arguments passed to the underlying interpolator.
Example
>>> import numpy as np
>>> from cosmic_toolbox.MultiInterp import MultiInterp
>>> X = np.random.rand(100, 2)
>>> y = np.sin(X[:, 0]) + np.cos(X[:, 1])
>>> interp = MultiInterp(X, y, method='nn')
>>> Xi = np.random.rand(10, 2)
>>> yi = interp(Xi)
- init_interp(**kw)[source]
Initialize the underlying interpolator based on method.
- Parameters:
kw – Keyword arguments passed to the underlying interpolator.
- Raises:
Exception – If method is unknown.
- interpolate_grid_neighbours(y, n_neighbors=None)[source]
Interpolate using precomputed grid neighbors.
Requires prior call to precompute_grid_neighbors.
- Parameters:
y (numpy.ndarray) – Training values, shape (n_samples,).
n_neighbors (int or None) – Number of neighbors to use. Defaults to self.n_neighbors.
- Returns:
Interpolated values at grid points.
- Return type:
numpy.ndarray
- Raises:
Exception – If the requested number of neighbors exceeds the number precomputed.
- precompute_grid_neighbors(Xn, n_neighbors=100, n_proc=1)[source]
Precompute neighbors for a grid of query points.
Useful for repeated interpolation on the same grid with different training values (e.g., in MCMC sampling).
- Parameters:
Xn (numpy.ndarray) – Grid points, shape (n_grid, n_dims).
n_neighbors (int) – Number of neighbors to precompute. Defaults to 100.
n_proc (int) – Number of parallel processes. Defaults to 1.
- Raises:
AssertionError – If method is not ‘knn_balltree’.
- slice_linear_upsampling(n_repeat=1, n_neighbors=1)[source]
Upsample training data by adding midpoints between neighbors.
Experimental method to densify training data for better interpolation.
- Parameters:
n_repeat (int) – Number of upsampling iterations. Defaults to 1.
n_neighbors (int) – Number of neighbors to use for midpoint generation. Defaults to 1.
- class cosmic_toolbox.MultiInterp.Rbft(points, values, **kw_rbf)[source]
Bases: object
Radial Basis Function interpolator with bounds checking.
Wraps scipy’s Rbf with automatic coordinate scaling and bounds checking. Points outside the training data bounds return -inf.
- Parameters:
points (numpy.ndarray) – Training points, shape (n_samples, n_dims).
values (numpy.ndarray) – Training values, shape (n_samples,).
kw_rbf – Additional keyword arguments passed to scipy.interpolate.Rbf.
- cosmic_toolbox.MultiInterp.predict_knn_balltree(Xi, X, y, n_neighbors, tree)[source]
Predict using k-nearest neighbors with BallTree.
- Parameters:
Xi (numpy.ndarray) – Query points, shape (n_points, n_dims).
X (numpy.ndarray) – Training points (unused, neighbors from tree).
y (numpy.ndarray) – Training values, shape (n_samples,).
n_neighbors (int) – Number of neighbors to use.
tree (sklearn.neighbors.BallTree) – BallTree instance for neighbor queries.
- Returns:
Predicted values, shape (n_points,).
- Return type:
numpy.ndarray
- cosmic_toolbox.MultiInterp.predict_knn_linear(Xi, X, y, n_neighbors, tree)[source]
Predict using local linear interpolation of nearest neighbors.
For each query point, finds k-nearest neighbors and fits a local linear interpolator using those neighbors.
- Parameters:
Xi (numpy.ndarray) – Query points, shape (n_points, n_dims).
X (numpy.ndarray) – Training points, shape (n_samples, n_dims).
y (numpy.ndarray) – Training values, shape (n_samples,).
n_neighbors (int) – Number of neighbors to use for local interpolation.
tree (sklearn.neighbors.BallTree) – BallTree instance for neighbor queries.
- Returns:
Predicted values, shape (n_points,). Out-of-hull points return -inf.
- Return type:
numpy.ndarray
- cosmic_toolbox.MultiInterp.predict_with_neighbours(y, ind, dist)[source]
Predict values using inverse-distance weighted average of neighbors.
- Parameters:
y (numpy.ndarray) – Training values, shape (n_samples,).
ind (numpy.ndarray) – Neighbor indices, shape (n_points, n_neighbors).
dist (numpy.ndarray) – Neighbor distances, shape (n_points, n_neighbors).
- Returns:
Predicted values, shape (n_points,).
- Return type:
numpy.ndarray
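The weighting scheme of predict_with_neighbours can be sketched in a few lines of numpy. This is a minimal illustration of inverse-distance weighting, not the package's implementation; the `eps` guard against zero distances is an assumption made here for the sketch.

```python
import numpy as np

def predict_with_neighbours_sketch(y, ind, dist, eps=1e-12):
    """Inverse-distance weighted average over precomputed neighbors.

    A sketch of the weighting scheme; eps (an assumption of this sketch)
    guards against division by zero for exact matches.
    """
    w = 1.0 / (dist + eps)  # inverse-distance weights
    return np.sum(y[ind] * w, axis=1) / np.sum(w, axis=1)

# Two query points, each with two neighbors
y = np.array([1.0, 2.0, 3.0])
ind = np.array([[0, 1], [1, 2]])
dist = np.array([[0.0, 1.0], [1.0, 1.0]])
print(predict_with_neighbours_sketch(y, ind, dist))  # approximately [1.0, 2.5]
```

An exact match (zero distance) dominates the weighted average, so the first query point recovers its training value.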
- cosmic_toolbox.MultiInterp.query_batch(X, tree, k=100, n_per_batch=10000)[source]
Query BallTree in batches to manage memory usage.
- Parameters:
X (numpy.ndarray) – Query points, shape (n_points, n_dims).
tree (sklearn.neighbors.BallTree) – BallTree instance to query.
k (int) – Number of nearest neighbors to find. Defaults to 100.
n_per_batch (int) – Number of points per batch. Defaults to 10000.
- Returns:
Tuple of (distances, indices) arrays.
- Return type:
tuple
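The batching pattern behind query_batch can be sketched directly with sklearn's BallTree. This is an illustrative reimplementation under stated assumptions, not the package's own code.

```python
import numpy as np
from sklearn.neighbors import BallTree

def query_batch_sketch(X, tree, k=100, n_per_batch=10000):
    """Query a BallTree in fixed-size batches to bound peak memory."""
    dists, inds = [], []
    for i in range(0, len(X), n_per_batch):
        # BallTree.query returns (distances, indices) for each batch
        d, j = tree.query(X[i:i + n_per_batch], k=k)
        dists.append(d)
        inds.append(j)
    return np.concatenate(dists), np.concatenate(inds)

rng = np.random.default_rng(0)
tree = BallTree(rng.random((500, 3)))
dist, ind = query_batch_sketch(rng.random((50, 3)), tree, k=5, n_per_batch=16)
print(dist.shape, ind.shape)  # (50, 5) (50, 5)
```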
- cosmic_toolbox.MultiInterp.query_split(X, tree, k, n_proc)[source]
Query BallTree in parallel using multiprocessing.
Splits the query points across multiple processes for parallel execution.
- Parameters:
X (numpy.ndarray) – Query points, shape (n_points, n_dims).
tree (sklearn.neighbors.BallTree) – BallTree instance to query.
k (int) – Number of nearest neighbors to find.
n_proc (int) – Number of parallel processes to use.
- Returns:
Tuple of (distances, indices) arrays.
- Return type:
tuple
cosmic_toolbox.NearestWeightedNDInterpolator module
Convenience interface to N-D interpolation.
Provides a weighted nearest-neighbor interpolator using BallTree for efficient N-dimensional interpolation.
author: Tomasz Kacprzak
- class cosmic_toolbox.NearestWeightedNDInterpolator.NearestWeightedNDInterpolator(x, y, k=None, tree_options=None)[source]
Bases: NDInterpolatorBase
Weighted nearest-neighbor interpolation in N dimensions.
This interpolator uses BallTree for efficient nearest-neighbor queries and computes weighted averages based on inverse distance.
- Parameters:
x (numpy.ndarray) – Training points, shape (n_samples, n_dims).
y (numpy.ndarray) – Training values, shape (n_samples,).
k (int or None) – Number of nearest neighbors to use. Defaults to n_dims + 1 (number of vertices of an n_dims dimensional simplex).
tree_options (dict or None) – Options passed to sklearn’s BallTree constructor.
Example
>>> import numpy as np
>>> from cosmic_toolbox import NearestWeightedNDInterpolator
>>> x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
>>> y = np.array([0, 1, 1, 2])
>>> interp = NearestWeightedNDInterpolator(x, y)
>>> interp(np.array([[0.5, 0.5]]))
array([1.])
cosmic_toolbox.TransformedGaussianMixture module
Transformed Gaussian Mixture Model.
Provides a Gaussian Mixture Model that operates in a transformed parameter space to handle bounded parameters more effectively.
- class cosmic_toolbox.TransformedGaussianMixture.TransformedGaussianMixture(param_bounds=None, *args, **kwargs)[source]
Bases: object
Gaussian Mixture Model with parameter transformation.
This class wraps sklearn’s GaussianMixture to work with bounded parameters by transforming them to an unbounded space using a probit (normal CDF/PPF) transformation.
- Parameters:
param_bounds (numpy.ndarray or None) – Parameter bounds for each dimension, shape (n_dims, 2). If None, bounds are inferred from the data during fit.
args – Positional arguments passed to GaussianMixture.
kwargs – Keyword arguments passed to GaussianMixture.
- aic(X)[source]
Compute the Akaike Information Criterion.
- Parameters:
X (numpy.ndarray) – Data points, shape (n_samples, n_features).
- Returns:
AIC score.
- Return type:
float
- bic(X)[source]
Compute the Bayesian Information Criterion.
- Parameters:
X (numpy.ndarray) – Data points, shape (n_samples, n_features).
- Returns:
BIC score.
- Return type:
float
- fit(X, y=None)[source]
Fit the Gaussian Mixture Model.
- Parameters:
X (numpy.ndarray) – Training data, shape (n_samples, n_features).
y – Ignored (for sklearn compatibility).
- Returns:
self
- fit_predict(X, y=None)[source]
Fit and predict cluster labels.
- Parameters:
X (numpy.ndarray) – Training data, shape (n_samples, n_features).
y – Ignored (for sklearn compatibility).
- Returns:
Component labels for each sample.
- Return type:
numpy.ndarray
- get_params(deep=True)[source]
Get parameters of the underlying GaussianMixture.
- Parameters:
deep (bool) – If True, return parameters for sub-objects.
- Returns:
Parameter names mapped to their values.
- Return type:
dict
- predict_proba(X)[source]
Predict posterior probability of each component given the data.
- Parameters:
X (numpy.ndarray) – Data points, shape (n_samples, n_features).
- Returns:
Posterior probabilities, shape (n_samples, n_components).
- Return type:
numpy.ndarray
- sample(n_samples=1)[source]
Generate random samples from the fitted Gaussian mixture.
- Parameters:
n_samples (int) – Number of samples to generate.
- Returns:
Tuple of (samples, component_labels).
- Return type:
tuple(numpy.ndarray, numpy.ndarray)
- score(X, y=None)[source]
Compute the per-sample average log-likelihood.
- Parameters:
X (numpy.ndarray) – Data points, shape (n_samples, n_features).
y – Ignored (for sklearn compatibility).
- Returns:
Log-likelihood of the data.
- Return type:
float
- score_samples(X)[source]
Compute the log-likelihood of each sample.
- Parameters:
X (numpy.ndarray) – Data points, shape (n_samples, n_features).
- Returns:
Log-likelihood for each sample.
- Return type:
numpy.ndarray
- cosmic_toolbox.TransformedGaussianMixture.scale_fwd(x, param_bounds, param_bounds_trans)[source]
Scale values from original bounds to transformed bounds.
- Parameters:
x (numpy.ndarray) – Values to scale.
param_bounds (array-like) – Original parameter bounds [min, max].
param_bounds_trans (array-like) – Transformed parameter bounds [min, max].
- Returns:
Scaled values.
- Return type:
numpy.ndarray
- cosmic_toolbox.TransformedGaussianMixture.scale_inv(x, param_bounds, param_bounds_trans)[source]
Scale values from transformed bounds back to original bounds.
- Parameters:
x (numpy.ndarray) – Values to scale.
param_bounds (array-like) – Original parameter bounds [min, max].
param_bounds_trans (array-like) – Transformed parameter bounds [min, max].
- Returns:
Scaled values.
- Return type:
numpy.ndarray
- cosmic_toolbox.TransformedGaussianMixture.trans_fwd(x, param_bounds, param_bounds_trans)[source]
Transform values forward using normal PPF (probit transform).
- Parameters:
x (numpy.ndarray) – Values to transform.
param_bounds (array-like) – Original parameter bounds [min, max].
param_bounds_trans (array-like) – Transformed parameter bounds [min, max].
- Returns:
Transformed values.
- Return type:
numpy.ndarray
- cosmic_toolbox.TransformedGaussianMixture.trans_inv(x, param_bounds, param_bounds_trans)[source]
Transform values back using normal CDF (inverse probit transform).
- Parameters:
x (numpy.ndarray) – Values to transform.
param_bounds (array-like) – Original parameter bounds [min, max].
param_bounds_trans (array-like) – Transformed parameter bounds [min, max].
- Returns:
Inverse-transformed values.
- Return type:
numpy.ndarray
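The probit transform pair described above (trans_fwd / trans_inv) can be sketched with scipy. This is an illustration of the stated idea, assuming a simple linear rescaling of [lo, hi] to (0, 1) before the normal PPF; the package's exact scaling conventions may differ.

```python
import numpy as np
from scipy.stats import norm

def trans_fwd_sketch(x, lo, hi):
    # [lo, hi] -> (0, 1) -> unbounded real line via the normal PPF (probit)
    return norm.ppf((x - lo) / (hi - lo))

def trans_inv_sketch(z, lo, hi):
    # real line -> (0, 1) via the normal CDF, then back to [lo, hi]
    return lo + (hi - lo) * norm.cdf(z)

x = np.array([0.1, 0.5, 0.9])
z = trans_fwd_sketch(x, 0.0, 1.0)
x_back = trans_inv_sketch(z, 0.0, 1.0)
print(np.allclose(x, x_back))  # True
```

Fitting a Gaussian mixture to the transformed values z and mapping samples back with the inverse transform guarantees they respect the original bounds.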
cosmic_toolbox.arraytools module
Array utilities for working with numpy structured arrays and HDF5 files.
Provides functions for:
- Converting between arrays, recarrays, dicts, dataframes, and classes
- Adding, removing, and manipulating columns in structured arrays
- Reading and writing HDF5 files with various storage formats
- Handling NaN/Inf values and dtype conversions
- cosmic_toolbox.arraytools.add_cols(rec, names, shapes=None, data=0, dtype=None)[source]
Add columns to a numpy recarray. By default, the new columns are filled with zeros. If data is a numpy array, it is used to fill the new columns. If each column should be filled with different data, data should be a list of numpy arrays or an array of shape (n_cols, n_rows).
- Parameters:
rec – numpy recarray
names – list of names for the columns
shapes – list of shapes for the columns
data – data to fill the columns with
dtype – dtype of the columns
- Returns:
numpy recarray
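A comparable operation is available in numpy itself via numpy.lib.recfunctions; this sketch shows the effect of adding a data-filled column to a structured array (it is not the add_cols implementation, which also supports default-zero fills, shapes, and dtypes).

```python
import numpy as np
import numpy.lib.recfunctions as rfn

rec = np.zeros(3, dtype=[('a', 'f8')])
rec['a'] = [1.0, 2.0, 3.0]

# Append a new column 'b' filled with the given data
rec2 = rfn.append_fields(rec, 'b', np.array([4.0, 5.0, 6.0]), usemask=False)
print(rec2.dtype.names)  # ('a', 'b')
```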
- cosmic_toolbox.arraytools.append_hdf(filename, arr, compression=None, **kwargs)[source]
Append structured array data to HDF5 file. Creates file if it doesn’t exist, appends if it does.
- cosmic_toolbox.arraytools.append_rows_to_h5dset(dset, array)[source]
Append rows to an existing HDF5 dataset.
- Parameters:
dset – h5py dataset (must be resizable)
array – numpy array to append
- cosmic_toolbox.arraytools.arr2rec(arr, names)[source]
Convert a numpy array to a numpy structured array.
- Parameters:
arr – numpy array
names – list of names for the columns
- Returns:
numpy structured array
- cosmic_toolbox.arraytools.arr_to_rec(arr, dtype)[source]
Convert a numpy array to a numpy structured array given its dtype.
- Parameters:
arr – numpy array
dtype – dtype of the structured array
- Returns:
numpy structured array
- cosmic_toolbox.arraytools.check_hdf_column(filename, column_name)[source]
Check if a column exists in an HDF5 file or directory.
- Parameters:
filename – path to HDF5 file or directory
column_name – name of the column to check
- Returns:
True if column exists, False otherwise
- cosmic_toolbox.arraytools.class2dict(c)[source]
Convert a class to a dictionary.
- Parameters:
c – class
- Returns:
dictionary
- cosmic_toolbox.arraytools.class2rec(c)[source]
Convert a class to a numpy structured array.
- Parameters:
c – class
- Returns:
numpy structured array
- cosmic_toolbox.arraytools.col_name_to_path(dirname, colname)[source]
Get the HDF5 file path for a column stored in a directory.
- Parameters:
dirname – directory path
colname – column name
- Returns:
full path to the column’s HDF5 file
- cosmic_toolbox.arraytools.delete_cols(rec, col_names)[source]
Delete columns from a numpy recarray.
- Parameters:
rec – numpy recarray
col_names – list of names of the columns to delete
- Returns:
numpy recarray
- cosmic_toolbox.arraytools.delete_columns(rec, col_names)[source]
Delete columns from a numpy recarray. (alias for delete_cols for backwards compatibility)
- Parameters:
rec – numpy recarray
col_names – list of names of the columns to delete
- cosmic_toolbox.arraytools.dict2class(d)[source]
Convert a dictionary to a class.
- Parameters:
d – dictionary
- Returns:
class
Example
>>> d = {'a': [1, 2, 3], 'b': 4}
>>> c = dict2class(d)
>>> c.a
[1, 2, 3]
>>> c.b
4
- cosmic_toolbox.arraytools.dict2rec(d)[source]
Convert a dictionary of arrays/lists/scalars to a numpy structured array.
- Parameters:
d – Dictionary with arrays/lists/scalars as values
- Returns:
numpy structured array
- cosmic_toolbox.arraytools.ensure_cols(rec, names, shapes=None, data=0)[source]
Ensure columns exist in a recarray, adding them if missing.
- Parameters:
rec – numpy recarray
names – list of column names to ensure
shapes – list of shapes for the columns
data – data to fill new columns with
- Returns:
numpy recarray with ensured columns
- cosmic_toolbox.arraytools.get_dtype(columns, main='f8', shapes=None)[source]
Create a numpy dtype from column names.
Column names can include dtype specification as ‘name:dtype’.
- Parameters:
columns – list of column names (optionally with ‘:dtype’ suffix)
main – default dtype for columns without explicit dtype
shapes – list of shapes for each column
- Returns:
numpy dtype
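The ‘name:dtype’ convention can be sketched as follows; this is an illustrative parser, not the package's get_dtype (which additionally handles shapes).

```python
import numpy as np

def get_dtype_sketch(columns, main='f8'):
    """Build a dtype from names, honoring an optional ':dtype' suffix."""
    fields = []
    for col in columns:
        name, _, dt = col.partition(':')
        fields.append((name, dt or main))  # fall back to the default dtype
    return np.dtype(fields)

dt = get_dtype_sketch(['x', 'y', 'id:i4'])
print(dt)  # dtype([('x', '<f8'), ('y', '<f8'), ('id', '<i4')])
```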
- cosmic_toolbox.arraytools.get_dtype_of_list(lst)[source]
Get the dtype of all elements in a list (must be uniform).
- Parameters:
lst – list of arrays
- Returns:
numpy dtype
- Raises:
AssertionError – if not all elements have the same dtype
- cosmic_toolbox.arraytools.get_finite_mask(rec)[source]
Get a mask for finite rows (i.e., rows without NaNs or infs) in a numpy structured array.
- Parameters:
rec – numpy structured array
- Returns:
boolean numpy array, True for rows without NaNs or infs
- cosmic_toolbox.arraytools.get_hdf_col_names(path)[source]
Get column names from an HDF5 file or directory of HDF5 files.
- Parameters:
path – path to HDF5 file or directory
- Returns:
list of column names
- cosmic_toolbox.arraytools.get_inf_mask(rec)[source]
Get a mask for rows with infs in a numpy structured array.
- Parameters:
rec – numpy structured array
- Returns:
boolean numpy array, True for rows containing infs
- cosmic_toolbox.arraytools.get_loading_dtypes(dtype_list)[source]
Convert dtype list for loading from HDF5 (bytes to unicode).
- Parameters:
dtype_list – list of (name, dtype, …) tuples
- Returns:
list of Python-compatible dtype tuples
- cosmic_toolbox.arraytools.get_nan_mask(rec)[source]
Get a mask for rows with NaNs in a numpy structured array.
- Parameters:
rec – numpy structured array
- Returns:
boolean numpy array, True for rows containing NaNs
- cosmic_toolbox.arraytools.get_storing_dtypes(dtype_list)[source]
Convert dtype list for HDF5 storage (unicode to bytes).
- Parameters:
dtype_list – list of (name, dtype, …) tuples
- Returns:
list of storage-compatible dtype tuples
- cosmic_toolbox.arraytools.load_hdf(filename, first_row=-1, last_row=-1)[source]
Load a structured array from an HDF5 file’s ‘data’ dataset.
- Parameters:
filename – path to the HDF5 file
first_row – first row to load (-1 for beginning)
last_row – last row to load (-1 for end)
- Returns:
numpy structured array
- cosmic_toolbox.arraytools.load_hdf_cols(filename, columns='all', first_row=0, last_row=None, verb=True, copy_local=True, filename_parent=None, allow_nonexisting=False, cols_to_add=(), selectors=None, copy_editor=None)[source]
Load columns from an HDF5 file or directory of HDF5 files.
Automatically detects whether filename is a file or directory.
- Parameters:
filename – path to HDF5 file or directory
columns – list of columns to load, or “all” for all columns
first_row – first row to load
last_row – last row to load (None for all rows)
verb – if True, print progress information
copy_local – if True, copy files locally before loading
filename_parent – parent path to search for missing columns
allow_nonexisting – if True, don’t fail if path doesn’t exist
cols_to_add – additional columns to add (initialized to 0)
selectors – dict of column-based selection functions
copy_editor – function to modify paths before copying
- Returns:
numpy structured array
- cosmic_toolbox.arraytools.load_hdf_cols_from_directory(dirname, columns='all', first_row=0, last_row=-1, copy_local=False, dirname_parent=None, allow_nonexisting=False, cols_to_add=(), selectors=None, verb=True, copy_editor=None)[source]
Load columns stored as individual HDF5 files in a directory into one recarray.
- Parameters:
dirname – directory containing column HDF5 files
columns – list of columns to load, or “all” for all columns
first_row – first row to load
last_row – last row to load (-1 means last row minus 1)
copy_local – if True, copy files locally before loading
dirname_parent – parent directory to search for missing columns
allow_nonexisting – if True, don’t fail if directory doesn’t exist
cols_to_add – additional columns to add (initialized to 0)
selectors – dict of column-based selection functions
verb – if True, print progress information
copy_editor – function to modify paths before copying
- Returns:
numpy structured array
- cosmic_toolbox.arraytools.load_hdf_cols_from_file(filename, columns='all', first_row=0, last_row=-1, cols_to_add=(), selectors=None, verb=True)[source]
Load columns of an HDF5 file into one recarray.
- Parameters:
filename – path to hdf file
columns – list of columns to load, “all” to load all columns
first_row – first row to load
last_row – last row to load
cols_to_add – list of columns to add to the recarray
selectors – dictionary of selection masks for columns
verb – if True, print information
- Returns:
recarray
- cosmic_toolbox.arraytools.nanequal(a, b)[source]
Element-wise equality comparison that treats NaN == NaN as True.
- Parameters:
a – first array
b – second array
- Returns:
boolean array
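The NaN-aware comparison can be sketched in one expression; this is an illustration of the described behavior, not the package's implementation.

```python
import numpy as np

def nanequal_sketch(a, b):
    """Elementwise equality with NaN == NaN treated as True."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return (a == b) | (np.isnan(a) & np.isnan(b))

print(nanequal_sketch([1.0, np.nan, 2.0], [1.0, np.nan, 3.0]))
# [ True  True False]
```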
- cosmic_toolbox.arraytools.new_array(n_rows, columns, ints=None, float_dtype=<class 'numpy.float64'>, int_dtype=<class 'numpy.int64'>)[source]
Create a new structured array with specified columns.
- Parameters:
n_rows – number of rows
columns – list of column names
ints – list of column names that should be integers
float_dtype – dtype for float columns
int_dtype – dtype for integer columns
- Returns:
numpy structured array initialized with zeros
- cosmic_toolbox.arraytools.overwrite_hdf5_column(path, name, data, **kwargs)[source]
Overwrite a column in an existing HDF5 file.
- Parameters:
path – path to HDF5 file or directory
name – column/dataset name
data – new data to write
kwargs – additional arguments for create_dataset
- cosmic_toolbox.arraytools.pd2rec(df)[source]
Convert a pandas dataframe to a numpy structured array.
- Parameters:
df – pandas dataframe
- Returns:
numpy structured array
- cosmic_toolbox.arraytools.rec2arr(rec, return_names=False)[source]
Convert a numpy structured array to a numpy array.
- Parameters:
rec – numpy structured array
return_names – if True, also return the names of the columns
- Returns:
numpy array
Example
>>> rec = np.array([(1, 4), (2, 4), (3, 4)], dtype=[('a', '<i8'), ('b', '<i8')])
>>> arr = rec2arr(rec)
>>> arr
array([[1, 4],
       [2, 4],
       [3, 4]])
>>> arr, names = rec2arr(rec, return_names=True)
>>> arr
array([[1, 4],
       [2, 4],
       [3, 4]])
>>> names
['a', 'b']
- cosmic_toolbox.arraytools.rec2class(rec)[source]
Convert a numpy structured array to a class.
- Parameters:
rec – numpy structured array
- Returns:
class
- cosmic_toolbox.arraytools.rec2dict(rec)[source]
Convert a numpy structured array to a dictionary.
- Parameters:
rec – numpy structured array
- Returns:
dictionary
Example
>>> rec = np.array([(1, 4), (2, 4), (3, 4)], dtype=[('a', '<i8'), ('b', '<i8')])
>>> d = rec2dict(rec)
>>> d
{'a': array([1, 2, 3]), 'b': array([4, 4, 4])}
- cosmic_toolbox.arraytools.rec2pd(rec)[source]
Convert a numpy structured array to a pandas DataFrame.
Multi-dimensional columns are flattened with suffix _0, _1, etc.
- Parameters:
rec – numpy structured array
- Returns:
pandas DataFrame
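The flattening of multi-dimensional columns with _0, _1, … suffixes can be sketched like this; an illustrative reimplementation, not the package's rec2pd.

```python
import numpy as np
import pandas as pd

rec = np.zeros(2, dtype=[('a', 'f8'), ('v', 'f8', (2,))])
rec['v'] = [[1.0, 2.0], [3.0, 4.0]]

cols = {}
for name in rec.dtype.names:
    col = rec[name]
    if col.ndim > 1:
        # flatten each extra dimension into suffixed scalar columns
        for i in range(col.shape[1]):
            cols[f'{name}_{i}'] = col[:, i]
    else:
        cols[name] = col
df = pd.DataFrame(cols)
print(list(df.columns))  # ['a', 'v_0', 'v_1']
```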
- cosmic_toolbox.arraytools.rec_float64_to_float32(cat)[source]
Convert float64 columns in a structured array to float32.
- Parameters:
cat – numpy structured array
- Returns:
structured array with float32 instead of float64
- cosmic_toolbox.arraytools.remove_infs(rec)[source]
Remove rows with infs from a numpy structured array.
- Parameters:
rec – numpy structured array
- Returns:
numpy structured array
- cosmic_toolbox.arraytools.remove_nans(rec)[source]
Remove rows with NaNs from a numpy structured array.
- Parameters:
rec – numpy structured array
- Returns:
numpy structured array
- cosmic_toolbox.arraytools.replace_hdf5_dataset(fobj, name, data, **kwargs)[source]
Replace or create a dataset in an HDF5 file.
- Parameters:
fobj – h5py File object
name – dataset name
data – data to write
kwargs – additional arguments for create_dataset
- cosmic_toolbox.arraytools.save_dict_to_hdf5(filename, data_dict, kw_compress=None)[source]
Save a nested dictionary to HDF5 with groups and datasets.
- Parameters:
filename – path to the HDF5 file
data_dict – nested dict {group_name: {dataset_name: data}}
kw_compress – compression kwargs (default: lzf with shuffle)
- cosmic_toolbox.arraytools.save_hdf(filename, arr, **kwargs)[source]
Save a structured array to an HDF5 file as a single ‘data’ dataset.
- Parameters:
filename – path to the HDF5 file
arr – numpy structured array to save
kwargs – additional arguments passed to h5py.create_dataset
- cosmic_toolbox.arraytools.save_hdf_cols(filename, arr, compression=None, resizable=False, suppress_log=False)[source]
Save a structured array to HDF5 with each column as a separate dataset.
- Parameters:
filename – path to the HDF5 file
arr – numpy structured array to save
compression – compression method (e.g., ‘lzf’, ‘gzip’) or dict
resizable – if True, create resizable datasets
suppress_log – if True, log at debug level instead of info
- cosmic_toolbox.arraytools.select_finite(rec)[source]
Remove rows with NaNs or infs from a numpy structured array.
- Parameters:
rec – numpy structured array
- Returns:
numpy structured array
- cosmic_toolbox.arraytools.set_loading_dtypes(arr)[source]
Convert array dtypes after loading from HDF5 (bytes to unicode strings).
- Parameters:
arr – numpy array, bytes, list, or scalar
- Returns:
Python-compatible version of input
- cosmic_toolbox.arraytools.set_storing_dtypes(arr)[source]
Convert array dtypes for HDF5 storage (unicode strings to bytes).
- Parameters:
arr – numpy array, string, list, or scalar
- Returns:
storage-compatible version of input
- cosmic_toolbox.arraytools.view_fields(rec, names)[source]
Return a view (not a copy) of a numpy structured array restricted to the given fields.
- Parameters:
rec – numpy structured array
names – collection of field names to keep
- Returns:
view of rec containing only the requested fields
- cosmic_toolbox.arraytools.write_to_hdf(filename, arr, name='data', compression='lzf', shuffle=True, **kwargs)[source]
Write a recarray to an HDF5 file.
- Parameters:
filename – filename of the hdf file
arr – numpy recarray
name – name of the dataset
compression – compression method
shuffle – shuffle data before compression
kwargs – keyword arguments for h5py.File.create_dataset
cosmic_toolbox.colors module
Color utilities for matplotlib plots.
Provides custom color cycles and utilities for setting matplotlib color schemes.
- cosmic_toolbox.colors.get_colors(cycle='silvan')[source]
Get a dictionary of named colors for a given color cycle.
- Parameters:
cycle (str) – Name of the color cycle to use. Currently only “silvan” is supported. If a different value is passed, it is returned as-is.
- Returns:
Dictionary mapping color names to hex color codes, or the input value if not a recognized cycle name.
- Return type:
dict or any
cosmic_toolbox.copy_guardian module
Copy Guardian - Rate-limited file copying with semaphore-based concurrency control.
Provides utilities for copying files locally and remotely with controlled concurrency to avoid overloading network resources.
@author: Joerg Herbel
- class cosmic_toolbox.copy_guardian.CopyGuardian(n_max_connect, n_max_attempts_remote, time_between_attempts, use_copyfile=False)[source]
Bases: object
Rate-limited file copier with semaphore-based concurrency control.
This class manages file copying operations with controlled concurrency using file-based semaphores. It supports both local and remote (rsync) copy operations.
- Parameters:
n_max_connect (int) – Maximum number of simultaneous connections allowed.
n_max_attempts_remote (int) – Maximum number of retry attempts for remote copies.
time_between_attempts (float) – Time in seconds to wait between retry attempts.
use_copyfile (bool) – If True, use shutil.copyfile instead of shutil.copy for local file copies (preserves no metadata).
cosmic_toolbox.file_utils module
File utilities for reading, writing, and copying files.
Provides functions for:
- Reading/writing pickle and HDF5 files with compression
- Robust file/directory operations (makedirs, remove, copy)
- Remote file operations via SSH/rsync
- YAML file handling
- cosmic_toolbox.file_utils.copy_with_copy_guardian(sources, destination, n_max_connect=10, timeout=1000, folder_with_many_files=False)[source]
Copy files/directories using the CopyGuardian.
- Parameters:
sources – List of source files/directories.
destination – Destination directory.
n_max_connect – Maximum number of simultaneous connections.
timeout – time in seconds to wait for a connection to become available
folder_with_many_files – If True, the source is a folder with many files
- cosmic_toolbox.file_utils.ensure_permissions(path, verb=False)[source]
Set file permissions to user rwx, group rx, others rx.
- Parameters:
path – path to file or directory
verb – if True, log the permission change
- cosmic_toolbox.file_utils.get_abs_path(path)[source]
Get the absolute path, handling remote paths and environment variables.
- Parameters:
path – relative or absolute path (can be Path object)
- Returns:
absolute path string
- cosmic_toolbox.file_utils.is_remote(path)[source]
Check if a path is a remote path (user@host:/path format).
- Parameters:
path – path to check (can be Path object)
- Returns:
True if remote, False otherwise
- cosmic_toolbox.file_utils.load_from_hdf5(file_name, hdf5_keys, hdf5_path='')[source]
Load data stored in an HDF5 file.
- Parameters:
file_name – Name of the file.
hdf5_keys – Keys of arrays to be loaded.
hdf5_path – Path within the HDF5 file appended to all keys.
- Returns:
Loaded arrays.
- cosmic_toolbox.file_utils.read_from_hdf(filepath, name='data')[source]
Read an object from an hdf5 file.
- Parameters:
filepath – Path to the hdf5 file.
name – Name of the dataset.
- Returns:
Object read from the hdf5 file.
- cosmic_toolbox.file_utils.read_from_pickle(filepath, compression='none')[source]
Read an object from a pickle file.
- Parameters:
filepath – Path to the pickle file.
compression – Compression method to use. Can be “none”, “lzf” or “bz2”.
- Returns:
Object read from the pickle file.
- cosmic_toolbox.file_utils.read_yaml(filename)[source]
Read a YAML file.
- Parameters:
filename – path to YAML file
- Returns:
parsed YAML content
- cosmic_toolbox.file_utils.robust_copy(src, dst, n_max_connect=50, method='CopyGuardian', folder_with_many_files=False, **kwargs)[source]
Copy files/directories using the specified method.
- Parameters:
src – Source file/directory.
dst – Destination file/directory.
n_max_connect – Maximum number of simultaneous connections.
method – Method to use for copying. Can be “CopyGuardian” or “system_cp”.
folder_with_many_files – If True, the source is a folder with many files (only for CopyGuardian).
kwargs – Additional arguments passed to the copy method.
- cosmic_toolbox.file_utils.robust_makedirs(path)[source]
Create directories, handling remote paths via SSH.
- Parameters:
path – path to create (can be remote with user@host:path format)
- cosmic_toolbox.file_utils.robust_remove(path)[source]
Remove a file or directory.
- Parameters:
path – Path to the file or directory.
- cosmic_toolbox.file_utils.system_copy(sources, dest, args_str_cp='')[source]
Copy files using the system cp command.
- Parameters:
sources – list of source paths
dest – destination path
args_str_cp – additional arguments for cp command
cosmic_toolbox.logger module
Logging utilities with colored output and progress bars.
Provides a customized logger with:
- Color-coded log levels (debug=violet, warning=orange, error=red)
- Progress bar integration via tqdm
- Environment variable control (PYTHON_LOGGER_LEVEL)
- class cosmic_toolbox.logger.ColorFormatter(fmt=None, datefmt=None, style='%', validate=True, *, defaults=None)[source]
Bases: Formatter
Custom log formatter with color-coded output based on log level.
Colors:
- DEBUG: violet
- INFO: default
- WARNING: bold orange
- ERROR/CRITICAL: bold red
- BOLD = '\x1b[1m'
- FORMATS = {10: '\x1b[95m%(asctime)s %(name)10s %(levelname).3s %(message)s \x1b[0m', 20: '%(asctime)s %(name)10s %(levelname).3s %(message)s ', 30: '\x1b[1m\x1b[33m%(asctime)s %(name)10s %(levelname).3s %(message)s \x1b[0m', 40: '\x1b[1m\x1b[91m%(asctime)s %(name)10s %(levelname).3s %(message)s \x1b[0m', 50: '\x1b[1m\x1b[91m%(asctime)s %(name)10s %(levelname).3s %(message)s \x1b[0m'}
- ORANGE = '\x1b[33m'
- RED = '\x1b[91m'
- UNDERLINE = '\x1b[4m'
- VIOLET = '\x1b[95m'
- YELLOW = '\x1b[93m'
- format(record)[source]
Format the log record with appropriate colors.
- Parameters:
record – log record to format
- Returns:
formatted string with ANSI color codes
- reset = '\x1b[0m'
- class cosmic_toolbox.logger.Progressbar(logger=None)[source]
Bases: object
Progress bar wrapper for use with logger.
Wraps tqdm with sensible defaults and respects the logger’s level.
- cosmic_toolbox.logger.get_logger(filepath, logging_level=None)[source]
Get a configured logger with colored output and progress bar support.
If logging_level is unspecified, uses the PYTHON_LOGGER_LEVEL environment variable, defaulting to ‘info’.
- Parameters:
filepath – name of the file calling the logger (used for logger name)
logging_level – logging level (‘debug’, ‘info’, ‘warning’, ‘error’, ‘critical’)
- Returns:
configured logger object with progressbar attribute
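The environment-variable fallback can be sketched with the standard logging module; this is a minimal illustration of the level-resolution logic, without the color formatting or progress bar attached by the real get_logger.

```python
import logging
import os

def get_logger_sketch(name, logging_level=None):
    """Resolve the level from the argument or PYTHON_LOGGER_LEVEL (default 'info')."""
    level = (logging_level or os.environ.get('PYTHON_LOGGER_LEVEL', 'info')).upper()
    logger = logging.getLogger(name)
    logger.setLevel(getattr(logging, level))
    return logger

logger = get_logger_sketch('example', 'debug')
print(logger.level == logging.DEBUG)  # True
```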
cosmic_toolbox.utils module
General utilities for argument parsing, multiprocessing, and convenience functions.
Provides functions for:
- Converting argument strings to dictionaries
- Parsing command-line sequences (lists, tuples)
- Running functions in parallel with multiprocessing
- Miscellaneous helper functions
- cosmic_toolbox.utils.arg_str_to_dict(arg_str)[source]
Converts a string in the format “{arg:value}” or “{arg1:value1,arg2:value2,…}” to a dictionary with keys and values. Note: strings should not contain ‘ or “.
- Parameters:
arg_str – A string in the format “{arg:value}” or “{arg1:value1, arg2:value2,…}”.
- Return arg_dict:
dictionary with keys and values corresponding to the input string.
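The parsing described above can be sketched as follows; an illustrative parser that keeps values as strings (the real arg_str_to_dict may coerce value types).

```python
def arg_str_to_dict_sketch(arg_str):
    """Parse '{arg1:value1,arg2:value2}' into a dict of strings."""
    body = arg_str.strip().strip('{}')
    if not body:
        return {}
    # split on commas, then on the first ':' of each item
    return dict(item.split(':', 1) for item in body.split(','))

print(arg_str_to_dict_sketch('{a:1,b:2}'))  # {'a': '1', 'b': '2'}
```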
- cosmic_toolbox.utils.is_between(x, min, max)[source]
Checks if x is between min and max.
- Parameters:
x – Value to check.
min – Minimum value.
max – Maximum value.
- Returns:
True if x is between min and max, False otherwise.
- cosmic_toolbox.utils.parse_list(s)[source]
Parses a string to a list for argparse. Can be used as type for argparse.
- Parameters:
s – String to parse.
- Returns:
list.
- cosmic_toolbox.utils.parse_sequence(s)[source]
Parses a string to a list/tuple for argparse. Can be used as type for argparse.
- Parameters:
s – String to parse.
- Returns:
tuple or list.
- Raises:
argparse.ArgumentTypeError – If the string cannot be parsed to a tuple.
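An argparse type of this kind can be sketched with ast.literal_eval; this is an assumed implementation for illustration, not the package's parse_sequence.

```python
import argparse
import ast

def parse_sequence_sketch(s):
    """argparse-compatible type accepting list/tuple literals like '[1, 2, 3]'."""
    try:
        value = ast.literal_eval(s)
    except (ValueError, SyntaxError):
        raise argparse.ArgumentTypeError(f'cannot parse {s!r} as a sequence')
    if not isinstance(value, (list, tuple)):
        raise argparse.ArgumentTypeError(f'{s!r} is not a list or tuple')
    return value

parser = argparse.ArgumentParser()
parser.add_argument('--ids', type=parse_sequence_sketch)
args = parser.parse_args(['--ids', '[1, 2, 3]'])
print(args.ids)  # [1, 2, 3]
```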
- cosmic_toolbox.utils.random_sleep(max_seconds=0, min_seconds=0)[source]
Sleeps for a random amount of time between min_seconds and max_seconds.
- Parameters:
max_seconds – Maximum number of seconds to sleep.
min_seconds – Minimum number of seconds to sleep.
- cosmic_toolbox.utils.run_imap_multiprocessing(func, argument_list, num_processes, verb=True)[source]
Runs a function with a list of arguments in parallel using multiprocessing.
- Parameters:
func – Function to run.
argument_list – List of arguments to run the function with.
num_processes – Number of processes to use.
verb – If True, show progress bar.
- Returns:
List of results from the function.
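The parallel pattern can be sketched with multiprocessing.Pool.imap (without the tqdm progress bar the real function adds). The worker must be defined at module level so it can be pickled.

```python
import multiprocessing as mp

def square(x):  # module-level so worker processes can pickle it
    return x * x

def run_imap_sketch(func, argument_list, num_processes):
    """Run func over argument_list in parallel, preserving input order."""
    with mp.Pool(num_processes) as pool:
        return list(pool.imap(func, argument_list))

if __name__ == '__main__':
    print(run_imap_sketch(square, range(5), 2))  # [0, 1, 4, 9, 16]
```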