cosmic_toolbox package

Subpackages

Submodules

cosmic_toolbox.MultiInterp module

Multi-method interpolation framework.

Provides flexible N-dimensional interpolation with multiple backend methods including nearest neighbors, radial basis functions, linear interpolation, and machine learning regressors.

author: Tomasz Kacprzak

class cosmic_toolbox.MultiInterp.MultiInterp(X, y, method='nn', **kw)[source]

Bases: object

Multi-method N-dimensional interpolator.

Provides a unified interface to multiple interpolation methods with automatic coordinate scaling and bounds checking.

Parameters:
  • X (numpy.ndarray) – Training points, shape (n_samples, n_dims).

  • y (numpy.ndarray) – Training values, shape (n_samples,).

  • method (str) – Interpolation method. Options:
      - ‘nn’: Nearest neighbor (scipy NearestNDInterpolator)
      - ‘linear’: Linear interpolation (scipy LinearNDInterpolator)
      - ‘rbf’: Radial basis function (scipy Rbf)
      - ‘rbft’: Radial basis function with bounds (Rbft class)
      - ‘knn_regression’: K-neighbors regression (sklearn)
      - ‘rnn_regression’: Radius neighbors regression (sklearn)
      - ‘knn_linear’: Local linear interpolation
      - ‘knn_balltree’: K-neighbors with BallTree
      - ‘random_forest’: Random forest regressor (sklearn)

  • kw – Additional keyword arguments passed to the underlying interpolator.

Example

>>> import numpy as np
>>> from cosmic_toolbox.MultiInterp import MultiInterp
>>> X = np.random.rand(100, 2)
>>> y = np.sin(X[:, 0]) + np.cos(X[:, 1])
>>> interp = MultiInterp(X, y, method='nn')
>>> Xi = np.random.rand(10, 2)
>>> yi = interp(Xi)
init_interp(**kw)[source]

Initialize the underlying interpolator based on method.

Parameters:

kw – Keyword arguments passed to the underlying interpolator.

Raises:

Exception – If method is unknown.

interpolate_grid_neighbours(y, n_neighbors=None)[source]

Interpolate using precomputed grid neighbors.

Requires a prior call to precompute_grid_neighbors.

Parameters:
  • y (numpy.ndarray) – Training values, shape (n_samples,).

  • n_neighbors (int or None) – Number of neighbors to use. Defaults to self.n_neighbors.

Returns:

Interpolated values at grid points.

Return type:

numpy.ndarray

Raises:

Exception – If the requested number of neighbors exceeds the precomputed number.

precompute_grid_neighbors(Xn, n_neighbors=100, n_proc=1)[source]

Precompute neighbors for a grid of query points.

Useful for repeated interpolation on the same grid with different training values (e.g., in MCMC sampling).

Parameters:
  • Xn (numpy.ndarray) – Grid points, shape (n_grid, n_dims).

  • n_neighbors (int) – Number of neighbors to precompute. Defaults to 100.

  • n_proc (int) – Number of parallel processes. Defaults to 1.

Raises:

AssertionError – If method is not ‘knn_balltree’.
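
The precompute-then-reuse pattern above (fixed query grid, changing training values) can be sketched with scipy's cKDTree as a stand-in for the BallTree backend; all names below are illustrative, not the package's API:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X = rng.random((500, 2))    # training points
Xn = rng.random((50, 2))    # fixed query grid

# Precompute once: distances and indices of the 10 nearest training points.
tree = cKDTree(X)
dist, ind = tree.query(Xn, k=10)

# Reuse the neighbours for many different training-value vectors y
# (e.g. one vector per MCMC step):
for _ in range(3):
    y = np.sin(X[:, 0]) + rng.normal(0.0, 0.01, len(X))
    w = 1.0 / np.maximum(dist, 1e-12)               # inverse-distance weights
    yi = (w * y[ind]).sum(axis=1) / w.sum(axis=1)   # grid predictions
```

The expensive neighbor search runs once; each new y only costs a weighted average.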

slice_linear_upsampling(n_repeat=1, n_neighbors=1)[source]

Upsample training data by adding midpoints between neighbors.

Experimental method to densify training data for better interpolation.

Parameters:
  • n_repeat (int) – Number of upsampling iterations. Defaults to 1.

  • n_neighbors (int) – Number of neighbors to use for midpoint generation. Defaults to 1.

class cosmic_toolbox.MultiInterp.Rbft(points, values, **kw_rbf)[source]

Bases: object

Radial Basis Function interpolator with bounds checking.

Wraps scipy’s Rbf with automatic coordinate scaling and bounds checking. Points outside the training data bounds return -inf.

Parameters:
  • points (numpy.ndarray) – Training points, shape (n_samples, n_dims).

  • values (numpy.ndarray) – Training values, shape (n_samples,).

  • kw_rbf – Additional keyword arguments passed to scipy.interpolate.Rbf.
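
The bounds-checking behaviour can be sketched in plain numpy: queries outside the axis-aligned bounds of the training points return -inf. This is illustrative only; the actual Rbft class wraps scipy.interpolate.Rbf for the interpolation itself:

```python
import numpy as np

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
lo, hi = points.min(axis=0), points.max(axis=0)

def bounded_eval(xi, interp_values):
    """Keep interpolated values inside the training bounds, -inf outside."""
    inside = np.all((xi >= lo) & (xi <= hi), axis=1)
    return np.where(inside, interp_values, -np.inf)

xi = np.array([[0.5, 0.5], [2.0, 0.5]])   # second point is out of bounds
vals = bounded_eval(xi, np.array([1.0, 1.0]))
```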

cosmic_toolbox.MultiInterp.predict_knn_balltree(Xi, X, y, n_neighbors, tree)[source]

Predict using k-nearest neighbors with BallTree.

Parameters:
  • Xi (numpy.ndarray) – Query points, shape (n_points, n_dims).

  • X (numpy.ndarray) – Training points (unused, neighbors from tree).

  • y (numpy.ndarray) – Training values, shape (n_samples,).

  • n_neighbors (int) – Number of neighbors to use.

  • tree (sklearn.neighbors.BallTree) – BallTree instance for neighbor queries.

Returns:

Predicted values, shape (n_points,).

Return type:

numpy.ndarray

cosmic_toolbox.MultiInterp.predict_knn_linear(Xi, X, y, n_neighbors, tree)[source]

Predict using local linear interpolation of nearest neighbors.

For each query point, finds k-nearest neighbors and fits a local linear interpolator using those neighbors.

Parameters:
  • Xi (numpy.ndarray) – Query points, shape (n_points, n_dims).

  • X (numpy.ndarray) – Training points, shape (n_samples, n_dims).

  • y (numpy.ndarray) – Training values, shape (n_samples,).

  • n_neighbors (int) – Number of neighbors to use for local interpolation.

  • tree (sklearn.neighbors.BallTree) – BallTree instance for neighbor queries.

Returns:

Predicted values, shape (n_points,). Out-of-hull points return -inf.

Return type:

numpy.ndarray

cosmic_toolbox.MultiInterp.predict_with_neighbours(y, ind, dist)[source]

Predict values using inverse-distance weighted average of neighbors.

Parameters:
  • y (numpy.ndarray) – Training values, shape (n_samples,).

  • ind (numpy.ndarray) – Neighbor indices, shape (n_points, n_neighbors).

  • dist (numpy.ndarray) – Neighbor distances, shape (n_points, n_neighbors).

Returns:

Predicted values, shape (n_points,).

Return type:

numpy.ndarray
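
A common formulation of inverse-distance weighting, sketched in numpy (the function name and the zero-distance guard are illustrative; the exact weighting used here may differ):

```python
import numpy as np

def idw_predict(y, ind, dist, eps=1e-12):
    """Inverse-distance weighted average of the neighbours of each query point.

    y    : training values, shape (n_samples,)
    ind  : neighbour indices, shape (n_points, n_neighbors)
    dist : neighbour distances, shape (n_points, n_neighbors)
    """
    w = 1.0 / np.maximum(dist, eps)        # guard against zero distance
    return (w * y[ind]).sum(axis=1) / w.sum(axis=1)

y = np.array([0.0, 1.0, 2.0])
ind = np.array([[0, 1], [1, 2]])
dist = np.array([[1.0, 1.0], [1.0, 3.0]])
pred = idw_predict(y, ind, dist)   # closer neighbours get larger weights
```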

cosmic_toolbox.MultiInterp.query_batch(X, tree, k=100, n_per_batch=10000)[source]

Query BallTree in batches to manage memory usage.

Parameters:
  • X (numpy.ndarray) – Query points, shape (n_points, n_dims).

  • tree (sklearn.neighbors.BallTree) – BallTree instance to query.

  • k (int) – Number of nearest neighbors to find. Defaults to 100.

  • n_per_batch (int) – Number of points per batch. Defaults to 10000.

Returns:

Tuple of (distances, indices) arrays.

Return type:

tuple
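
The batching pattern can be sketched with scipy's cKDTree (illustrative; the actual function queries sklearn's BallTree):

```python
import numpy as np
from scipy.spatial import cKDTree

def query_in_batches(X, tree, k=5, n_per_batch=100):
    """Query the tree in fixed-size chunks to bound peak memory."""
    dists, inds = [], []
    for i in range(0, len(X), n_per_batch):
        d, j = tree.query(X[i:i + n_per_batch], k=k)
        dists.append(d)
        inds.append(j)
    return np.concatenate(dists), np.concatenate(inds)

rng = np.random.default_rng(1)
train = rng.random((1000, 3))
queries = rng.random((250, 3))
dist, ind = query_in_batches(queries, cKDTree(train), k=5, n_per_batch=100)
```

Only one batch of query results is in flight at a time, so memory scales with n_per_batch rather than with the full query set.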

cosmic_toolbox.MultiInterp.query_split(X, tree, k, n_proc)[source]

Query BallTree in parallel using multiprocessing.

Splits the query points across multiple processes for parallel execution.

Parameters:
  • X (numpy.ndarray) – Query points, shape (n_points, n_dims).

  • tree (sklearn.neighbors.BallTree) – BallTree instance to query.

  • k (int) – Number of nearest neighbors to find.

  • n_proc (int) – Number of parallel processes to use.

Returns:

Tuple of (distances, indices) arrays.

Return type:

tuple

cosmic_toolbox.NearestWeightedNDInterpolator module

Convenience interface to N-D interpolation.

Provides a weighted nearest-neighbor interpolator using BallTree for efficient N-dimensional interpolation.

author: Tomasz Kacprzak

class cosmic_toolbox.NearestWeightedNDInterpolator.NearestWeightedNDInterpolator(x, y, k=None, tree_options=None)[source]

Bases: NDInterpolatorBase

Weighted nearest-neighbor interpolation in N dimensions.

This interpolator uses BallTree for efficient nearest-neighbor queries and computes weighted averages based on inverse distance.

Parameters:
  • x (numpy.ndarray) – Training points, shape (n_samples, n_dims).

  • y (numpy.ndarray) – Training values, shape (n_samples,).

  • k (int or None) – Number of nearest neighbors to use. Defaults to n_dims + 1 (number of vertices of an n_dims dimensional simplex).

  • tree_options (dict or None) – Options passed to sklearn’s BallTree constructor.

Example

>>> import numpy as np
>>> from cosmic_toolbox import NearestWeightedNDInterpolator
>>> x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
>>> y = np.array([0, 1, 1, 2])
>>> interp = NearestWeightedNDInterpolator(x, y)
>>> interp(np.array([[0.5, 0.5]]))
array([1.])

cosmic_toolbox.TransformedGaussianMixture module

Transformed Gaussian Mixture Model.

Provides a Gaussian Mixture Model that operates in a transformed parameter space to handle bounded parameters more effectively.

class cosmic_toolbox.TransformedGaussianMixture.TransformedGaussianMixture(param_bounds=None, *args, **kwargs)[source]

Bases: object

Gaussian Mixture Model with parameter transformation.

This class wraps sklearn’s GaussianMixture to work with bounded parameters by transforming them to an unbounded space using a probit (normal CDF/PPF) transformation.

Parameters:
  • param_bounds (numpy.ndarray or None) – Parameter bounds for each dimension, shape (n_dims, 2). If None, bounds are inferred from the data during fit.

  • args – Positional arguments passed to GaussianMixture.

  • kwargs – Keyword arguments passed to GaussianMixture.

aic(X)[source]

Compute the Akaike Information Criterion.

Parameters:

X (numpy.ndarray) – Data points, shape (n_samples, n_features).

Returns:

AIC score.

Return type:

float

bic(X)[source]

Compute the Bayesian Information Criterion.

Parameters:

X (numpy.ndarray) – Data points, shape (n_samples, n_features).

Returns:

BIC score.

Return type:

float

fit(X, y=None)[source]

Fit the Gaussian Mixture Model.

Parameters:
  • X (numpy.ndarray) – Training data, shape (n_samples, n_features).

  • y – Ignored (for sklearn compatibility).

Returns:

self

fit_predict(X, y=None)[source]

Fit and predict cluster labels.

Parameters:
  • X (numpy.ndarray) – Training data, shape (n_samples, n_features).

  • y – Ignored (for sklearn compatibility).

Returns:

Component labels for each sample.

Return type:

numpy.ndarray

get_params(deep=True)[source]

Get parameters of the underlying GaussianMixture.

Parameters:

deep (bool) – If True, return parameters for sub-objects.

Returns:

Parameter names mapped to their values.

Return type:

dict

predict_proba(X)[source]

Predict posterior probability of each component given the data.

Parameters:

X (numpy.ndarray) – Data points, shape (n_samples, n_features).

Returns:

Posterior probabilities, shape (n_samples, n_components).

Return type:

numpy.ndarray

sample(n_samples=1)[source]

Generate random samples from the fitted Gaussian mixture.

Parameters:

n_samples (int) – Number of samples to generate.

Returns:

Tuple of (samples, component_labels).

Return type:

tuple(numpy.ndarray, numpy.ndarray)

score(X, y=None)[source]

Compute the per-sample average log-likelihood.

Parameters:
  • X (numpy.ndarray) – Data points, shape (n_samples, n_features).

  • y – Ignored (for sklearn compatibility).

Returns:

Log-likelihood of the data.

Return type:

float

score_samples(X)[source]

Compute the log-likelihood of each sample.

Parameters:

X (numpy.ndarray) – Data points, shape (n_samples, n_features).

Returns:

Log-likelihood for each sample.

Return type:

numpy.ndarray

set_bounds(X)[source]

Set parameter bounds from data if not already set.

Parameters:

X (numpy.ndarray) – Training data, shape (n_samples, n_features).

set_params(**params)[source]

Set parameters of the underlying GaussianMixture.

Parameters:

params – Parameters to set.

Returns:

self

cosmic_toolbox.TransformedGaussianMixture.scale_fwd(x, param_bounds, param_bounds_trans)[source]

Scale values from original bounds to transformed bounds.

Parameters:
  • x (numpy.ndarray) – Values to scale.

  • param_bounds (array-like) – Original parameter bounds [min, max].

  • param_bounds_trans (array-like) – Transformed parameter bounds [min, max].

Returns:

Scaled values.

Return type:

numpy.ndarray

cosmic_toolbox.TransformedGaussianMixture.scale_inv(x, param_bounds, param_bounds_trans)[source]

Scale values from transformed bounds back to original bounds.

Parameters:
  • x (numpy.ndarray) – Values to scale.

  • param_bounds (array-like) – Original parameter bounds [min, max].

  • param_bounds_trans (array-like) – Transformed parameter bounds [min, max].

Returns:

Scaled values.

Return type:

numpy.ndarray

cosmic_toolbox.TransformedGaussianMixture.trans_fwd(x, param_bounds, param_bounds_trans)[source]

Transform values forward using normal PPF (probit transform).

Parameters:
  • x (numpy.ndarray) – Values to transform.

  • param_bounds (array-like) – Original parameter bounds [min, max].

  • param_bounds_trans (array-like) – Transformed parameter bounds [min, max].

Returns:

Transformed values.

Return type:

numpy.ndarray

cosmic_toolbox.TransformedGaussianMixture.trans_inv(x, param_bounds, param_bounds_trans)[source]

Transform values back using normal CDF (inverse probit transform).

Parameters:
  • x (numpy.ndarray) – Values to transform.

  • param_bounds (array-like) – Original parameter bounds [min, max].

  • param_bounds_trans (array-like) – Transformed parameter bounds [min, max].

Returns:

Inverse-transformed values.

Return type:

numpy.ndarray
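
The probit transform pair can be sketched with scipy.stats.norm: bounded values are rescaled to (0, 1) and mapped to the real line with the normal PPF, and the inverse applies the CDF and rescales back. This is a sketch of the scheme described above, not the package's exact code:

```python
import numpy as np
from scipy.stats import norm

def probit_fwd(x, lo, hi):
    """Map values in (lo, hi) to the unbounded real line."""
    u = (x - lo) / (hi - lo)      # rescale to (0, 1)
    return norm.ppf(u)

def probit_inv(z, lo, hi):
    """Map real-line values back to (lo, hi)."""
    return norm.cdf(z) * (hi - lo) + lo

x = np.array([0.1, 0.5, 0.9])
z = probit_fwd(x, 0.0, 1.0)       # unbounded values, symmetric about 0
x_back = probit_inv(z, 0.0, 1.0)  # round-trips to x
```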

cosmic_toolbox.arraytools module

Array utilities for working with numpy structured arrays and HDF5 files.

Provides functions for:
  • Converting between arrays, recarrays, dicts, dataframes, and classes

  • Adding, removing, and manipulating columns in structured arrays

  • Reading and writing HDF5 files with various storage formats

  • Handling NaN/Inf values and dtype conversions

cosmic_toolbox.arraytools.add_cols(rec, names, shapes=None, data=0, dtype=None)[source]

Add columns to a numpy recarray. By default, the new columns are filled with zeros. If data is a numpy array, it is used to fill the new columns. If each column should be filled with different data, data should be a list of numpy arrays or an array of shape (n_cols, n_rows).

Parameters:
  • rec – numpy recarray

  • names – list of names for the columns

  • shapes – list of shapes for the columns

  • data – data to fill the columns with

  • dtype – dtype of the columns

Returns:

numpy recarray
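
An equivalent effect can be sketched with numpy's own recfunctions (a stand-in to illustrate the operation, not the implementation of add_cols):

```python
import numpy as np
from numpy.lib import recfunctions as rfn

rec = np.array([(1, 2.0), (3, 4.0)], dtype=[('a', 'i8'), ('b', 'f8')])

# Add a zero-filled column 'c' and a data-filled column 'd'.
rec2 = rfn.append_fields(
    rec, ['c', 'd'],
    data=[np.zeros(len(rec)), np.array([10.0, 20.0])],
    usemask=False,
)
```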

cosmic_toolbox.arraytools.append_hdf(filename, arr, compression=None, **kwargs)[source]

Append structured array data to HDF5 file. Creates file if it doesn’t exist, appends if it does.

cosmic_toolbox.arraytools.append_rows_to_h5dset(dset, array)[source]

Append rows to an existing HDF5 dataset.

Parameters:
  • dset – h5py dataset (must be resizable)

  • array – numpy array to append

cosmic_toolbox.arraytools.arr2rec(arr, names)[source]

Convert a numpy array to a numpy structured array.

Parameters:
  • arr – numpy array

  • names – list of names for the columns

Returns:

numpy structured array

cosmic_toolbox.arraytools.arr_to_rec(arr, dtype)[source]

Convert a numpy array to a numpy structured array given its dtype.

Parameters:
  • arr – numpy array

  • dtype – dtype of the structured array

Returns:

numpy structured array

cosmic_toolbox.arraytools.check_hdf_column(filename, column_name)[source]

Check if a column exists in an HDF5 file or directory.

Parameters:
  • filename – path to HDF5 file or directory

  • column_name – name of the column to check

Returns:

True if column exists, False otherwise

cosmic_toolbox.arraytools.class2dict(c)[source]

Convert a class to a dictionary.

Parameters:

c – class

Returns:

dictionary

cosmic_toolbox.arraytools.class2rec(c)[source]

Convert a class to a numpy structured array.

Parameters:

c – class

Returns:

numpy structured array

cosmic_toolbox.arraytools.col_name_to_path(dirname, colname)[source]

Get the HDF5 file path for a column stored in a directory.

Parameters:
  • dirname – directory path

  • colname – column name

Returns:

full path to the column’s HDF5 file

cosmic_toolbox.arraytools.delete_cols(rec, col_names)[source]

Delete columns from a numpy recarray.

Parameters:
  • rec – numpy recarray

  • col_names – list of names of the columns to delete

Returns:

numpy recarray

cosmic_toolbox.arraytools.delete_columns(rec, col_names)[source]

Delete columns from a numpy recarray. (alias for delete_cols for backwards compatibility)

Parameters:
  • rec – numpy recarray

  • col_names – list of names of the columns to delete

cosmic_toolbox.arraytools.dict2class(d)[source]

Convert a dictionary to a class.

Parameters:

d – dictionary

Returns:

class

Example

>>> d = {'a': [1, 2, 3], 'b': 4}
>>> c = dict2class(d)
>>> c.a
[1, 2, 3]
>>> c.b
4
cosmic_toolbox.arraytools.dict2rec(d)[source]

Convert a dictionary of arrays/lists/scalars to a numpy structured array.

Parameters:

d – Dictionary with arrays/lists/scalars as values

Returns:

numpy structured array
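
The conversion can be sketched in plain numpy; broadcasting scalars to the common length is an assumption about the intended behaviour, not necessarily dict2rec's exact semantics:

```python
import numpy as np

def dict_to_struct(d):
    """Build a structured array from a dict of columns, broadcasting scalars."""
    cols = {k: np.atleast_1d(np.asarray(v)) for k, v in d.items()}
    n = max(len(c) for c in cols.values())
    cols = {k: (np.full(n, c[0]) if len(c) == 1 else c) for k, c in cols.items()}
    rec = np.empty(n, dtype=[(k, c.dtype) for k, c in cols.items()])
    for k, c in cols.items():
        rec[k] = c
    return rec

rec = dict_to_struct({'a': [1, 2, 3], 'b': 4.0})   # 'b' broadcast to length 3
```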

cosmic_toolbox.arraytools.ensure_cols(rec, names, shapes=None, data=0)[source]

Ensure columns exist in a recarray, adding them if missing.

Parameters:
  • rec – numpy recarray

  • names – list of column names to ensure

  • shapes – list of shapes for the columns

  • data – data to fill new columns with

Returns:

numpy recarray with ensured columns

cosmic_toolbox.arraytools.get_dtype(columns, main='f8', shapes=None)[source]

Create a numpy dtype from column names.

Column names can include dtype specification as ‘name:dtype’.

Parameters:
  • columns – list of column names (optionally with ‘:dtype’ suffix)

  • main – default dtype for columns without explicit dtype

  • shapes – list of shapes for each column

Returns:

numpy dtype
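
The ‘name:dtype’ convention can be sketched as follows (a hedged guess at the parsing; the real get_dtype may differ in details such as shape handling):

```python
import numpy as np

def build_dtype(columns, main='f8'):
    """Build a structured dtype from names with optional ':dtype' suffixes."""
    fields = []
    for col in columns:
        name, _, spec = col.partition(':')
        fields.append((name, spec if spec else main))
    return np.dtype(fields)

dt = build_dtype(['x', 'y:i4', 'label:U8'])   # 'x' falls back to main='f8'
```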

cosmic_toolbox.arraytools.get_dtype_of_list(lst)[source]

Get the dtype of all elements in a list (must be uniform).

Parameters:

lst – list of arrays

Returns:

numpy dtype

Raises:

AssertionError – if not all elements have the same dtype

cosmic_toolbox.arraytools.get_finite_mask(rec)[source]

Get a mask for finite rows (i.e., rows without NaNs or infs) in a numpy structured array.

Parameters:

rec – numpy structured array

Returns:

boolean array, True for rows in which all fields are finite
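
The masking can be sketched in plain numpy (illustrative; assumes scalar numeric fields):

```python
import numpy as np

def finite_row_mask(rec):
    """True for rows in which every field is finite (no NaNs or infs)."""
    mask = np.ones(len(rec), dtype=bool)
    for name in rec.dtype.names:
        mask &= np.isfinite(rec[name])
    return mask

rec = np.array([(1.0, 2.0), (np.nan, 3.0), (4.0, np.inf)],
               dtype=[('a', 'f8'), ('b', 'f8')])
mask = finite_row_mask(rec)   # only the first row is fully finite
```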

cosmic_toolbox.arraytools.get_hdf_col_names(path)[source]

Get column names from an HDF5 file or directory of HDF5 files.

Parameters:

path – path to HDF5 file or directory

Returns:

list of column names

cosmic_toolbox.arraytools.get_inf_mask(rec)[source]

Get a mask for rows with infs in a numpy structured array.

Parameters:

rec – numpy structured array

Returns:

boolean array, True for rows containing infs

cosmic_toolbox.arraytools.get_loading_dtypes(dtype_list)[source]

Convert dtype list for loading from HDF5 (bytes to unicode).

Parameters:

dtype_list – list of (name, dtype, …) tuples

Returns:

list of Python-compatible dtype tuples

cosmic_toolbox.arraytools.get_nan_mask(rec)[source]

Get a mask for rows with NaNs in a numpy structured array.

Parameters:

rec – numpy structured array

Returns:

boolean array, True for rows containing NaNs

cosmic_toolbox.arraytools.get_storing_dtypes(dtype_list)[source]

Convert dtype list for HDF5 storage (unicode to bytes).

Parameters:

dtype_list – list of (name, dtype, …) tuples

Returns:

list of storage-compatible dtype tuples

cosmic_toolbox.arraytools.load_hdf(filename, first_row=-1, last_row=-1)[source]

Load a structured array from an HDF5 file’s ‘data’ dataset.

Parameters:
  • filename – path to the HDF5 file

  • first_row – first row to load (-1 for beginning)

  • last_row – last row to load (-1 for end)

Returns:

numpy structured array

cosmic_toolbox.arraytools.load_hdf_cols(filename, columns='all', first_row=0, last_row=None, verb=True, copy_local=True, filename_parent=None, allow_nonexisting=False, cols_to_add=(), selectors=None, copy_editor=None)[source]

Load columns from an HDF5 file or directory of HDF5 files.

Automatically detects whether filename is a file or directory.

Parameters:
  • filename – path to HDF5 file or directory

  • columns – list of columns to load, or “all” for all columns

  • first_row – first row to load

  • last_row – last row to load (None for all rows)

  • verb – if True, print progress information

  • copy_local – if True, copy files locally before loading

  • filename_parent – parent path to search for missing columns

  • allow_nonexisting – if True, don’t fail if path doesn’t exist

  • cols_to_add – additional columns to add (initialized to 0)

  • selectors – dict of column-based selection functions

  • copy_editor – function to modify paths before copying

Returns:

numpy structured array

cosmic_toolbox.arraytools.load_hdf_cols_from_directory(dirname, columns='all', first_row=0, last_row=-1, copy_local=False, dirname_parent=None, allow_nonexisting=False, cols_to_add=(), selectors=None, verb=True, copy_editor=None)[source]

Load columns stored as individual HDF5 files in a directory into one recarray.

Parameters:
  • dirname – directory containing column HDF5 files

  • columns – list of columns to load, or “all” for all columns

  • first_row – first row to load

  • last_row – last row to load (-1 means last row minus 1)

  • copy_local – if True, copy files locally before loading

  • dirname_parent – parent directory to search for missing columns

  • allow_nonexisting – if True, don’t fail if directory doesn’t exist

  • cols_to_add – additional columns to add (initialized to 0)

  • selectors – dict of column-based selection functions

  • verb – if True, print progress information

  • copy_editor – function to modify paths before copying

Returns:

numpy structured array

cosmic_toolbox.arraytools.load_hdf_cols_from_file(filename, columns='all', first_row=0, last_row=-1, cols_to_add=(), selectors=None, verb=True)[source]

Load columns from an HDF5 file into one recarray.

Parameters:
  • filename – path to hdf file

  • columns – list of columns to load, “all” to load all columns

  • first_row – first row to load

  • last_row – last row to load

  • cols_to_add – list of columns to add to the recarray

  • selectors – dictionary of selection masks for columns

  • verb – if True, print information

Returns:

recarray

cosmic_toolbox.arraytools.nanequal(a, b)[source]

Element-wise equality comparison that treats NaN == NaN as True.

Parameters:
  • a – first array

  • b – second array

Returns:

boolean array
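
The comparison can be sketched as follows (illustrative for float arrays):

```python
import numpy as np

def nan_equal(a, b):
    """Element-wise equality treating NaN == NaN as True."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return (a == b) | (np.isnan(a) & np.isnan(b))

res = nan_equal([1.0, np.nan, 2.0], [1.0, np.nan, 3.0])
```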

cosmic_toolbox.arraytools.new_array(n_rows, columns, ints=None, float_dtype=<class 'numpy.float64'>, int_dtype=<class 'numpy.int64'>)[source]

Create a new structured array with specified columns.

Parameters:
  • n_rows – number of rows

  • columns – list of column names

  • ints – list of column names that should be integers

  • float_dtype – dtype for float columns

  • int_dtype – dtype for integer columns

Returns:

numpy structured array initialized with zeros

cosmic_toolbox.arraytools.overwrite_hdf5_column(path, name, data, **kwargs)[source]

Overwrite a column in an existing HDF5 file.

Parameters:
  • path – path to HDF5 file or directory

  • name – column/dataset name

  • data – new data to write

  • kwargs – additional arguments for create_dataset

cosmic_toolbox.arraytools.pd2rec(df)[source]

Convert a pandas dataframe to a numpy structured array.

Parameters:

df – pandas dataframe

Returns:

numpy structured array

cosmic_toolbox.arraytools.rec2arr(rec, return_names=False)[source]

Convert a numpy structured array to a numpy array.

Parameters:
  • rec – numpy structured array

  • return_names – if True, also return the names of the columns

Returns:

numpy array

Example

>>> rec = np.array([(1, 4), (2, 4), (3, 4)],
...                dtype=[('a', '<i8'), ('b', '<i8')])
>>> arr = rec2arr(rec)
>>> arr
array([[1, 4],
       [2, 4],
       [3, 4]])
>>> arr, names = rec2arr(rec, return_names=True)
>>> arr
array([[1, 4],
       [2, 4],
       [3, 4]])
>>> names
['a', 'b']
cosmic_toolbox.arraytools.rec2class(rec)[source]

Convert a numpy structured array to a class.

Parameters:

rec – numpy structured array

Returns:

class

cosmic_toolbox.arraytools.rec2dict(rec)[source]

Convert a numpy structured array to a dictionary.

Parameters:

rec – numpy structured array

Returns:

dictionary

Example

>>> rec = np.array([(1, 4), (2, 4), (3, 4)],
...                dtype=[('a', '<i8'), ('b', '<i8')])
>>> d = rec2dict(rec)
>>> d
{'a': array([1, 2, 3]), 'b': array([4, 4, 4])}
cosmic_toolbox.arraytools.rec2pd(rec)[source]

Convert a numpy structured array to a pandas DataFrame.

Multi-dimensional columns are flattened with suffix _0, _1, etc.

Parameters:

rec – numpy structured array

Returns:

pandas DataFrame

cosmic_toolbox.arraytools.rec_float64_to_float32(cat)[source]

Convert float64 columns in a structured array to float32.

Parameters:

cat – numpy structured array

Returns:

structured array with float32 instead of float64

cosmic_toolbox.arraytools.remove_infs(rec)[source]

Remove rows with infs from a numpy structured array.

Parameters:

rec – numpy structured array

Returns:

numpy structured array

cosmic_toolbox.arraytools.remove_nans(rec)[source]

Remove rows with NaNs from a numpy structured array.

Parameters:

rec – numpy structured array

Returns:

numpy structured array

cosmic_toolbox.arraytools.replace_hdf5_dataset(fobj, name, data, **kwargs)[source]

Replace or create a dataset in an HDF5 file.

Parameters:
  • fobj – h5py File object

  • name – dataset name

  • data – data to write

  • kwargs – additional arguments for create_dataset

cosmic_toolbox.arraytools.save_dict_to_hdf5(filename, data_dict, kw_compress=None)[source]

Save a nested dictionary to HDF5 with groups and datasets.

Parameters:
  • filename – path to the HDF5 file

  • data_dict – nested dict {group_name: {dataset_name: data}}

  • kw_compress – compression kwargs (default: lzf with shuffle)

cosmic_toolbox.arraytools.save_hdf(filename, arr, **kwargs)[source]

Save a structured array to an HDF5 file as a single ‘data’ dataset.

Parameters:
  • filename – path to the HDF5 file

  • arr – numpy structured array to save

  • kwargs – additional arguments passed to h5py.create_dataset

cosmic_toolbox.arraytools.save_hdf_cols(filename, arr, compression=None, resizable=False, suppress_log=False)[source]

Save a structured array to HDF5 with each column as a separate dataset.

Parameters:
  • filename – path to the HDF5 file

  • arr – numpy structured array to save

  • compression – compression method (e.g., ‘lzf’, ‘gzip’) or dict

  • resizable – if True, create resizable datasets

  • suppress_log – if True, log at debug level instead of info

cosmic_toolbox.arraytools.select_finite(rec)[source]

Remove rows with NaNs or infs from a numpy structured array.

Parameters:

rec – numpy structured array

Returns:

numpy structured array

cosmic_toolbox.arraytools.set_loading_dtypes(arr)[source]

Convert array dtypes after loading from HDF5 (bytes to unicode strings).

Parameters:

arr – numpy array, bytes, list, or scalar

Returns:

Python-compatible version of input

cosmic_toolbox.arraytools.set_storing_dtypes(arr)[source]

Convert array dtypes for HDF5 storage (unicode strings to bytes).

Parameters:

arr – numpy array, string, list, or scalar

Returns:

storage-compatible version of input

cosmic_toolbox.arraytools.view_fields(rec, names)[source]

Return a view of a structured array containing only the given fields.

Parameters:
  • rec – numpy structured array

  • names – collection of field names to keep

Returns:

a view of rec (not a copy) restricted to the requested fields
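
Since numpy 1.16, multi-field indexing itself returns a view, which illustrates the behaviour:

```python
import numpy as np

rec = np.array([(1, 2.0), (3, 4.0)], dtype=[('a', 'i8'), ('b', 'f8')])

# Multi-field indexing returns a view, not a copy (numpy >= 1.16):
v = rec[['a']]
v['a'][0] = 99   # writes through to the original array
```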

cosmic_toolbox.arraytools.write_to_hdf(filename, arr, name='data', compression='lzf', shuffle=True, **kwargs)[source]

Write a recarray to an HDF5 file.

Parameters:
  • filename – filename of the hdf file

  • arr – numpy recarray

  • name – name of the dataset

  • compression – compression method

  • shuffle – shuffle data before compression

  • kwargs – keyword arguments for h5py.File.create_dataset

cosmic_toolbox.colors module

Color utilities for matplotlib plots.

Provides custom color cycles and utilities for setting matplotlib color schemes.

cosmic_toolbox.colors.get_colors(cycle='silvan')[source]

Get a dictionary of named colors for a given color cycle.

Parameters:

cycle (str) – Name of the color cycle to use. Currently only “silvan” is supported. If a different value is passed, it is returned as-is.

Returns:

Dictionary mapping color names to hex color codes, or the input value if not a recognized cycle name.

Return type:

dict or any

cosmic_toolbox.colors.set_cycle(cycle='silvan')[source]

Set the matplotlib color cycle to a predefined color scheme.

Parameters:

cycle (str or list or dict) – Name of the color cycle to use, or a list/dict of colors. If “silvan”, uses the predefined Silvan color scheme.

cosmic_toolbox.copy_guardian module

Copy Guardian - Rate-limited file copying with semaphore-based concurrency control.

Provides utilities for copying files locally and remotely with controlled concurrency to avoid overloading network resources.

@author: Joerg Herbel

class cosmic_toolbox.copy_guardian.CopyGuardian(n_max_connect, n_max_attempts_remote, time_between_attempts, use_copyfile=False)[source]

Bases: object

Rate-limited file copier with semaphore-based concurrency control.

This class manages file copying operations with controlled concurrency using file-based semaphores. It supports both local and remote (rsync) copy operations.

Parameters:
  • n_max_connect (int) – Maximum number of simultaneous connections allowed.

  • n_max_attempts_remote (int) – Maximum number of retry attempts for remote copies.

  • time_between_attempts (float) – Time in seconds to wait between retry attempts.

  • use_copyfile (bool) – If True, use shutil.copyfile instead of shutil.copy for local file copies (file metadata such as permission bits is not preserved).

cosmic_toolbox.file_utils module

File utilities for reading, writing, and copying files.

Provides functions for:
  • Reading/writing pickle and HDF5 files with compression

  • Robust file/directory operations (makedirs, remove, copy)

  • Remote file operations via SSH/rsync

  • YAML file handling

cosmic_toolbox.file_utils.copy_with_copy_guardian(sources, destination, n_max_connect=10, timeout=1000, folder_with_many_files=False)[source]

Copy files/directories using the CopyGuardian.

Parameters:
  • sources – List of source files/directories.

  • destination – Destination directory.

  • n_max_connect – Maximum number of simultaneous connections.

  • timeout – time in seconds to wait for a connection to become available

  • folder_with_many_files – If True, the source is a folder with many files

cosmic_toolbox.file_utils.ensure_permissions(path, verb=False)[source]

Set file permissions to user rwx, group rx, others rx.

Parameters:
  • path – path to file or directory

  • verb – if True, log the permission change

cosmic_toolbox.file_utils.get_abs_path(path)[source]

Get the absolute path, handling remote paths and environment variables.

Parameters:

path – relative or absolute path (can be Path object)

Returns:

absolute path string

cosmic_toolbox.file_utils.is_remote(path)[source]

Check if a path is a remote path (user@host:/path format).

Parameters:

path – path to check (can be Path object)

Returns:

True if remote, False otherwise
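
A remote-path check of this kind can be sketched with a regular expression (a hedged guess at the convention; the actual function may recognise other forms):

```python
import re

def looks_remote(path):
    """True for paths of the form user@host:/path."""
    return re.match(r'^[\w.\-]+@[\w.\-]+:', str(path)) is not None

looks_remote('alice@cluster.example.org:/data/cat.h5')   # True
looks_remote('/local/data/cat.h5')                       # False
```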

cosmic_toolbox.file_utils.load_from_hdf5(file_name, hdf5_keys, hdf5_path='')[source]

Load data stored in an HDF5 file.

Parameters:
  • file_name – Name of the file.

  • hdf5_keys – Keys of arrays to be loaded.

  • hdf5_path – Path within the HDF5 file, appended to all keys.

Returns:

Loaded arrays.

cosmic_toolbox.file_utils.read_from_hdf(filepath, name='data')[source]

Read an object from an hdf5 file.

Parameters:
  • filepath – Path to the hdf5 file.

  • name – Name of the dataset.

Returns:

Object read from the hdf5 file.

cosmic_toolbox.file_utils.read_from_pickle(filepath, compression='none')[source]

Read an object from a pickle file.

Parameters:
  • filepath – Path to the pickle file.

  • compression – Compression method to use. Can be “none”, “lzf” or “bz2”.

Returns:

Object read from the pickle file.

cosmic_toolbox.file_utils.read_yaml(filename)[source]

Read a YAML file.

Parameters:

filename – path to YAML file

Returns:

parsed YAML content

cosmic_toolbox.file_utils.robust_copy(src, dst, n_max_connect=50, method='CopyGuardian', folder_with_many_files=False, **kwargs)[source]

Copy files/directories using the specified method.

Parameters:
  • src – Source file/directory.

  • dst – Destination file/directory.

  • n_max_connect – Maximum number of simultaneous connections.

  • method – Method to use for copying. Can be “CopyGuardian” or “system_cp”.

  • folder_with_many_files – If True, the source is a folder with many files (only for CopyGuardian).

  • kwargs – Additional arguments passed to the copy method.

cosmic_toolbox.file_utils.robust_makedirs(path)[source]

Create directories, handling remote paths via SSH.

Parameters:

path – path to create (can be remote with user@host:path format)

cosmic_toolbox.file_utils.robust_remove(path)[source]

Remove a file or directory.

Parameters:

path – Path to the file or directory.

cosmic_toolbox.file_utils.system_copy(sources, dest, args_str_cp='')[source]

Copy files using the system cp command.

Parameters:
  • sources – list of source paths

  • dest – destination path

  • args_str_cp – additional arguments for cp command

cosmic_toolbox.file_utils.write_to_hdf(filepath, obj, name='data', **kwargs)[source]

Write an object to an hdf5 file.

Parameters:
  • filepath – Path to the hdf5 file.

  • obj – Object to write.

  • name – Name of the dataset.

  • kwargs – Additional arguments passed to h5py.File.create_dataset.

cosmic_toolbox.file_utils.write_to_pickle(filepath, obj, compression='none')[source]

Write an object to a pickle file.

Parameters:
  • filepath – Path to the pickle file.

  • obj – Object to write.

  • compression – Compression method to use. Can be “none”, “lzf” or “bz2”.
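The compressed pickle round trip can be sketched with the standard library alone. The helpers below are hypothetical stand-ins for `write_to_pickle`/`read_from_pickle`, covering only the “none” and “bz2” modes (“lzf” needs an extra dependency); the real functions may differ in detail.

```python
import bz2
import os
import pickle
import tempfile

def write_pickle_sketch(filepath, obj, compression="none"):
    # Serialize with pickle, then optionally bz2-compress the byte stream.
    data = pickle.dumps(obj)
    if compression == "bz2":
        data = bz2.compress(data)
    with open(filepath, "wb") as f:
        f.write(data)

def read_pickle_sketch(filepath, compression="none"):
    # Reverse of write_pickle_sketch: decompress (if needed), then unpickle.
    with open(filepath, "rb") as f:
        data = f.read()
    if compression == "bz2":
        data = bz2.decompress(data)
    return pickle.loads(data)

path = os.path.join(tempfile.mkdtemp(), "obj.pkl.bz2")
write_pickle_sketch(path, {"n_samples": 100}, compression="bz2")
print(read_pickle_sketch(path, compression="bz2"))  # {'n_samples': 100}
```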

cosmic_toolbox.logger module

Logging utilities with colored output and progress bars.

Provides a customized logger with:
  • Color-coded log levels (debug=violet, warning=orange, error=red)
  • Progress bar integration via tqdm
  • Environment variable control (PYTHON_LOGGER_LEVEL)

class cosmic_toolbox.logger.ColorFormatter(fmt=None, datefmt=None, style='%', validate=True, *, defaults=None)[source]

Bases: Formatter

Custom log formatter with color-coded output based on log level.

Colors:
  • DEBUG: violet
  • INFO: default
  • WARNING: bold orange
  • ERROR/CRITICAL: bold red

BOLD = '\x1b[1m'
FORMATS = {10: '\x1b[95m%(asctime)s %(name)10s %(levelname).3s   %(message)s \x1b[0m', 20: '%(asctime)s %(name)10s %(levelname).3s   %(message)s ', 30: '\x1b[1m\x1b[33m%(asctime)s %(name)10s %(levelname).3s   %(message)s \x1b[0m', 40: '\x1b[1m\x1b[91m%(asctime)s %(name)10s %(levelname).3s   %(message)s \x1b[0m', 50: '\x1b[1m\x1b[91m%(asctime)s %(name)10s %(levelname).3s   %(message)s \x1b[0m'}
ORANGE = '\x1b[33m'
RED = '\x1b[91m'
UNDERLINE = '\x1b[4m'
VIOLET = '\x1b[95m'
YELLOW = '\x1b[93m'
format(record)[source]

Format the log record with appropriate colors.

Parameters:

record – log record to format

Returns:

formatted string with ANSI color codes

reset = '\x1b[0m'
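The per-level format lookup can be illustrated with a minimal stand-alone formatter using the same ANSI codes as the FORMATS mapping above. This is a simplified sketch, not the class's actual implementation.

```python
import logging

VIOLET, RED, RESET, BOLD = "\x1b[95m", "\x1b[91m", "\x1b[0m", "\x1b[1m"
BASE = "%(asctime)s %(name)10s %(levelname).3s   %(message)s "

class MiniColorFormatter(logging.Formatter):
    # Pick a format string per log level, mirroring the FORMATS mapping.
    FORMATS = {
        logging.DEBUG: VIOLET + BASE + RESET,
        logging.INFO: BASE,
        logging.ERROR: BOLD + RED + BASE + RESET,
    }

    def format(self, record):
        fmt = self.FORMATS.get(record.levelno, BASE)
        return logging.Formatter(fmt).format(record)

handler = logging.StreamHandler()
handler.setFormatter(MiniColorFormatter())
log = logging.getLogger("demo")
log.addHandler(handler)
log.setLevel(logging.DEBUG)
log.debug("rendered in violet")  # printed to stderr with ANSI codes
```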
class cosmic_toolbox.logger.Progressbar(logger=None)[source]

Bases: object

Progress bar wrapper for use with logger.

Wraps tqdm with sensible defaults and respects the logger’s level.

cosmic_toolbox.logger.get_logger(filepath, logging_level=None)[source]

Get a configured logger with colored output and progress bar support.

If logging_level is unspecified, uses the PYTHON_LOGGER_LEVEL environment variable, defaulting to ‘info’.

Parameters:
  • filepath – name of the file calling the logger (used for logger name)

  • logging_level – logging level (‘debug’, ‘info’, ‘warning’, ‘error’, ‘critical’)

Returns:

configured logger object with progressbar attribute

cosmic_toolbox.logger.set_all_loggers_level(level)[source]

Set the logging level for all loggers and future loggers.

Sets the PYTHON_LOGGER_LEVEL environment variable and updates all existing loggers.

Parameters:

level – level string (‘debug’, ‘info’, ‘warning’, ‘error’, ‘critical’)
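A plausible implementation of this behavior sets the environment variable and walks the standard library's logger registry. The function name and details below are illustrative assumptions, not the package's actual code.

```python
import logging
import os

def set_all_loggers_level_sketch(level):
    # Record the level in the documented environment variable so future
    # loggers pick it up, then update every already-registered logger.
    os.environ["PYTHON_LOGGER_LEVEL"] = level
    numeric = getattr(logging, level.upper())
    for name in list(logging.Logger.manager.loggerDict):
        logging.getLogger(name).setLevel(numeric)

logging.getLogger("a").setLevel(logging.INFO)
set_all_loggers_level_sketch("warning")
print(logging.getLogger("a").level == logging.WARNING)  # True
```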

cosmic_toolbox.logger.set_logger_level(logger, level)[source]

Set the logging level for a logger.

Parameters:
  • logger – logger instance

  • level – level string (‘debug’, ‘info’, ‘warning’, ‘error’, ‘critical’)

cosmic_toolbox.utils module

General utilities for argument parsing, multiprocessing, and convenience functions.

Provides functions for:
  • Converting argument strings to dictionaries
  • Parsing command-line sequences (lists, tuples)
  • Running functions in parallel with multiprocessing
  • Miscellaneous helper functions

cosmic_toolbox.utils.arg_str_to_dict(arg_str)[source]

Converts a string in the format “{arg:value}” or “{arg1:value1,arg2:value2,…}” to a dictionary with keys and values. Note: the strings should not contain ‘ or “ characters.

Parameters:

arg_str – A string in the format “{arg:value}” or “{arg1:value1, arg2:value2,…}”.

Returns:

Dictionary with keys and values corresponding to the input string.
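The described format can be parsed with plain string splitting. The sketch below is an illustrative stand-in; the real function may handle types and edge cases differently.

```python
def arg_str_to_dict_sketch(arg_str):
    # Strip the surrounding braces, then split "key:value" pairs on commas.
    body = arg_str.strip().lstrip("{").rstrip("}")
    out = {}
    for pair in body.split(","):
        key, _, value = pair.partition(":")
        out[key.strip()] = value.strip()
    return out

print(arg_str_to_dict_sketch("{lr:0.01,seed:42}"))  # {'lr': '0.01', 'seed': '42'}
```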

cosmic_toolbox.utils.is_between(x, min, max)[source]

Checks if x is between min and max.

Parameters:
  • x – Value to check.

  • min – Minimum value.

  • max – Maximum value.

Returns:

True if x is between min and max, False otherwise.

cosmic_toolbox.utils.parse_list(s)[source]

Parses a string to a list for argparse. Can be used as type for argparse.

Parameters:

s – String to parse.

Returns:

list.

cosmic_toolbox.utils.parse_sequence(s)[source]

Parses a string to a list/tuple for argparse. Can be used as type for argparse.

Parameters:

s – String to parse.

Returns:

tuple or list.

Raises:

argparse.ArgumentTypeError – If the string cannot be parsed to a tuple.
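An argparse `type` callable with this contract can be sketched using `ast.literal_eval`, raising `ArgumentTypeError` on failure. This is an assumed equivalent, not the package's actual parser.

```python
import argparse
import ast

def parse_sequence_sketch(s):
    # Accept strings like "[1, 2, 3]" or "(1, 2)" from the command line.
    try:
        value = ast.literal_eval(s)
    except (ValueError, SyntaxError):
        raise argparse.ArgumentTypeError(f"cannot parse sequence: {s!r}")
    if not isinstance(value, (list, tuple)):
        raise argparse.ArgumentTypeError(f"not a list/tuple: {s!r}")
    return value

parser = argparse.ArgumentParser()
parser.add_argument("--bins", type=parse_sequence_sketch)
args = parser.parse_args(["--bins", "[1, 2, 3]"])
print(args.bins)  # [1, 2, 3]
```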

cosmic_toolbox.utils.random_sleep(max_seconds=0, min_seconds=0)[source]

Sleeps for a random amount of time between min_seconds and max_seconds.

Parameters:
  • max_seconds – Maximum number of seconds to sleep.

  • min_seconds – Minimum number of seconds to sleep.

cosmic_toolbox.utils.run_imap_multiprocessing(func, argument_list, num_processes, verb=True)[source]

Runs a function with a list of arguments in parallel using multiprocessing.

Parameters:
  • func – Function to run.

  • argument_list – List of arguments to run the function with.

  • num_processes – Number of processes to use.

  • verb – If True, show progress bar.

Returns:

List of results from the function.
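The underlying pattern is a pool's `imap`, which preserves input order. The sketch below uses the thread-backed `multiprocessing.dummy.Pool` (same API as `multiprocessing.Pool`) to stay self-contained, and omits the tqdm progress bar that the real function shows when `verb=True`.

```python
from multiprocessing.dummy import Pool  # thread-backed, same API as multiprocessing.Pool

def run_imap_sketch(func, argument_list, num_processes):
    # Collect results in input order via Pool.imap; the real function
    # additionally wraps the iterator in a tqdm progress bar.
    with Pool(num_processes) as pool:
        return list(pool.imap(func, argument_list))

print(run_imap_sketch(lambda x: x * x, [1, 2, 3, 4], num_processes=2))  # [1, 4, 9, 16]
```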

Module contents