estats Version 3.1.0 - BETA


estats is part of the Non-Gaussian Statistics Framework (NGSF).

If you use this package in your research please cite Zuercher et al. 2020 (arXiv:2006.12506) and Zuercher et al. 2021 (arXiv:2110.10135).

Source

Documentation

Introduction

The estats package contains the main building blocks of the NGSF. It was initially built to constrain cosmological parameters using Non-Gaussian weak lensing mass map statistics. The different submodules are independent of each other and can easily be used in other codes individually.

catalog

The catalog module handles a galaxy catalog consisting of coordinates, ellipticities, weights and tomographic bins for each galaxy. Its main functionalities are: rotating the catalog on the sky; outputting the survey mask, weight maps, shear maps and convergence maps, where the convergence maps are calculated from the shear maps using spherical Kaiser-Squires; and generating shape noise catalogs or shape noise maps by randomly rotating the galaxies in place. Additional scalar quantities can also be passed to obtain maps of them.
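For intuition, a minimal flat-sky sketch of the Kaiser-Squires inversion is shown below. estats itself works on the sphere (spherical Kaiser-Squires via spherical harmonics); kaiser_squires_flat is purely illustrative and not part of the estats API:

```python
import numpy as np

def kaiser_squires_flat(gamma1, gamma2):
    """Flat-sky Kaiser-Squires: shear maps -> E/B-mode convergence maps."""
    ny, nx = gamma1.shape
    ky = np.fft.fftfreq(ny)[:, None]
    kx = np.fft.fftfreq(nx)[None, :]
    k2 = kx**2 + ky**2
    k2[0, 0] = 1.0  # avoid 0/0; the k=0 (mean) mode is unconstrained anyway
    g = np.fft.fft2(gamma1 + 1j * gamma2)
    kappa_hat = ((kx**2 - ky**2) - 2j * kx * ky) / k2 * g
    kappa = np.fft.ifft2(kappa_hat)
    return kappa.real, kappa.imag  # E modes, B modes
```

For pure lensing the B-mode output is consistent with zero, which is why B-mode statistics are a useful null test.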

The most important functionalities are:

  • rotate_catalog:

    Rotates the survey on the sky by the given angles. The spin-2 ellipticity field can also be rotated by the appropriate angle.

  • get_mask:

    Returns the survey mask as a binary Healpix (Gorski et al. 2005) map.

  • get_map:

    Can return Healpix shear maps, convergence maps or a weight map (number of galaxies per pixel). The convergence map is calculated from the ellipticities of the galaxies using the spherical Kaiser-Squires routine (Wallis et al. 2017). The shear and convergence maps are weighted by the galaxy weights.

  • generate_shape_noise_map:

    Generates a shape noise Healpix shear map by randomly rotating the galaxies' ellipticities in place. The weights are taken into account in the generation of the noise.
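The random-rotation idea behind the shape noise generation can be sketched as follows. random_rotate_ellipticities is a hypothetical helper and, unlike estats, it ignores the galaxy weights:

```python
import numpy as np

def random_rotate_ellipticities(e1, e2, seed=None):
    """Randomly rotate each galaxy's spin-2 ellipticity in place.

    This destroys the lensing signal while preserving each |e|, which is
    the standard way to build shape-noise realizations.
    """
    rng = np.random.default_rng(seed)
    phi = rng.uniform(0.0, 2.0 * np.pi, size=len(e1))
    e = (e1 + 1j * e2) * np.exp(2j * phi)  # spin-2 rotation
    return e.real, e.imag
```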

The accepted keywords are:

  • NSIDE:

    default: 1024

    choices: an integer being a power of 2

    The Healpix resolution that is used to produce the map products.

  • degree:

    default: True

    choices: True, False

    If True the coordinates are assumed to be given in degrees otherwise in radians.

  • colat:

    default: False

    choices: True, False

    If True the second coordinate is assumed to be a co-latitude otherwise a normal latitude is assumed.

  • prec:

    default: 32

    choices: 64, 32, 16

    Size in bits of the float values in the Healpix maps. For fewer than 32 bits, hp.UNSEEN is mapped to -inf, so the RAM optimization is lost.
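The degree and colat keywords amount to the following conversion into the Healpix angle convention (theta is a co-latitude in radians). to_healpix_angles is an illustrative helper, not part of the estats API:

```python
import numpy as np

def to_healpix_angles(lon, lat, degree=True, colat=False):
    """Convert catalog coordinates to the Healpix convention:
    theta = co-latitude in radians [0, pi], phi = longitude in radians."""
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    if degree:
        lon, lat = np.radians(lon), np.radians(lat)
    theta = lat if colat else np.pi / 2.0 - lat
    return theta, lon
```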

map

The map module handles shear and convergence maps and calculates summary statistics from them.

The summary statistics are defined via plugins that are located in the estats.stats folder. This allows users to easily add their own summary statistic without having to modify the internals of the code. See the Implementing your own summary statistic Section to learn how to do that.

The summary statistics can be calculated from the shear maps, the convergence maps or from smoothed convergence maps (multiscale approach for extraction of non-Gaussian features).

If only one set of weak lensing maps is given, the statistics are calculated from that set directly. If two sets are given and the name of the statistic to be computed contains the word Cross, both sets are passed to the statistics plugin. This can be used, for example, to calculate cross-correlations between maps from different tomographic bins. In the case of a multiscale statistic the maps are convolved into a cross-correlated map. If the statistic to compute contains the word Full, all maps are passed over.

The most important functionalities are:

  • convert_convergence_to_shear:

    Uses spherical Kaiser-Squires to convert the internal E-mode convergence map to shear maps. The maps are masked using the internal masks. By default the trimmed mask is used, allowing the user to disregard pixels where the spherical harmonic transformation introduced large errors.

  • convert_shear_to_convergence:

    Uses spherical Kaiser-Squires to convert the internal shear maps to convergence maps. The maps are masked using the internal masks. By default the trimmed mask is used, allowing the user to disregard pixels where the spherical harmonic transformation introduced large errors.

  • smooth_maps:

    Applies a Gaussian smoothing kernel to all internal convergence maps (E- and B-modes). The fwhm parameter decides on the FWHM of the Gaussian smoothing kernel. It is given in arcmin.

  • calc_summary_stats:

    Main functionality of the module. Allows the use of the statistics plugins located in the estats.stats folder to calculate map-based statistics.

    See the Implementing your own summary statistic Section to learn what the statistics plugins look like and how to write your own.

    The summary statistics can be calculated from the shear maps, the convergence maps or from smoothed convergence maps (multiscale approach for the extraction of non-Gaussian features).

    Instead of using the internal masks for masking, extra masks can be used. This allows using maps containing multiple survey cutouts and selecting a different cutout each time the function is called.

    If use_shear_maps is set to True the function will convert the shear maps into convergence maps using spherical Kaiser-Squires instead of using the convergence maps directly.

    If copy_obj is set to False, no copies of the internal maps are made. This can save RAM but it also leads to the internal maps being overwritten. If you wish to use the internal maps after the function call set this to True!

    By default the function returns the calculated statistics in a dictionary. If write_to_file is set to True, it instead appends to files whose paths are built from the defined_parameters, undefined_parameters, output_dir and name arguments using ekit.paths.
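As a rough flat-sky illustration of the smoothing step in the multiscale approach (estats smooths spherical Healpix maps instead), the sketch below converts an FWHM into a Gaussian sigma and applies the kernel in Fourier space; gaussian_smooth_fft is not part of the estats API:

```python
import numpy as np

def gaussian_smooth_fft(m, fwhm_pix):
    """Smooth a flat map with a Gaussian kernel of given FWHM (in pixels)."""
    # FWHM -> sigma for a Gaussian: sigma = FWHM / (2 * sqrt(2 * ln 2))
    sigma = fwhm_pix / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    ky = np.fft.fftfreq(m.shape[0])[:, None]
    kx = np.fft.fftfreq(m.shape[1])[None, :]
    # Fourier transform of the Gaussian kernel (unit amplitude at k=0,
    # so the map mean is preserved)
    w = np.exp(-2.0 * np.pi**2 * sigma**2 * (kx**2 + ky**2))
    return np.fft.ifft2(np.fft.fft2(m) * w).real
```

On the sphere the analogous operation takes the FWHM in radians, i.e. the entries of the scales keyword would first be converted from arcmin via np.radians(fwhm / 60.0).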

The accepted keywords are:

  • NSIDE:

    default: 1024

    choices: an integer being a power of 2

    The Healpix resolution that is used to produce the map products.

  • scales:

    default: [31.6, 29.0, 26.4, 23.7, 21.1, 18.5, 15.8, 13.2, 10.5, 7.9, 5.3, 2.6]

    choices: A list of floats indicating the FWHM of the Gaussian smoothing kernels to be applied in arcmin.

    For summary statistics of the multi type (see Implementing your own summary statistic), the statistics are extracted from the convergence maps at a number of scales (multiscale approach). To do so, the maps are smoothed with Gaussian smoothing kernels of different sizes; the entries of scales give the FWHM of these kernels.

  • polarizations:

    default: ‘A’

    choices: ‘E’, ‘B’, ‘A’

    If E, only E-mode statistics are returned when calc_summary_stats is used. If B, only B-mode statistics are returned. If A, both E- and B-mode statistics are returned.

  • prec:

    default: 32

    choices: 64, 32, 16

    Size in bits of the float values in the Healpix maps. For fewer than 32 bits, hp.UNSEEN is mapped to -inf, so the RAM optimization is lost.

summary

The summary module is meant to postprocess summary statistics measurements.

The main functionality of the summary module is to calculate mean data-vectors, standard deviations and covariance or precision matrices for the summary statistics at different parameter configurations, based on a set of realizations of the summary statistic at each configuration.

The meta data (e.g. cosmology setting, precision settings, tomographic bin and so on) for each set of realizations (read in from a file or an array directly) can be given to the module directly on read-in or parsed from the filename. Directly after read-in, a first postprocessing can be done using the process function defined in the statistic plugin. The read-in data-vectors are appended to a data table for each statistic and the meta data is added to an internal meta data table. The realizations are ordered according to their meta data entry. There are two special meta data entries (tomo: the label of the tomographic bin of the data-vectors; NREALS: the number of data-vectors associated with each entry, inferred automatically). All other entries can be defined by the user.

The summary module allows downbinning the potentially very long data-vectors into larger bins using a binning scheme. The decide_binning_scheme function in the statistic plugin decides on that scheme, which defines the edges of the large bins based on the bins of the original data-vectors. For plotting purposes the binning scheme can also define the values of each data bin (for example its signal-to-noise ratio). The slice function in the statistic plugin then defines how exactly the binning scheme is used to downbin each data-vector. See the Implementing your own summary statistic Section for more details.
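A sketch of what such a scheme might do with a data-vector; whether a plugin sums, averages or otherwise combines the fine bins is plugin-specific, and downbin is a hypothetical helper that sums:

```python
import numpy as np

def downbin(data, edges):
    """Sum a fine-binned data-vector into larger bins.

    `edges` holds the left edge of each large bin, given as indices into
    the fine bins; the last large bin runs to the end of the data-vector.
    """
    return np.add.reduceat(np.asarray(data), np.asarray(edges))
```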

The summary module allows combining summary statistics calculated for different tomographic bins to perform a tomographic analysis. The tomo entry in the meta data table defines the label of the tomographic bin for each set of data-vector realizations. One can define the order of the labels when combined into a joint data-vector using the cross_ordering keyword.

The summary module also allows combining different summary statistics into a joint data-vector.

The most important functionalities are:

  • generate_binning_scheme:

    Uses the decide_binning_scheme function from the statistic plugin to create a binning scheme. The scheme can be created for different tomographic bins and scales. See the Section Implementing your own summary statistic for more details.

  • readin_stat_files:

    Reads in data-vector realizations from a file. The process function from the statistics plugin is used to perform a first processing of the data. The meta data for each file can either be given directly or can be parsed from the file name by giving a list of parameters indicating the fields to be parsed (using ekit).

  • downbin_data:

    Uses the created binning scheme to bin the data-vector entries into larger bins. Uses the slice function from the statistics plugin to do so.

  • join_redshift_bins:

    Joins all data-vector realizations of a specific statistic at the same configuration. The tomo entry in the meta data table defines the label of the tomographic bin for each set of data-vector realizations. One can define the order of the labels when combined into a joint data-vector using the cross_ordering keyword. If different numbers of realizations are found for different tomographic bins at a specific parameter configuration, only the minimum number of realizations is used to calculate the combined data-vectors.

  • join_statistics:

    Creates a new statistic entry, including the data table and the meta data table, by concatenating the data-vectors of a set of statistics. The new statistic has the name statistic1-statistic2-… If different numbers of realizations are found for different statistics at a specific parameter configuration, only the minimum number of realizations is used to calculate the combined data-vectors.

  • get_means:

    Returns the mean data vectors of a statistic for the different parameter configurations.

  • get_meta:

    Returns the full meta data table for a statistic.

  • get_errors:

    Returns the standard deviation of the data vectors of a statistic for the different configurations.

  • get_covariance_matrices:

    Returns the covariance matrices estimated from the realizations at each configuration. Can also invert the covariance matrices directly to obtain the precision matrices.
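The quantities returned by get_means, get_errors and get_covariance_matrices can be sketched with plain numpy as follows; summarize_realizations is an illustrative helper, not the estats API:

```python
import numpy as np

def summarize_realizations(realizations):
    """Mean data-vector, per-bin standard deviation and covariance matrix
    from a set of realizations (rows = realizations, columns = bins)."""
    r = np.asarray(realizations, dtype=float)
    mean = r.mean(axis=0)
    errors = r.std(axis=0, ddof=1)          # unbiased per-bin std
    cov = np.cov(r, rowvar=False)           # sample covariance matrix
    return mean, errors, cov
```

The precision matrix is then np.linalg.inv(cov); note that the inverse of a sample covariance estimated from a finite number of realizations is a biased estimator of the true precision matrix.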

The accepted keywords are:

  • cross_ordering:

    default: []

    choices: a list of labels

    Indicates the order of the tomographic bin labels that is used by join_redshift_bins to combine data-vectors from different tomographic bins.

    The labels could be bin1xbin2 for example, and the corresponding cross_ordering could be [1x1, 1x2, 2x2, 2x3, 3x3].

likelihood

The likelihood module performs parameter inference based on predictions of the data-vectors and covariance matrices at different parameter configurations.

The main functionality is to calculate the negative logarithm of the likelihood at a given parameter configuration given a measurement data-vector.
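Assuming a Gaussian likelihood, this quantity (up to an additive constant) is the usual chi-squared expression; the helper below is an illustrative sketch, not the estats implementation:

```python
import numpy as np

def neg_loglike_gaussian(data, model, precision):
    """Gaussian negative log-likelihood up to a constant:
    0.5 * (d - m)^T C^{-1} (d - m), with `precision` = C^{-1}."""
    d = np.asarray(data, dtype=float) - np.asarray(model, dtype=float)
    return 0.5 * d @ np.asarray(precision, dtype=float) @ d
```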

The parameter space is broken down into two parts called parameters and nuisances, that are treated differently.

For the parameter part it is assumed that the space is densely sampled with simulations and that an emulator is built from them.

For the nuisances it is assumed that only delta simulations at the fiducial parameter configuration are available, in which only that one parameter is varied. In this case polynomial scaling relations are fitted for each bin of the data-vector and used to describe the influence of the parameter when predicting the statistic. This implicitly assumes that these parameters are independent of all other parameters.
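A minimal sketch of such per-bin polynomial scaling relations; fit_bin_scalings and apply_bin_scalings are hypothetical helpers, and the actual fitting in estats may differ:

```python
import numpy as np

def fit_bin_scalings(nuisance_values, data_vectors, deg=2):
    """Fit one polynomial per data-vector bin describing how that bin
    responds to a single nuisance parameter (all others at fiducial).

    data_vectors has shape (n_sims, n_bins); the returned coefficient
    array has shape (deg + 1, n_bins), highest power first."""
    x = np.asarray(nuisance_values, dtype=float)
    dv = np.asarray(data_vectors, dtype=float)
    return np.polyfit(x, dv, deg)

def apply_bin_scalings(coeffs, value):
    """Predict all bins at a given nuisance-parameter value (Horner)."""
    y = np.zeros(coeffs.shape[1])
    for c in coeffs:  # one coefficient row per power, highest first
        y = y * value + c
    return y
```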

The data vectors can also be compressed using PCA or MOPED compression.
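A PCA compression of the data-vectors can be sketched via an SVD of the realizations; build_pca and to_pca_space are illustrative helpers (MOPED compression additionally uses derivatives of the model with respect to the parameters and is not shown):

```python
import numpy as np

def build_pca(realizations, n_components):
    """Build a PCA compression from data-vector realizations.

    Returns the mean data-vector and the leading principal directions
    (one per row of the returned components array)."""
    r = np.asarray(realizations, dtype=float)
    mean = r.mean(axis=0)
    # SVD of the centred realizations; rows of vt are principal directions,
    # ordered by decreasing singular value
    _, _, vt = np.linalg.svd(r - mean, full_matrices=False)
    return mean, vt[:n_components]

def to_pca_space(mean, components, data_vector):
    """Project a data-vector onto the retained principal components."""
    return components @ (np.asarray(data_vector, dtype=float) - mean)
```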

The most important functionalities are:

  • readin_interpolation_data:

    Loads data used for interpolation. The data is expected to be in a format as used by the estats.summary module.

  • convert_to_PCA_space:

    Builds PCA compression and converts all data vectors to PCA space. All output will be in PCA space afterwards.

  • convert_to_MOPED_space:

    Builds MOPED compression and converts all data vectors to MOPED space. Requires emulator to be built before. All output will be in MOPED space afterwards.

  • build_emulator:

    Builds the emulator for the parameter space, used to interpolate the expected data-vectors between different parameter configurations. At the moment there are three choices for the type of interpolator: linear (an N-dimensional linear interpolator), GPR (a Gaussian process regressor) and NN (a neural network).

  • build_scalings:

    Builds the polynomial scaling relations for the nuisance parameters individually for each data bin. A polynomial function is fitted for each bin and each nuisance parameter.

  • get_neg_loglikelihood:

    Returns the negative logarithmic likelihood given a measurement data-vector, evaluated at the indicated location in parameter space.

The accepted keywords are:

  • statistic:

    default: Peaks

    choices: name of one of the statistic plugins

    Decides which statistic plugin to use. In the likelihood module only the filter function is used from the plugin.

  • parameters:

    default: [Om, s8]

    choices: list of strings

    The names of the parameters to consider

  • parameter_fiducials:

    default: [0.276, 0.811]

    choices: list of floats

    The default values of the parameters. Used to decide on the fiducial covariance matrix if no interpolation of the covariance matrix is used.

  • nuisances:

    default: [IA, m, z]

    choices: list of strings

    The names of the nuisance parameters to consider

  • nuisance_fiducials:

    default: [0.0, 0.0, 0.0]

    choices: list of floats

    The default values of the nuisance parameters. Used to decide on the fiducial covariance matrix if no interpolation of the covariance matrix is used.

  • n_tomo_bins:

    default: 3

    choices: integer

    The number of tomographic bins considered. Only needed if the special emulator is used or if the name of the statistic contains Cross.

  • cross_ordering:

    default: []

    choices: a list of labels

    Indicates the order of the tomographic bin labels that is assumed in the filter function.

    The labels could be bin1xbin2 for example, and the corresponding cross_ordering could be [1x1, 1x2, 2x2, 2x3, 3x3].

  • multi_bin:

    default: [False, True, True]

    choices: A boolean list

    Indicates if a nuisance parameter in nuisances is a global parameter (False) or a corresponding parameter should be introduced for each tomographic bin (True).

Getting Started

The easiest and fastest way to learn about estats is to have a look at the Tutorial Section.

Credits

This package was created by Dominik Zuercher (PhD student at ETH Zurich in Alexandre Refregier's Cosmology Research Group).

The package is maintained by Dominik Zuercher dominik.zuercher@phys.ethz.ch.

Contributing

Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.


Feedback

If you have any suggestions or questions about estats feel free to email me at dominikz@phys.ethz.ch.

If you encounter any errors or problems with estats, please let me know!