riogrande.helper#

General-purpose helper functions for the riogrande package.

This module collects utility functions that are used across the package but do not belong to the I/O layer or the parallelization machinery.

Attributes#

Functions#

aggregated_selector(masks[, logic])

Turns several rasterio masks into a boolean selector for a numpy array

check_compatibility(*sources)

Assert that all the sources are compatible with each other.

check_crs(*sources)

Assert that all the sources have the same coordinate reference system (crs)

check_resolution(*sources)

Assert that all the sources have the same spatial resolution

check_units(*sources)

Assert that all sources have the same linear units in the coordinate reference system (crs)

convert_to_dtype(data[, as_dtype, in_range, out_range])

Converts data to as_dtype and optionally rescales it.

count_contribution(data, selector[, no_data])

The remaining number of data cells when applying the selector

deserialize(tags)

Reads python objects from JSON-encoded values of a dict

dtype_range(dtype)

Get the range of the specified dtype

get_nbr_workers([number])

Determine the number of worker processes to use in multiprocessing.

get_or_set_context([method])

Return a multiprocessing context and set the global start method if unset.

match_all(targets, tags)

Check if all tags in targets are present in tags

match_any(targets, tags)

Check if any tag in targets is present in tags

output_filename(base_name, out_type[, blur_params])

Construct the filename for the specific output type.

rasterio_to_numpy_dtype(rasterio_dtype)

Map Rasterio actual data types to NumPy data types.

reduced_mask(array[, nodata, logic])

Computes a mask based on the value of several bands

sanitize(tags)

Serializes then deserializes values of a dict

serialize(tags)

Convert the values of a dict into JSON

view_to_window(view)

Converts a view into a rasterio Window

Module Contents#

riogrande.helper.aggregated_selector(masks, logic='all')[source]#

Turns several rasterio masks into a boolean selector for a numpy array

Rasterio masks are uint8 numpy arrays where every value > 0 is considered a valid cell.

Parameters:
  • masks (list[NDArray]) – Arbitrary number of numpy arrays resulting from rasterio.io.DatasetReader.dataset_mask() or rasterio.io.DatasetReader.read_masks().

  • logic (str) – Determines how the aggregation should happen. If 'all' (the default), a cell is only selected if all masks consider it valid data (aggregated via numpy.logical_and()). 'any' selects cells which at least one mask considers valid (aggregated via numpy.logical_or()).

Returns:

Boolean numpy array that is True for every selected cell.

Return type:

NDArray

See also

reduced_mask()

Compute a mask from nodata values across bands.
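
The aggregation described above can be illustrated with plain NumPy. This is only a sketch under the documented semantics (values > 0 are valid, 'all' combines with numpy.logical_and(), 'any' with numpy.logical_or()). The name aggregate_masks is hypothetical and is not the riogrande implementation.

```python
import numpy as np


def aggregate_masks(masks, logic="all"):
    """Hypothetical stand-in: combine uint8 validity masks into one boolean selector."""
    # A cell counts as valid in a mask when its value is > 0.
    valid = [np.asarray(mask) > 0 for mask in masks]
    if logic == "all":
        return np.logical_and.reduce(valid)
    if logic == "any":
        return np.logical_or.reduce(valid)
    raise ValueError(f"Unknown logic: {logic!r}")


mask_a = np.array([[255, 0], [255, 255]], dtype=np.uint8)
mask_b = np.array([[255, 255], [0, 255]], dtype=np.uint8)

sel_all = aggregate_masks([mask_a, mask_b])               # valid in every mask
sel_any = aggregate_masks([mask_a, mask_b], logic="any")  # valid in at least one mask
print(sel_all)
print(sel_any)
```

With 'all', only cells valid in both masks remain selected. With 'any', a cell is dropped only if every mask marks it invalid.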

riogrande.helper.check_compatibility(*sources)[source]#

Assert that all the sources are compatible with each other.

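The checks cover the coordinate reference system (see check_crs()), the linear units of the crs (see check_units()) and the spatial resolution (see check_resolution()).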

Parameters:

*sources (str) – Sources (paths to files) that are to be compared to each other.

Returns:

  • crss (list) – All unique crs from sources in a list (see check_crs()).

  • units (list) – All unique units from sources in a list (see check_units()).

  • ress (list) – All unique resolutions from sources in a list (see check_resolution()).

Return type:

Tuple[list, list, list]

See also

check_crs()

Check that sources share the same CRS.

check_units()

Check that sources share the same linear units.

check_resolution()

Check that sources share the same resolution.

riogrande.helper.check_crs(*sources)[source]#

Assert that all the sources have the same coordinate reference system (crs)

Parameters:

*sources (str) – List of sources (paths to files) from which crs are to be compared to each other.

Returns:

All unique crs from sources in a list.

Return type:

list

See also

check_units()

Check that sources share the same linear units.

check_resolution()

Check that sources share the same resolution.

check_compatibility()

Run all three checks at once.

riogrande.helper.check_resolution(*sources)[source]#

Assert that all the sources have the same spatial resolution

Parameters:

*sources (str) – List of sources (paths to files) from which resolutions are to be compared to each other.

Returns:

All unique resolutions from sources in a list.

Return type:

list

See also

check_units()

Check that sources share the same linear units.

check_crs()

Check that sources share the same CRS.

check_compatibility()

Run all three checks at once.

riogrande.helper.check_units(*sources)[source]#

Assert that all sources have the same linear units in the coordinate reference system (crs)

Parameters:

*sources (str) – List of sources (paths to files) from which units are to be compared to each other.

Returns:

All unique units in a list.

Return type:

list

See also

check_crs()

Check that sources share the same CRS.

check_resolution()

Check that sources share the same resolution.

check_compatibility()

Run all three checks at once.

riogrande.helper.convert_to_dtype(data, as_dtype=None, in_range=None, out_range=None)[source]#

Converts data to as_dtype and optionally rescales it.

Rescaling is done only if at least one of the ranges is explicitly set. If only in_range is set, the input range is scaled to the full range of the output data type, as_dtype. This behaviour is typically wanted when converting floating-point data in a limited range, e.g. [0, 1], to an unsigned integer type, e.g. uint8, thus mapping the range [0, 1] to [0, 255].

In case only out_range is set, the full data type range of the input data is mapped to the provided out_range. This is typically used if converting from a “limited” range, like uint8 to a floating data type.
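
The arithmetic behind such a rescaling can be sketched as a plain linear map between two ranges. This is an assumption about the kind of mapping involved, not the actual convert_to_dtype() code. The helper name linear_rescale is hypothetical.

```python
import numpy as np


def linear_rescale(data, in_range, out_range):
    """Hypothetical helper: map in_range linearly onto out_range."""
    in_min, in_max = in_range
    out_min, out_max = out_range
    scale = (out_max - out_min) / (in_max - in_min)
    return (np.asarray(data, dtype=np.float64) - in_min) * scale + out_min


values = np.array([0.0, 0.5, 1.0])

# [0, 1] mapped onto the full uint8 range; the cast truncates 127.5 to 127.
as_uint8 = linear_rescale(values, (0, 1), (0, 255)).astype(np.uint8)

# Full uint8 range mapped onto [-1, 1]; the middle value lands just below 0.
back = linear_rescale(as_uint8, (0, 255), (-1, 1))
print(as_uint8, back)
```

Under this assumption, the cast to an integer type truncates rather than rounds, which is why 0.5 ends up at 127 and not 128.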

Note

The default range for any floating type is [0,1]!

This means:

  • If the output data type, as_dtype, is any subclass of np.floating and no out_range is defined, then the output is scaled to the interval [0, 1].

  • If data is of any np.floating type and the data range lies within [0, 1] (and in_range is not provided), then in_range is set to [0, 1].

Parameters:
  • data (NDArray) – Input numpy NDArray

  • as_dtype (type or str or None) – Desired data type to convert to (e.g. np.float64). If not provided, then at least out_range needs to be set, in which case the data type remains unchanged but the data is rescaled.

  • in_range (NDArray or Collection or None) –

    An array or list from which min and max will be used as input range. Min and max are read with numpy.nanmin() / numpy.nanmax().

    Note

    You might simply provide the same value as for data in order to use its min and max for scaling.

  • out_range (NDArray or Collection or str or type or None) – An array or list from which min and max will be used as limits for the output. Alternatively, a data type can be specified, in which case the data will be scaled to the full range of the specified data type (see dtype_range()).

Returns:

Converted numpy NDArray with desired data type.

Return type:

NDArray

See also

dtype_range()

Get the min/max of a NumPy dtype.

Examples

>>> # simple conversion, no rescaling
>>> my_data = np.array([0, 0.5, 1.], dtype=np.float64)
>>> convert_to_dtype(my_data, as_dtype='uint8')
array([0, 0, 1], dtype=uint8)
>>> # conversion with rescaling specifying in_range only
>>> new_data = convert_to_dtype(my_data, as_dtype='uint8', in_range=(0,1))
>>> new_data
array([  0, 127, 255], dtype=uint8)
>>> # convert with scaling specifying out_range only
>>> convert_to_dtype(data=new_data, as_dtype='float64', out_range=[-1, 1])
array([-1.        , -0.00392157,  1.        ])
>>> # only scaling, keeping data type
>>> convert_to_dtype(data=my_data, in_range=[0,1], out_range=[-1, 1])
array([-1.,  0.,  1.])
>>> # scaling with data type as range
>>> convert_to_dtype(data=my_data, in_range=[0,1], as_dtype='uint16', out_range='uint8')
array([  0, 127, 255], dtype=uint16)
riogrande.helper.count_contribution(data, selector, no_data=0)[source]#

The remaining number of data cells when applying the selector

Uses numpy.unique() with return_counts=True to count valid cells.

Parameters:
  • data (NDArray) – The data to count the contribution in

  • selector (NDArray) – A boolean array in the shape of data selecting the single cells that should be considered

  • no_data (int or float) –

    The value that should be considered as invalid.

    Note

    You might also provide numpy.nan as no data value (detected via numpy.isnan()).

Returns:

Count of valid cells (pixels in rasterfile).

Return type:

int

See also

aggregated_selector()

Build a selector from rasterio band masks.

reduced_mask()

Build a mask from nodata values across bands.
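
A minimal sketch of the counting step, assuming the selector is applied first and the no-data value is then excluded from the numpy.unique() counts. The function name count_valid_cells is hypothetical and only mirrors the documented behaviour.

```python
import numpy as np


def count_valid_cells(data, selector, no_data=0):
    """Hypothetical helper: count selected cells that differ from no_data."""
    selected = np.asarray(data)[np.asarray(selector, dtype=bool)]
    values, counts = np.unique(selected, return_counts=True)
    if np.isnan(no_data):
        # NaN never compares equal to itself, so it needs numpy.isnan().
        keep = ~np.isnan(values)
    else:
        keep = values != no_data
    return int(counts[keep].sum())


data = np.array([[0, 3, 3], [5, 0, 7]])
selector = np.array([[True, True, False], [True, True, True]])
n_valid = count_valid_cells(data, selector)

float_data = np.array([np.nan, 1.0, 2.0])
n_valid_nan = count_valid_cells(float_data, np.ones(3, dtype=bool), no_data=np.nan)
print(n_valid, n_valid_nan)
```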

riogrande.helper.deserialize(tags)[source]#

Reads python objects from JSON-encoded values of a dict

Each value is parsed using json.loads().

Parameters:

tags (dict[str, str]) – Dictionary with tag as key and serialized values.

Returns:

Dictionary with tag as key and deserialized value as value.

Return type:

dict

Notes

Inverse operation of serialize().

See also

serialize()

Convert dict values to JSON strings.

sanitize()

Serialize then deserialize in one step.

riogrande.helper.dtype_range(dtype)[source]#

Get the range of the specified dtype

Uses numpy.iinfo() for integer types and numpy.finfo() for floating-point types.

Warning

This function returns min and max as Python int or float values.

Be sure to convert them back into dtype if needed!

Parameters:

dtype (type or str) – A NumPy dtype (e.g. np.uint8, np.float32) or a string representation thereof (e.g. 'uint8').

Returns:

(max, min) of the dtype’s representable range as Python int or float.

Return type:

tuple

Raises:

ValueError – If dtype has no defined min/max values.

See also

convert_to_dtype()

Convert and optionally rescale an array.
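
The lookup via numpy.iinfo() and numpy.finfo() can be sketched as follows. The helper numeric_limits is hypothetical and returns a dict instead of a tuple so that the example does not depend on the element order. It also shows the cast back into the dtype that the warning above asks for.

```python
import numpy as np


def numeric_limits(dtype):
    """Hypothetical helper: representable limits of a dtype as Python numbers."""
    dt = np.dtype(dtype)
    if np.issubdtype(dt, np.integer):
        info = np.iinfo(dt)
        return {"min": int(info.min), "max": int(info.max)}
    if np.issubdtype(dt, np.floating):
        info = np.finfo(dt)
        return {"min": float(info.min), "max": float(info.max)}
    raise ValueError(f"No min/max defined for dtype {dt}")


limits = numeric_limits("uint8")
print(limits)

# The limits are plain Python numbers; cast them back when the dtype matters.
upper = np.uint8(limits["max"])
print(type(limits["max"]).__name__, upper.dtype)
```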

riogrande.helper.get_nbr_workers(number=None)[source]#

Determine the number of worker processes to use in multiprocessing.

Parameters:

number (int or None, optional) – Desired number of workers. If None, the function will use the number of CPUs available via multiprocessing.cpu_count(), but never less than 2.

Returns:

Number of workers to use (always >= 2).

Return type:

int

Notes

If the requested number is lower than 2, a warning is emitted, the request is ignored, and the number of workers is set to 2.

See also

get_or_set_context()

Return a multiprocessing context.
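
The rule described above (CPU count by default, never fewer than 2 workers) can be sketched with the standard library alone. The name pick_worker_count and the use of RuntimeWarning are assumptions made for this illustration. The source does not state the warning category.

```python
import multiprocessing
import warnings


def pick_worker_count(number=None, minimum=2):
    """Hypothetical helper mirroring the documented lower bound of 2 workers."""
    if number is None:
        # Fall back to the CPU count, but never go below the minimum.
        return max(multiprocessing.cpu_count(), minimum)
    if number < minimum:
        # Warning category is an assumption; the request is ignored.
        warnings.warn(
            f"Requested {number} worker(s); using {minimum} instead.",
            RuntimeWarning,
        )
        return minimum
    return number


print(pick_worker_count(8))
print(pick_worker_count())
```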

riogrande.helper.get_or_set_context(method=None)[source]#

Return a multiprocessing context and set the global start method if unset.

The function tries to be conservative about changing global interpreter state:

  • If method is None, it returns a context for the currently configured global start method when one exists; otherwise it warns and returns a context for a sensible default (‘spawn’ is used for compatibility with Windows).

  • If method is provided and no global start method is set, it attempts to set the global start method to method. If that attempt races with another thread/process, it falls back to returning a context for method without changing the global start method.

  • If method is provided and a different global start method is already set, the global start method is not changed; a warning is emitted and a context for the requested method is returned so callers can still create objects using the requested start semantics.

Parameters:

method ({None, 'fork', 'spawn', 'forkserver'}, optional) –

Desired multiprocessing start method to use for the returned context. If None the function will:

  • return a context for the currently configured global start method if one exists, or

  • emit a RuntimeWarning and return a context for the configured default method (spawn) if no global method is set.

Valid explicit values are 'fork', 'spawn' and 'forkserver' (availability depends on the platform and Python build). Passing an unsupported value raises ValueError.

Returns:

A multiprocessing context object appropriate for creating multiprocessing.Process, multiprocessing.pool.Pool and related objects. The returned context will use the start method determined by the logic described above. The function always returns a context and never mutates an already-set global start method to a different value.

Return type:

multiprocessing.context.BaseContext

Raises:
  • ValueError – If method is not one of the supported start methods or None.

  • RuntimeError – If the function attempts to set the global start method and the call to multiprocessing.set_start_method raises RuntimeError for reasons other than a race (this is rare); in normal race cases the function catches the RuntimeError and falls back to returning the requested context.

Notes

  • Calling multiprocessing.set_start_method can only be done once per interpreter process. Once the global start method is set, it cannot be changed without restarting the interpreter. This function therefore avoids forcibly overwriting an existing different global start method.

  • The returned context is safe to use even when the global start method differs, because context objects encapsulate start semantics for the created processes independently of global state.

  • On Windows the only available start method is 'spawn'; on Unix-like systems 'fork' and 'spawn' are commonly available and 'forkserver' may be available depending on the platform.

  • Use this helper in library code when you need a guaranteed context but do not want to unconditionally mutate global multiprocessing state.
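
The decision logic described above can be sketched with the public multiprocessing API. This is an illustrative re-creation under the documented rules, not the riogrande source. The name context_for and the RuntimeWarning used in the mismatch branch are assumptions.

```python
import multiprocessing as mp
import warnings


def context_for(method=None, default="spawn"):
    """Hypothetical re-creation of the documented decision logic."""
    if method is not None and method not in mp.get_all_start_methods():
        raise ValueError(f"Unsupported start method: {method!r}")

    # None means that no global start method has been fixed yet.
    current = mp.get_start_method(allow_none=True)

    if method is None:
        if current is not None:
            return mp.get_context(current)
        warnings.warn(f"No start method set; using {default!r}.", RuntimeWarning)
        return mp.get_context(default)

    if current is None:
        try:
            mp.set_start_method(method)
        except RuntimeError:
            # Another caller set the start method in the meantime.
            pass
    elif current != method:
        # Warning category is an assumption; the global state stays untouched.
        warnings.warn(
            f"Global start method is {current!r}; returning a {method!r} context.",
            RuntimeWarning,
        )
    return mp.get_context(method)


ctx = context_for("spawn")
print(ctx.get_start_method())
```

The key point is that multiprocessing.get_context() never changes global state, so the returned context always follows the requested method even when the global start method differs.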

See also

get_nbr_workers()

Determine the number of worker processes.

Examples

>>> ctx = get_or_set_context('spawn')
>>> p = ctx.Process(target=worker)
>>> p.start()
>>> p.join()
riogrande.helper.match_all(targets, tags)[source]#

Check if all tags in targets are present in tags

Parameters:
  • targets (dict) – Dictionary with tags to match to.

  • tags (dict) – Dictionary with tags to check for matching items.

Returns:

True if all tags in targets are present in tags, otherwise False.

Return type:

bool

See also

match_any()

Return True if any tag matches.

riogrande.helper.match_any(targets, tags)[source]#

Check if any tag in targets is present in tags

Parameters:
  • targets (dict) – Dictionary with tags to match to.

  • tags (dict) – Dictionary with tags to check for matching items.

Returns:

True if any tags in targets are present in tags, otherwise False.

Return type:

bool

See also

match_all()

Return True only if all tags match.
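
A small sketch of the two matching modes on plain dicts. It assumes that a tag counts as present when both its key and its value are found in tags. The source does not spell out whether values are compared, so treat that as an assumption. The names all_present and any_present are hypothetical.

```python
def all_present(targets, tags):
    """Hypothetical helper: every key/value pair of targets occurs in tags."""
    return all(key in tags and tags[key] == value for key, value in targets.items())


def any_present(targets, tags):
    """Hypothetical helper: at least one key/value pair of targets occurs in tags."""
    return any(key in tags and tags[key] == value for key, value in targets.items())


tags = {"category": "forest", "year": 2020, "source": "survey"}

print(all_present({"category": "forest", "year": 2020}, tags))
print(all_present({"category": "forest", "year": 2019}, tags))
print(any_present({"category": "forest", "year": 2019}, tags))
print(any_present({"category": "urban"}, tags))
```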

riogrande.helper.output_filename(base_name, out_type, blur_params=None)[source]#

Construct the filename for the specific output type.

Parameters:
  • base_name (str) – The basic output name in the form <name>.tif

  • out_type (str) – The type of output that will be saved. This should be either ‘blur’ or ‘entropy’ but any string is accepted

  • blur_params (dict or None) – Output of get_blur_params, so ‘sigma’, ‘truncate’ and ‘diameter’ are expected keys.

Returns:

The resulting filename of the form ‘<name>_<out_type>_sig_<{sigma}>_diam_<{diameter}>_trunc_<{truncate}>.tif’

Return type:

str
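
The naming pattern can be sketched with pathlib. This is a hypothetical re-creation: the function name build_output_name, the number formatting via str(), and the behaviour when blur_params is None (the parameter suffix is simply omitted) are all assumptions.

```python
from pathlib import Path


def build_output_name(base_name, out_type, blur_params=None):
    """Hypothetical helper following the documented filename pattern."""
    path = Path(base_name)
    parts = [path.stem, out_type]
    if blur_params is not None:
        parts += [
            "sig", str(blur_params["sigma"]),
            "diam", str(blur_params["diameter"]),
            "trunc", str(blur_params["truncate"]),
        ]
    # Re-attach the original extension (e.g. ".tif").
    return "_".join(parts) + path.suffix


params = {"sigma": 2, "truncate": 3, "diameter": 13}
name_blur = build_output_name("landcover.tif", "blur", params)
name_entropy = build_output_name("landcover.tif", "entropy")
print(name_blur)
print(name_entropy)
```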

riogrande.helper.rasterio_to_numpy_dtype(rasterio_dtype)[source]#

Map Rasterio actual data types to NumPy data types.

Rasterio types like rasterio.dtypes.int16, rasterio.dtypes.float32 are mapped to their NumPy equivalents.

Parameters:

rasterio_dtype (str) – Data type string from a dataset profile, e.g. rasterio.open(source).profile['dtype'].

Returns:

Data type as numpy.dtype, or None if the type is unknown.

Return type:

numpy.dtype or None

riogrande.helper.reduced_mask(array, nodata=0, logic='all')[source]#

Computes a mask based on the value of several bands

Parameters:
  • array (NDArray) – 3D array holding multiple bands of map data

  • nodata (float or int or None) – Nodata value to use. Defaults to 0. Pass numpy.nan to mask NaN cells (detected via numpy.isnan()).

  • logic (str) –

    Allowed strings are:

    • "all" : Masked will be each cell for which all bands match the nodata value (aggregated via numpy.logical_or() across bands).

    • "any" : Masked will be each cell for which any band matches the nodata value (aggregated via numpy.logical_and() across bands).

Returns:

Boolean numpy array resulting from applied logic.

Return type:

NDArray

See also

aggregated_selector()

Aggregate rasterio band masks into a selector.

Examples

>>> mydata = np.array([[[2, 4], [0, 1]], [[5, 5], [1, 0]]])
>>> # only mask if all are nodata
>>> reduced_mask(mydata)
array([[1, 1],
       [1, 1]], dtype=uint8)
>>> # mask if any are nodata
>>> reduced_mask(mydata, logic='any')
array([[1, 1],
       [0, 0]], dtype=uint8)
riogrande.helper.sanitize(tags)[source]#

Serializes then deserializes values of a dict

Convenience wrapper that calls serialize() followed by deserialize(), ensuring values are in the same form they would be when loaded back from a .tif tag.

Parameters:

tags (dict[str, Any]) – Dictionary with tag as key and serializable value as value.

Returns:

Dictionary with tag as key and deserialized value as value.

Return type:

dict

See also

serialize()

Convert dict values to JSON strings.

deserialize()

Parse JSON strings back to Python objects.
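
The round trip can be illustrated with json.dumps() and json.loads() directly. The helper names serialize_values and deserialize_values are hypothetical stand-ins for serialize() and deserialize(). The example shows why sanitizing matters: some Python types come back in a different form.

```python
import json


def serialize_values(tags):
    """Hypothetical stand-in: JSON-encode every value of the dict."""
    return {key: json.dumps(value) for key, value in tags.items()}


def deserialize_values(tags):
    """Hypothetical stand-in: decode every JSON-encoded value of the dict."""
    return {key: json.loads(value) for key, value in tags.items()}


original = {"category": "forest", "window": (0, 0, 256, 256), "sigma": 2.5}

stored = serialize_values(original)    # all values are now JSON strings
restored = deserialize_values(stored)  # values as they would be read back

print(stored["window"])
print(restored["window"])
```

JSON has no tuple type, so the tuple stored under "window" comes back as a list. Comparing tags against sanitized values avoids mismatches of this kind.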

riogrande.helper.serialize(tags)[source]#

Convert the values of a dict into JSON

Each value is serialized using json.dumps().

Parameters:

tags (dict[str, Any]) – Dictionary of tags with string keywords and any-type values, which are serializable.

Returns:

Dictionary with tag as key and serialized value as value.

Return type:

dict

See also

deserialize()

Inverse operation; parse JSON back to Python objects.

sanitize()

Serialize then deserialize in one step.

riogrande.helper.view_to_window(view)[source]#

Converts a view into a rasterio Window

Parameters:

view (tuple[int, int, int, int] or None) – tuple (x, y, width, height) defining the view of the data array to update

Returns:

Rasterio window object, or None if view is None.

Return type:

rasterio.windows.Window or None

riogrande.helper.MPC_STARTER_METHODS = ['spawn', 'fork', 'forkserver'][source]#