riogrande.helper#
General-purpose helper functions for the riogrande package.
This module collects utility functions that are used across the package but do not belong to the I/O layer or the parallelization machinery. It covers:
- Compatibility checks: CRS, spatial resolution, and unit validation across multiple raster sources (check_compatibility(), check_crs(), check_resolution(), check_units()).
- Dtype conversion: Converting array values between numeric types with optional range rescaling (convert_to_dtype(), dtype_range()).
- Tag utilities: Serializing, deserializing, sanitizing, and matching metadata tag dictionaries (serialize(), deserialize(), sanitize(), match_all(), match_any()).
- Mask aggregation: Combining boolean selector arrays with logical AND/OR (aggregated_selector(), reduced_mask()).
- Multiprocessing setup: Obtaining a multiprocessing context and determining the number of worker processes (get_or_set_context(), get_nbr_workers()).
- Miscellaneous: Output filename generation, window-to-view conversion, and pixel contribution counting.
Attributes#
Functions#
| aggregated_selector() | Turns several rasterio masks into a boolean selector for a numpy array |
| check_compatibility() | Assert that all the sources are compatible with each other. |
| check_crs() | Assert that all the sources have the same coordinate reference system (crs) |
| check_resolution() | Assert that all the sources have the same spatial resolution |
| check_units() | Assert that all sources have the same linear units in the coordinate reference system (crs) |
| convert_to_dtype() | Converts data to as_dtype and optionally rescales it. |
| count_contribution() | The remaining number of data cells when applying the selector |
| deserialize() | Reads python objects from JSON-encoded values of a dict |
| dtype_range() | Get the range of the specified dtype |
| get_nbr_workers() | Determine the number of worker processes to use in multiprocessing. |
| get_or_set_context() | Return a multiprocessing context and set the global start method if unset. |
| match_all() | Check if all tags in targets are present in tags |
| match_any() | Check if any tag in targets is present in tags |
| output_filename() | Construct the filename for the specific output type. |
| rasterio_to_numpy_dtype() | Map Rasterio data types to NumPy data types. |
| reduced_mask() | Computes a mask based on the value of several bands |
| sanitize() | Serializes then deserializes values of a dict |
| serialize() | Convert the values of a dict into JSON |
| | Converts a view into a rasterio Window |
Module Contents#
- riogrande.helper.aggregated_selector(masks, logic='all')[source]#
Turns several rasterio masks into a boolean selector for a numpy array.
Rasterio masks are uint8 numpy arrays where every value > 0 is considered a valid cell.
- Parameters:
masks (list[NDArray]) – Arbitrary number of numpy arrays resulting from rasterio.io.DatasetReader.dataset_mask() or rasterio.io.DatasetReader.read_masks().
logic (str) – Determines how the aggregation should happen. If 'all' (the default), a cell is only selected if all masks consider it valid data (aggregated via numpy.logical_and()). 'any' selects cells which at least one mask considers valid (aggregated via numpy.logical_or()).
- Returns:
Boolean numpy array as result of logical mask applied.
- Return type:
NDArray
See also
reduced_mask(): Compute a mask from nodata values across bands.
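The aggregation described for aggregated_selector() can be illustrated with plain NumPy. This is a minimal sketch under stated assumptions, not the package's implementation: combine_masks is a hypothetical stand-in name, and the two small masks are made-up sample data.

```python
import numpy as np


def combine_masks(masks, logic="all"):
    """Hypothetical stand-in illustrating the documented AND/OR aggregation."""
    # Rasterio masks are uint8 arrays; any value > 0 marks a valid cell.
    valid = [np.asarray(mask) > 0 for mask in masks]
    if logic == "all":
        # A cell is selected only if every mask considers it valid.
        return np.logical_and.reduce(valid)
    if logic == "any":
        # A cell is selected if at least one mask considers it valid.
        return np.logical_or.reduce(valid)
    raise ValueError(f"unknown logic: {logic!r}")


# Made-up sample masks (255 = valid, 0 = invalid).
mask_a = np.array([[255, 0], [255, 255]], dtype=np.uint8)
mask_b = np.array([[255, 255], [0, 255]], dtype=np.uint8)

selector_all = combine_masks([mask_a, mask_b], logic="all")
selector_any = combine_masks([mask_a, mask_b], logic="any")
```

With these inputs, selector_all keeps only the two cells that both masks mark as valid, while selector_any keeps all four cells.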
- riogrande.helper.check_compatibility(*sources)[source]#
Assert that all the sources are compatible with each other.
The checks include:
- crs (via check_crs())
- units (via check_units())
- resolution (via check_resolution())
- Parameters:
*sources (str) – Sources (paths to files) that are to be compared to each other.
- Returns:
crss (list) – All unique crs from sources in a list (see check_crs()).
units (list) – All unique units from sources in a list (see check_units()).
ress (list) – All unique resolutions from sources in a list (see check_resolution()).
- Return type:
See also
check_crs(): Check that sources share the same CRS.
check_units(): Check that sources share the same linear units.
check_resolution(): Check that sources share the same resolution.
- riogrande.helper.check_crs(*sources)[source]#
Assert that all the sources have the same coordinate reference system (crs)
- Parameters:
*sources (str) – List of sources (paths to files) from which crs are to be compared to each other.
- Returns:
All unique crs from sources in a list.
- Return type:
See also
check_units(): Check that sources share the same linear units.
check_resolution(): Check that sources share the same resolution.
check_compatibility(): Run all three checks at once.
- riogrande.helper.check_resolution(*sources)[source]#
Assert that all the sources have the same spatial resolution
- Parameters:
*sources (str) – List of sources (paths to files) from which resolutions are to be compared to each other.
- Returns:
All unique resolutions from sources in a list.
- Return type:
See also
check_units(): Check that sources share the same linear units.
check_crs(): Check that sources share the same CRS.
check_compatibility(): Run all three checks at once.
- riogrande.helper.check_units(*sources)[source]#
Assert that all sources have the same linear units in the coordinate reference system (crs)
- Parameters:
*sources (str) – List of sources (paths to files) from which units are to be compared to each other.
- Returns:
All unique units in a list.
- Return type:
See also
check_crs(): Check that sources share the same CRS.
check_resolution(): Check that sources share the same resolution.
check_compatibility(): Run all three checks at once.
- riogrande.helper.convert_to_dtype(data, as_dtype=None, in_range=None, out_range=None)[source]#
Converts data to as_dtype and optionally rescales it.
Rescaling is done only if at least one of the ranges is explicitly set. If only in_range is set, the input range is scaled to the full range of the output data type, as_dtype. This behaviour is typically wanted when converting floating-point data in a limited range, e.g. [0, 1], to an unsigned integer type, e.g. uint8, thus mapping the range [0, 1] to [0, 255].
If only out_range is set, the full data type range of the input data is mapped to the provided out_range. This is typically used when converting from a “limited” range, like uint8, to a floating data type.
Note
The default range for any floating type is [0, 1]!
This means:
If the output data type, as_dtype, is any subclass of np.floating and no out_range is defined, then the output is scaled to the interval [0, 1].
If data is of any np.floating type and the data range lies within [0, 1] (and in_range is not provided), then in_range is set to [0, 1].
- Parameters:
data (NDArray) – Input numpy NDArray
as_dtype (type or str or None) – Desired data type to convert to (e.g. np.float64). If not provided, then at least out_range needs to be set, in which case the data type remains unchanged but the data is rescaled.
in_range (NDArray or Collection or None) –
An array or list from which min and max will be used as input range. Min and max are read with numpy.nanmin() / numpy.nanmax().
Note
You might simply provide the same value as for data in order to use its min and max for scaling.
out_range (NDArray or Collection or str or type or None) – An array or list from which min and max will be used as limits for the output. Alternatively, a data type can be specified, in which case the data will be scaled to the full range of the specified data type (see dtype_range()).
- Returns:
Converted numpy NDArray with desired data type.
- Return type:
NDArray
See also
dtype_range(): Get the min/max of a NumPy dtype.
Examples
>>> # simple conversion, no rescaling
>>> my_data = np.array([0, 0.5, 1.], dtype=np.float64)
>>> convert_to_dtype(my_data, as_dtype='uint8')
array([0, 0, 1], dtype=uint8)
>>> # conversion with rescaling specifying in_range only
>>> new_data = convert_to_dtype(my_data, as_dtype='uint8', in_range=(0,1))
>>> new_data
array([  0, 127, 255], dtype=uint8)
>>> # convert with scaling specifying out_range only
>>> convert_to_dtype(data=new_data, as_dtype='float64', out_range=[-1, 1])
array([-1.        , -0.00392157,  1.        ])
>>> # only scaling, keeping data type
>>> convert_to_dtype(data=my_data, in_range=[0,1], out_range=[-1, 1])
array([-1.,  0.,  1.])
>>> # scaling with data type as range
>>> convert_to_dtype(data=my_data, in_range=[0,1], as_dtype='uint16', out_range='uint8')
array([  0, 127, 255], dtype=uint16)
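The rescaling behaviour described above amounts to a linear map from the input range to the output range followed by a cast. The following is a rough sketch of that arithmetic only, assuming a plain min/max mapping; rescale is a hypothetical name and the code is not taken from riogrande, which may handle defaults, NaN values and rounding differently.

```python
import numpy as np


def rescale(data, in_range, out_range, as_dtype):
    """Hypothetical linear rescale; not riogrande's actual code."""
    in_min, in_max = in_range
    out_min, out_max = out_range
    # Normalise to [0, 1] relative to the input range ...
    unit = (data.astype(np.float64) - in_min) / (in_max - in_min)
    # ... then stretch to the output range.
    scaled = unit * (out_max - out_min) + out_min
    # Casting to an integer type truncates toward zero (127.5 -> 127).
    return scaled.astype(as_dtype)


my_data = np.array([0.0, 0.5, 1.0])

# [0, 1] mapped to the full uint8 range [0, 255].
as_uint8 = rescale(my_data, in_range=(0, 1), out_range=(0, 255), as_dtype=np.uint8)

# Full uint8 range mapped to [-1, 1].
as_float = rescale(as_uint8, in_range=(0, 255), out_range=(-1, 1), as_dtype=np.float64)
```

Under these assumptions the sketch yields the same values as the doctest examples above: 0, 127 and 255 for the first call, and a middle value of about -0.0039 for the second.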
- riogrande.helper.count_contribution(data, selector, no_data=0)[source]#
The remaining number of data cells when applying the selector.
Uses numpy.unique() with return_counts=True to count valid cells.
- Parameters:
data (NDArray) – The data to count the contribution in.
selector (NDArray) – A boolean array in the shape of data selecting the single cells that should be considered.
no_data – The value that should be considered as invalid.
Note
You might also provide numpy.nan as no data value (detected via numpy.isnan()).
- Returns:
Count of valid cells (pixels in rasterfile).
- Return type:
See also
aggregated_selector(): Build a selector from rasterio band masks.
reduced_mask(): Build a mask from nodata values across bands.
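As a rough illustration of what "remaining data cells" could mean here, the sketch below counts selected cells that differ from the no-data value. It is an assumption-laden stand-in: count_valid is a hypothetical name, and it uses numpy.count_nonzero() rather than the numpy.unique() approach the documentation mentions.

```python
import numpy as np


def count_valid(data, selector, no_data=0):
    """Hypothetical sketch: count selected cells that are not no_data."""
    # Keep only the cells picked by the boolean selector.
    selected = data[selector]
    if isinstance(no_data, float) and np.isnan(no_data):
        # NaN never compares equal to itself, so test it explicitly.
        return int(np.count_nonzero(~np.isnan(selected)))
    return int(np.count_nonzero(selected != no_data))


# Made-up sample data: one selected cell holds the no-data value 0.
data = np.array([[0, 3], [5, 0]])
selector = np.array([[True, True], [True, False]])
n_valid = count_valid(data, selector)

# Same idea with NaN as the no-data value.
float_data = np.array([np.nan, 1.0, 2.0])
n_valid_nan = count_valid(float_data, np.ones(3, dtype=bool), no_data=np.nan)
```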
- riogrande.helper.deserialize(tags)[source]#
Reads python objects from JSON-encoded values of a dict
Each value is parsed using json.loads().
- Parameters:
tags (dict[str, str]) – Dictionary with tag as key and serialized values.
- Returns:
Dictionary with tag as key and deserialized value as value.
- Return type:
Notes
Inverse operation of serialize().
See also
serialize(): Convert dict values to JSON strings.
sanitize(): Serialize then deserialize in one step.
- riogrande.helper.dtype_range(dtype)[source]#
Get the range of the specified dtype
Uses numpy.iinfo() for integer types and numpy.finfo() for floating-point types.
Warning
This function returns min and max as either int or float.
Be sure to convert them back into dtype if needed!
- Parameters:
dtype (type or str) – A NumPy dtype (e.g. np.uint8, np.float32) or a string representation thereof (e.g. 'uint8').
- Returns:
(max, min) of the dtype’s representable range as Python int or float.
- Return type:
- Raises:
ValueError – If dtype has no defined min/max values.
See also
convert_to_dtype(): Convert and optionally rescale an array.
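The lookup via numpy.iinfo() and numpy.finfo() can be sketched as follows. numeric_limits is a hypothetical name, and this sketch returns the pair in (min, max) order by its own choice, which may differ from the order dtype_range() returns.

```python
import numpy as np


def numeric_limits(dtype):
    """Hypothetical helper: (min, max) of a NumPy dtype as Python numbers."""
    dt = np.dtype(dtype)  # accepts np.uint8 as well as 'uint8'
    if np.issubdtype(dt, np.integer):
        info = np.iinfo(dt)
        return int(info.min), int(info.max)
    if np.issubdtype(dt, np.floating):
        info = np.finfo(dt)
        return float(info.min), float(info.max)
    # Booleans, strings, etc. have no numeric range to report.
    raise ValueError(f"{dt} has no defined min/max values")


uint8_limits = numeric_limits("uint8")
int16_limits = numeric_limits(np.int16)
float32_limits = numeric_limits(np.float32)
```

The returned values are plain Python int or float objects, which is why the warning above recommends converting them back to the target dtype when needed.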
- riogrande.helper.get_nbr_workers(number=None)[source]#
Determine the number of worker processes to use in multiprocessing.
- Parameters:
number (int or None, optional) – Desired number of workers. If None, the function will use the number of CPUs available via multiprocessing.cpu_count(), but never less than 2.
- Returns:
Number of workers to use (always >= 2).
- Return type:
Notes
A warning is emitted when a requested number is lower than 2; the request is ignored and the number of workers is set to 2.
See also
get_or_set_context(): Return a multiprocessing context.
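The rule described for get_nbr_workers() can be restated in a few lines of standard-library Python. This is a sketch of the documented behaviour only; resolve_worker_count is a hypothetical name and the warning text is invented for illustration.

```python
import multiprocessing
import warnings


def resolve_worker_count(number=None):
    """Hypothetical restatement of the documented rule (never fewer than 2)."""
    if number is None:
        # Fall back to the CPU count, but keep the lower bound of 2.
        return max(multiprocessing.cpu_count(), 2)
    if number < 2:
        # The request is ignored and replaced by the minimum of 2.
        warnings.warn(f"Requested {number} worker(s); using 2 instead.")
        return 2
    return number


default_workers = resolve_worker_count()
explicit_workers = resolve_worker_count(4)
```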
- riogrande.helper.get_or_set_context(method=None)[source]#
Return a multiprocessing context and set the global start method if unset.
The function tries to be conservative about changing global interpreter state:
If method is None, it returns a context for the currently configured global start method when one exists; otherwise it warns and returns a context for a sensible default (‘spawn’, chosen for compatibility with Windows).
If method is provided and no global start method is set, it attempts to set the global start method to method. If that attempt races with another thread/process, it falls back to returning a context for method without changing the global start method.
If method is provided and a different global start method is already set, the global start method is not changed; a warning is emitted and a context for the requested method is returned so callers can still create objects using the requested start semantics.
- Parameters:
method ({None, 'fork', 'spawn', 'forkserver'}, optional) –
Desired multiprocessing start method to use for the returned context. If None the function will:
- return a context for the currently configured global start method if one exists, or
- emit a RuntimeWarning and return a context for the configured default method (spawn) if no global method is set.
Valid explicit values are 'fork', 'spawn' and 'forkserver' (availability depends on the platform and Python build). Passing an unsupported value raises ValueError.
- Returns:
A multiprocessing context object appropriate for creating multiprocessing.Process, multiprocessing.pool.Pool and related objects. The returned context will use the start method determined by the logic described above. The function always returns a context and never mutates an already-set global start method to a different value.
- Return type:
multiprocessing.context.BaseContext
- Raises:
ValueError – If method is not one of the supported start methods or None.
RuntimeError – If the function attempts to set the global start method and the call to multiprocessing.set_start_method raises RuntimeError for reasons other than a race (this is rare); in normal race cases the function catches the RuntimeError and falls back to returning the requested context.
Notes
Calling multiprocessing.set_start_method can only be done once per interpreter process. Once the global start method is set, it cannot be changed without restarting the interpreter. This function therefore avoids forcibly overwriting an existing different global start method.
The returned context is safe to use even when the global start method differs, because context objects encapsulate start semantics for the created processes independently of global state.
On Windows the only available start method is 'spawn'; on Unix-like systems 'fork' and 'spawn' are commonly available and 'forkserver' may be available depending on the platform.
Use this helper in library code when you need a guaranteed context but do not want to unconditionally mutate global multiprocessing state.
See also
get_nbr_workers(): Determine the number of worker processes.
Examples
>>> ctx = get_or_set_context('spawn')
>>> p = ctx.Process(target=worker)
>>> p.start()
>>> p.join()
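The decision logic described for get_or_set_context() can be approximated with the standard multiprocessing API. The sketch below is an assumption-based reading of the documentation, not the package's source: context_for is a hypothetical name, the warning messages are invented, and the real function may distinguish RuntimeError causes more carefully.

```python
import multiprocessing
import warnings


def context_for(method=None, default="spawn"):
    """Hypothetical sketch of the documented decision logic."""
    # allow_none=True reports None instead of fixing a default start method.
    current = multiprocessing.get_start_method(allow_none=True)
    if method is None:
        if current is not None:
            return multiprocessing.get_context(current)
        warnings.warn(f"No start method set; using {default!r}.", RuntimeWarning)
        return multiprocessing.get_context(default)
    if method not in multiprocessing.get_all_start_methods():
        raise ValueError(f"Unsupported start method: {method!r}")
    if current is None:
        try:
            multiprocessing.set_start_method(method)
        except RuntimeError:
            # Another caller set it first; leave global state untouched.
            pass
    elif current != method:
        warnings.warn(
            f"Global start method is {current!r}; returning a {method!r} context.",
            RuntimeWarning,
        )
    # A context carries its own start semantics, independent of global state.
    return multiprocessing.get_context(method)


ctx = context_for("spawn")
```

Objects created from ctx, such as ctx.Process or ctx.Pool, use the 'spawn' start method regardless of what the global start method is.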
- riogrande.helper.match_all(targets, tags)[source]#
Check if all tags in targets are present in tags
- Parameters:
- Returns:
True if all tags in targets are present in tags, otherwise False.
- Return type:
See also
match_any(): Return True if any tag matches.
- riogrande.helper.match_any(targets, tags)[source]#
Check if any tag in targets is present in tags
- Parameters:
- Returns:
True if any tag in targets is present in tags, otherwise False.
- Return type:
See also
match_all(): Return True only if all tags match.
- riogrande.helper.output_filename(base_name, out_type, blur_params=None)[source]#
Construct the filename for the specific output type.
- Parameters:
base_name (str) – The basic output name in the form <name>.tif
out_type (str) – The type of output that will be saved. This should be either ‘blur’ or ‘entropy’ but any string is accepted
blur_params (dict or None) – Output of get_blur_params, so ‘sigma’, ‘truncate’ and ‘diameter’ are expected keys.
- Returns:
The resulting filename of the form ‘<name>_<out_type>_sig_<{sigma}>_diam_<{diameter}>_trunc_<{truncate}>.tif’
- Return type:
- riogrande.helper.rasterio_to_numpy_dtype(rasterio_dtype)[source]#
Map Rasterio data types to NumPy data types.
Rasterio types like rasterio.dtypes.int16, rasterio.dtypes.float32 are mapped to their NumPy equivalents.
- Parameters:
rasterio_dtype (str) – The dtype string found in rasterio.open(source).profile['dtype'].
- Returns:
Data type as numpy.dtype, or None if the type is unknown.
- Return type:
numpy.dtype or None
- riogrande.helper.reduced_mask(array, nodata=0, logic='all')[source]#
Computes a mask based on the values of several bands.
- Parameters:
array (NDArray) – 3D array holding multiple bands of map data.
nodata (float or int or None) – Nodata value to use. Defaults to 0. Pass numpy.nan to mask NaN cells (detected via numpy.isnan()).
logic (str) –
Allowed strings are:
"all": A cell is masked only if all bands match the nodata value (aggregated via numpy.logical_or() across bands).
"any": A cell is masked if any band matches the nodata value (aggregated via numpy.logical_and() across bands).
- Returns:
Boolean numpy array resulting from applied logic.
- Return type:
NDArray
See also
aggregated_selector(): Aggregate rasterio band masks into a selector.
Examples
>>> mydata = np.array([[[2, 4], [0, 1]], [[5, 5], [1, 0]]])
>>> # only mask if all are nodata
>>> reduced_mask(mydata)
array([[1, 1],
       [1, 1]], dtype=uint8)
>>> # mask if any are nodata
>>> reduced_mask(mydata, logic='any')
array([[1, 1],
       [0, 0]], dtype=uint8)
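One way to read the "all" and "any" modes is as an aggregation of per-band validity, where 1 marks a usable cell and 0 a masked one. The sketch below follows that reading and assumes the first array axis indexes the bands; nodata_mask is a hypothetical name and the code is not riogrande's implementation.

```python
import numpy as np


def nodata_mask(array, nodata=0, logic="all"):
    """Hypothetical sketch; returns 1 for valid cells and 0 for masked ones."""
    if isinstance(nodata, float) and np.isnan(nodata):
        has_data = ~np.isnan(array)
    else:
        has_data = array != nodata
    if logic == "all":
        # Masked only if every band is nodata, so valid if any band has data.
        valid = np.logical_or.reduce(has_data, axis=0)
    elif logic == "any":
        # Masked if any band is nodata, so valid only if all bands have data.
        valid = np.logical_and.reduce(has_data, axis=0)
    else:
        raise ValueError(f"unknown logic: {logic!r}")
    return valid.astype(np.uint8)


# Same sample data as in the doctest: two bands of 2 x 2 cells.
mydata = np.array([[[2, 4], [0, 1]], [[5, 5], [1, 0]]])
mask_all = nodata_mask(mydata)
mask_any = nodata_mask(mydata, logic="any")
```

This explains why "all" pairs with a logical OR: the OR is taken over validity, so a cell stays unmasked as long as at least one band holds data.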
- riogrande.helper.sanitize(tags)[source]#
Serializes then deserializes values of a dict
Convenience wrapper that calls serialize() followed by deserialize(), ensuring values are in the same form they would be when loaded back from a .tif tag.
- Parameters:
tags (dict[str, Any]) – Dictionary with tag as key and serializable value as value.
- Returns:
Dictionary with tag as key and deserialized value as value.
- Return type:
See also
serialize(): Convert dict values to JSON strings.
deserialize(): Parse JSON strings back to Python objects.
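The effect of a serialize-then-deserialize round trip can be seen with the json module alone. In this small sketch, to_json_values and from_json_values are hypothetical stand-ins for the documented functions, and the tag names are made up.

```python
import json


def to_json_values(tags):
    """Hypothetical stand-in: JSON-encode every value of the dict."""
    return {key: json.dumps(value) for key, value in tags.items()}


def from_json_values(tags):
    """Hypothetical stand-in: parse every JSON-encoded value of the dict."""
    return {key: json.loads(value) for key, value in tags.items()}


# Made-up tags; note the tuple, which JSON cannot represent as such.
tags = {"sigma": 2.5, "bands": (1, 2, 3), "label": "blur"}

stored = to_json_values(tags)        # every value is now a str
restored = from_json_values(stored)  # the tuple comes back as a list
```

The round trip is not the identity: the tuple (1, 2, 3) is stored as the string "[1, 2, 3]" and restored as a list. Sanitizing tags up front means values written to a file compare equal to the values read back from it.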
- riogrande.helper.serialize(tags)[source]#
Convert the values of a dict into JSON
Each value is serialized using json.dumps().
- Parameters:
tags (dict[str, Any]) – Dictionary of tags with string keywords and any-type values, which are serializable.
- Returns:
Dictionary with tag as key and serialized value as value.
- Return type:
See also
deserialize(): Inverse operation; parse JSON back to Python objects.
sanitize(): Serialize then deserialize in one step.