riogrande.helper
================

.. py:module:: riogrande.helper

.. autoapi-nested-parse::

   General-purpose helper functions for the riogrande package.

   This module collects utility functions that are used across the package
   but do not belong to the I/O layer or the parallelization machinery.
   It covers:

   - **Compatibility checks**: CRS, spatial resolution, and unit validation
     across multiple raster sources (:func:`check_compatibility`,
     :func:`check_crs`, :func:`check_resolution`, :func:`check_units`).
   - **Dtype conversion**: Converting array values between numeric types with
     optional range rescaling (:func:`convert_to_dtype`, :func:`dtype_range`).
   - **Tag utilities**: Serializing, deserializing, sanitizing, and matching
     metadata tag dictionaries (:func:`serialize`, :func:`deserialize`,
     :func:`sanitize`, :func:`match_all`, :func:`match_any`).
   - **Mask aggregation**: Combining boolean selector arrays with logical
     AND/OR (:func:`aggregated_selector`, :func:`reduced_mask`).
   - **Multiprocessing setup**: Obtaining a multiprocessing context and
     determining the number of worker processes (:func:`get_or_set_context`,
     :func:`get_nbr_workers`).
   - **Miscellaneous**: Output filename generation, view-to-window
     conversion, Rasterio-to-NumPy dtype mapping, and pixel contribution
     counting.



Attributes
----------

.. autoapisummary::

   riogrande.helper.MPC_STARTER_METHODS


Functions
---------

.. autoapisummary::

   riogrande.helper.aggregated_selector
   riogrande.helper.check_compatibility
   riogrande.helper.check_crs
   riogrande.helper.check_resolution
   riogrande.helper.check_units
   riogrande.helper.convert_to_dtype
   riogrande.helper.count_contribution
   riogrande.helper.deserialize
   riogrande.helper.dtype_range
   riogrande.helper.get_nbr_workers
   riogrande.helper.get_or_set_context
   riogrande.helper.match_all
   riogrande.helper.match_any
   riogrande.helper.output_filename
   riogrande.helper.rasterio_to_numpy_dtype
   riogrande.helper.reduced_mask
   riogrande.helper.sanitize
   riogrande.helper.serialize
   riogrande.helper.view_to_window


Module Contents
---------------

.. py:function:: aggregated_selector(masks, logic = 'all')

   Turn several rasterio masks into a boolean selector for a NumPy array.

   Rasterio masks are ``uint8`` NumPy arrays in which every value > 0 marks a
   valid cell.

   :param masks: Arbitrary number of NumPy arrays resulting from
       :meth:`rasterio.io.DatasetReader.dataset_mask` or
       :meth:`rasterio.io.DatasetReader.read_masks`.
   :type masks: list[NDArray]
   :param logic: Determines how the aggregation is done. If ``'all'`` (the
       default), a cell is only selected if **all** masks consider it valid
       data; the masks are aggregated via :func:`numpy.logical_and`.
       ``'any'`` selects cells that **at least one** mask considers valid;
       the masks are aggregated via :func:`numpy.logical_or`.
   :type logic: str

   :returns: Boolean NumPy array resulting from the applied logic.
   :rtype: NDArray

   .. seealso::

      :func:`~riogrande.helper.reduced_mask`
          Compute a mask from nodata values across bands.


.. py:function:: check_compatibility(*sources)

   Assert that all the sources are compatible with each other.

   The checks include:

   - CRS (via :func:`~riogrande.helper.check_crs`)
   - units (via :func:`~riogrande.helper.check_units`)
   - resolution (via :func:`~riogrande.helper.check_resolution`)

   :param \*sources: Sources (paths to files) that are to be compared to each
       other.
   :type \*sources: str

   :returns: * **crss** (*list*) -- All unique CRSs of the sources (see :func:`~riogrande.helper.check_crs`).
             * **units** (*list*) -- All unique units of the sources (see :func:`~riogrande.helper.check_units`).
             * **ress** (*list*) -- All unique resolutions of the sources (see :func:`~riogrande.helper.check_resolution`).

   .. seealso::

      :func:`~riogrande.helper.check_crs`
          Check that sources share the same CRS.
      :func:`~riogrande.helper.check_units`
          Check that sources share the same linear units.
      :func:`~riogrande.helper.check_resolution`
          Check that sources share the same resolution.


.. py:function:: check_crs(*sources)

   Assert that all the sources have the same coordinate reference system (CRS).

   :param \*sources: Sources (paths to files) whose CRSs are to be compared to
       each other.
   :type \*sources: str

   :returns: All unique CRSs of the sources in a list.
   :rtype: list

   .. seealso::

      :func:`~riogrande.helper.check_units`
          Check that sources share the same linear units.
      :func:`~riogrande.helper.check_resolution`
          Check that sources share the same resolution.
      :func:`~riogrande.helper.check_compatibility`
          Run all three checks at once.


.. py:function:: check_resolution(*sources)

   Assert that all the sources have the same spatial resolution.

   :param \*sources: Sources (paths to files) whose resolutions are to be
       compared to each other.
   :type \*sources: str

   :returns: All unique resolutions of the sources in a list.
   :rtype: list

   .. seealso::

      :func:`~riogrande.helper.check_units`
          Check that sources share the same linear units.
      :func:`~riogrande.helper.check_crs`
          Check that sources share the same CRS.
      :func:`~riogrande.helper.check_compatibility`
          Run all three checks at once.


.. py:function:: check_units(*sources)

   Assert that all sources have the same linear units in their coordinate
   reference system (CRS).

   :param \*sources: Sources (paths to files) whose units are to be compared
       to each other.
   :type \*sources: str

   :returns: All unique units in a list.
   :rtype: list

   .. seealso::

      :func:`~riogrande.helper.check_crs`
          Check that sources share the same CRS.
      :func:`~riogrande.helper.check_resolution`
          Check that sources share the same resolution.
      :func:`~riogrande.helper.check_compatibility`
          Run all three checks at once.


.. py:function:: convert_to_dtype(data, as_dtype = None, in_range = None, out_range = None)

   Convert data to `as_dtype` and optionally rescale it.

   Rescaling is done only if at least one of the ranges is explicitly set.

   If only `in_range` is set, the input range is scaled to the full range of
   the output data type, `as_dtype`. This is typically wanted when converting
   floating-point data in a limited range, e.g. `[0, 1]`, to an unsigned
   integer type, e.g. `uint8`, thus mapping the range `[0, 1]` to `[0, 255]`.

   If only `out_range` is set, the full data type range of the input data is
   mapped to the provided `out_range`. This is typically used when converting
   from a type with a "limited" range, like `uint8`, to a floating-point data
   type.

   .. note::

      The default range for any floating type is `[0, 1]`! This means:

      - If the output data type, `as_dtype`, is any subclass of `np.floating`
        and no `out_range` is defined, then the output is scaled to the
        interval `[0, 1]`.
      - If data is of any `np.floating` type, the data range lies within
        `[0, 1]`, and `in_range` is not provided, then `in_range` is set to
        `[0, 1]`.

   :param data: Input NumPy array.
   :type data: NDArray
   :param as_dtype: Desired data type to convert to (e.g. ``np.float64``). If
       not provided, then at least `out_range` needs to be set, in which case
       the data type remains unchanged but the data is rescaled.
   :type as_dtype: type or str or None
   :param in_range: An array or list whose min and max will be used as the
       input range. Min and max are read with :func:`numpy.nanmin` /
       :func:`numpy.nanmax`.

       .. note::

          You might simply provide the same value as for `data` in order to
          use its min and max for scaling.
   :type in_range: NDArray or Collection or None
   :param out_range: An array or list whose min and max will be used as limits
       for the output. Alternatively, a data type can be specified, in which
       case the data will be scaled to the full range of that data type (see
       :func:`~riogrande.helper.dtype_range`).
   :type out_range: NDArray or Collection or str or type or None

   :returns: Converted NumPy array with the desired data type.
   :rtype: NDArray

   .. seealso::

      :func:`~riogrande.helper.dtype_range`
          Get the min/max of a NumPy dtype.

   .. rubric:: Examples

   >>> # simple conversion, no rescaling
   >>> my_data = np.array([0, 0.5, 1.], dtype=np.float64)
   >>> convert_to_dtype(my_data, as_dtype='uint8')
   array([0, 0, 1], dtype=uint8)

   >>> # conversion with rescaling, specifying in_range only
   >>> new_data = convert_to_dtype(my_data, as_dtype='uint8', in_range=(0, 1))
   >>> new_data
   array([  0, 127, 255], dtype=uint8)

   >>> # conversion with rescaling, specifying out_range only
   >>> convert_to_dtype(data=new_data, as_dtype='float64', out_range=[-1, 1])
   array([-1.        , -0.00392157,  1.        ])

   >>> # only rescaling, keeping the data type
   >>> convert_to_dtype(data=my_data, in_range=[0, 1], out_range=[-1, 1])
   array([-1.,  0.,  1.])

   >>> # rescaling with a data type as out_range
   >>> convert_to_dtype(data=my_data, in_range=[0, 1], as_dtype='uint16', out_range='uint8')
   array([  0, 127, 255], dtype=uint16)


.. py:function:: count_contribution(data, selector, no_data = 0)

   Return the number of valid data cells that remain when applying the selector.

   Uses :func:`numpy.unique` with ``return_counts=True`` to count valid cells.

   :param data: The data in which to count the contribution.
   :type data: NDArray
   :param selector: A boolean array in the shape of `data` selecting the
       cells that should be considered.
   :type selector: NDArray
   :param no_data: The value that should be considered invalid.

       .. note::

          You might also provide :data:`numpy.nan` as the nodata value
          (detected via :func:`numpy.isnan`).
   :type no_data: int or float

   :returns: Count of valid cells (pixels in the raster file).
   :rtype: int

   .. seealso::

      :func:`~riogrande.helper.aggregated_selector`
          Build a selector from rasterio band masks.
      :func:`~riogrande.helper.reduced_mask`
          Build a mask from nodata values across bands.


.. py:function:: deserialize(tags)

   Read Python objects from the JSON-encoded values of a dict.

   Each value is parsed using :func:`json.loads`.

   :param tags: Dictionary with tag as key and serialized value as value.
   :type tags: dict[str, str]

   :returns: Dictionary with tag as key and deserialized value as value.
   :rtype: dict

   .. rubric:: Notes

   Inverse operation of :func:`~riogrande.helper.serialize`.

   .. seealso::

      :func:`~riogrande.helper.serialize`
          Convert dict values to JSON strings.
      :func:`~riogrande.helper.sanitize`
          Serialize then deserialize in one step.


.. py:function:: dtype_range(dtype)

   Get the range of the specified dtype.

   Uses :func:`numpy.iinfo` for integer types and :func:`numpy.finfo` for
   floating-point types.

   .. warning::

      This function returns min and max as either `int` or `float`. Be sure to
      convert them back into `dtype` if needed!

   :param dtype: A NumPy dtype (e.g. ``np.uint8``, ``np.float32``) or a string
       representation thereof (e.g. ``'uint8'``).
   :type dtype: type or str

   :returns: ``(min, max)`` of the dtype's representable range as Python
       ``int`` or ``float``.
   :rtype: tuple

   :raises ValueError: If `dtype` has no defined min/max values.

   .. seealso::

      :func:`~riogrande.helper.convert_to_dtype`
          Convert and optionally rescale an array.


.. py:function:: get_nbr_workers(number = None)

   Determine the number of worker processes to use in multiprocessing.

   :param number: Desired number of workers. If ``None``, the number of CPUs
       available via :func:`multiprocessing.cpu_count` is used, but never less
       than 2.
   :type number: int or None, optional

   :returns: Number of workers to use (always ``>= 2``).
   :rtype: int

   .. rubric:: Notes

   A warning is emitted when a requested ``number`` is lower than 2; the
   request is then ignored and the number of workers is set to 2.

   .. seealso::

      :func:`~riogrande.helper.get_or_set_context`
          Return a multiprocessing context.


.. py:function:: get_or_set_context(method = None)

   Return a multiprocessing context and set the global start method if unset.

   The function tries to be conservative about changing global interpreter
   state:

   - If `method` is ``None``, it returns a context for the currently configured
     global start method when one exists; otherwise it warns and returns a
     context for a sensible default (``'spawn'``, chosen for compatibility
     with Windows).
   - If `method` is provided and no global start method is set, it attempts
     to set the global start method to `method`. If that attempt races with
     another thread or process, it falls back to returning a context for
     `method` without changing the global start method.
   - If `method` is provided and a different global start method is already
     set, the global start method is not changed; a warning is emitted and a
     context for the requested `method` is returned, so callers can still
     create objects using the requested start semantics.

   :param method: Desired multiprocessing start method to use for the
       returned context. If ``None``, the function will:

       - return a context for the currently configured global start method if
         one exists, or
       - emit a ``RuntimeWarning`` and return a context for the default method
         (``'spawn'``) if no global method is set.

       Valid explicit values are ``'fork'``, ``'spawn'`` and ``'forkserver'``
       (availability depends on the platform and Python build). Passing an
       unsupported value raises ``ValueError``.
   :type method: {None, 'fork', 'spawn', 'forkserver'}, optional

   :returns: A multiprocessing context object appropriate for creating
       :class:`multiprocessing.Process`, :class:`multiprocessing.pool.Pool`
       and related objects. The returned context uses the start method
       determined by the logic described above. The function always returns
       a context and never changes an already-set global start method to a
       different value.
   :rtype: multiprocessing.context.BaseContext

   :raises ValueError: If ``method`` is neither ``None`` nor one of the
       supported start methods.
   :raises RuntimeError: If the function attempts to set the global start
       method and ``multiprocessing.set_start_method`` raises
       ``RuntimeError`` for a reason other than a race (this is rare). In the
       usual race case the function catches the ``RuntimeError`` and falls
       back to returning the requested context.

   .. rubric:: Notes

   - ``multiprocessing.set_start_method`` should be called at most once per
     program; a second call raises ``RuntimeError`` unless ``force=True`` is
     passed. This function therefore never overwrites an existing, different
     global start method.
   - The returned context is safe to use even when the global start method
     differs, because context objects encapsulate the start semantics of the
     processes they create, independently of global state.
   - On Windows the only available start method is ``'spawn'``; on Unix-like
     systems ``'fork'`` and ``'spawn'`` are commonly available, and
     ``'forkserver'`` may be available depending on the platform.
   - Use this helper in library code when you need a guaranteed context but
     do not want to unconditionally mutate global multiprocessing state.

   .. seealso::

      :func:`~riogrande.helper.get_nbr_workers`
          Determine the number of worker processes.

   .. rubric:: Examples

   >>> ctx = get_or_set_context('spawn')
   >>> # with 'spawn', worker must be picklable, e.g. defined at module level
   >>> p = ctx.Process(target=worker)
   >>> p.start()
   >>> p.join()


.. py:function:: match_all(targets, tags)

   Check if all tags in `targets` are present in `tags`.

   :param targets: Dictionary with tags to match to.
   :type targets: dict
   :param tags: Dictionary with tags to check for matching items.
   :type tags: dict

   :returns: True if all tags in `targets` are present in `tags`, otherwise
       False.
   :rtype: bool

   .. seealso::

      :func:`~riogrande.helper.match_any`
          Return True if *any* tag matches.


.. py:function:: match_any(targets, tags)

   Check if any tag in `targets` is present in `tags`.

   :param targets: Dictionary with tags to match to.
   :type targets: dict
   :param tags: Dictionary with tags to check for matching items.
   :type tags: dict

   :returns: True if any tag in `targets` is present in `tags`, otherwise
       False.
   :rtype: bool

   .. seealso::

      :func:`~riogrande.helper.match_all`
          Return True only if *all* tags match.


.. py:function:: output_filename(base_name, out_type, blur_params = None)

   Construct the filename for the specific output type.

   :param base_name: The basic output name in the form .tif
   :type base_name: str
   :param out_type: The type of output that will be saved. This should be
       either ``'blur'`` or ``'entropy'``, but any string is accepted.
   :type out_type: str
   :param blur_params: Output of `get_blur_params`, so ``'sigma'``,
       ``'truncate'`` and ``'diameter'`` are expected keys.
   :type blur_params: dict or None

   :returns: The resulting filename of the form
       '__sig_<{sigma}>_diam_<{diameter}>_trunc_<{truncate}>.tif'
   :rtype: str


.. py:function:: rasterio_to_numpy_dtype(rasterio_dtype)

   Map Rasterio data types to NumPy data types.

   Rasterio types like ``rasterio.dtypes.int16`` or ``rasterio.dtypes.float32``
   are mapped to their NumPy equivalents.

   :param rasterio_dtype: Data type string as found in
       ``rasterio.open(source).profile['dtype']`` (see :func:`rasterio.open`).
   :type rasterio_dtype: str

   :returns: Data type as :class:`numpy.dtype`, or ``None`` if the type is
       unknown.
   :rtype: numpy.dtype or None


.. py:function:: reduced_mask(array, nodata = 0, logic = 'all')

   Compute a mask based on the values of several bands.

   As the examples below show, the returned mask marks valid (unmasked) cells
   with ``1`` and masked cells with ``0``. The logical aggregation is thus
   applied to the per-band "not nodata" test, which is why ``"all"`` uses
   :func:`numpy.logical_or` and ``"any"`` uses :func:`numpy.logical_and`.

   :param array: 3D array holding multiple bands of map data, with the bands
       along the first axis.
   :type array: NDArray
   :param nodata: Nodata value to use. Defaults to 0. Pass :data:`numpy.nan`
       to mask NaN cells (detected via :func:`numpy.isnan`).
   :type nodata: float or int or None
   :param logic: Allowed strings are:

       - ``"all"``: A cell is masked only if **all** bands match the nodata
         value (aggregated via :func:`numpy.logical_or` across bands).
       - ``"any"``: A cell is masked if **any** band matches the nodata value
         (aggregated via :func:`numpy.logical_and` across bands).
   :type logic: str

   :returns: Mask array resulting from the applied logic, with ``1`` for
       valid and ``0`` for masked cells.
   :rtype: NDArray

   .. seealso::

      :func:`~riogrande.helper.aggregated_selector`
          Aggregate rasterio band masks into a selector.

   .. rubric:: Examples

   >>> mydata = np.array([[[2, 4], [0, 1]], [[5, 5], [1, 0]]])
   >>> # only mask if all bands are nodata
   >>> reduced_mask(mydata)
   array([[1, 1],
          [1, 1]], dtype=uint8)
   >>> # mask if any band is nodata
   >>> reduced_mask(mydata, logic='any')
   array([[1, 1],
          [0, 0]], dtype=uint8)


.. py:function:: sanitize(tags)

   Serialize, then deserialize, the values of a dict.

   Convenience wrapper that calls :func:`~riogrande.helper.serialize` followed
   by :func:`~riogrande.helper.deserialize`, ensuring values are in the same
   form they would have when loaded back from a ``.tif`` tag.

   :param tags: Dictionary with tag as key and serializable value as value.
   :type tags: dict[str, Any]

   :returns: Dictionary with tag as key and deserialized value as value.
   :rtype: dict

   .. seealso::

      :func:`~riogrande.helper.serialize`
          Convert dict values to JSON strings.
      :func:`~riogrande.helper.deserialize`
          Parse JSON strings back to Python objects.


.. py:function:: serialize(tags)

   Convert the values of a dict into JSON.

   Each value is serialized using :func:`json.dumps`.

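
   The following is a minimal sketch of the JSON round trip described for this
   function, :func:`~riogrande.helper.deserialize` and
   :func:`~riogrande.helper.sanitize`. It assumes each value is passed through
   :func:`json.dumps` and :func:`json.loads` unchanged. ``serialize_sketch``
   and ``deserialize_sketch`` are hypothetical stand-ins, not the package's
   implementation.

   .. code-block:: python

      import json
      from typing import Any, Dict


      def serialize_sketch(tags: Dict[str, Any]) -> Dict[str, str]:
          # Hypothetical stand-in: encode every value as a JSON string, keep keys.
          return {key: json.dumps(value) for key, value in tags.items()}


      def deserialize_sketch(tags: Dict[str, str]) -> Dict[str, Any]:
          # Hypothetical stand-in: parse every JSON string back into a Python object.
          return {key: json.loads(value) for key, value in tags.items()}


      tags = {"sigma": 2.5, "bands": (1, 2, 3), "label": "forest"}
      encoded = serialize_sketch(tags)
      print(encoded["bands"])  # prints [1, 2, 3] (this is a JSON string)

      # Serialize followed by deserialize, i.e. what sanitize is documented to do.
      restored = deserialize_sketch(encoded)
      print(type(restored["bands"]).__name__)  # prints list

   JSON has no tuple type, so the tuple comes back as a list after the round
   trip. This kind of normalization is why sanitizing values up front can
   make them match what a later deserialization returns.
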

   :param tags: Dictionary of tags with string keys and values of any
       JSON-serializable type.
   :type tags: dict[str, Any]

   :returns: Dictionary with tag as key and serialized value as value.
   :rtype: dict

   .. seealso::

      :func:`~riogrande.helper.deserialize`
          Inverse operation; parse JSON back to Python objects.
      :func:`~riogrande.helper.sanitize`
          Serialize then deserialize in one step.


.. py:function:: view_to_window(view)

   Convert a view into a rasterio Window.

   :param view: Tuple ``(x, y, width, height)`` defining the view of the data
       array to update.
   :type view: tuple[int, int, int, int] or None

   :returns: Rasterio window object, or ``None`` if `view` is ``None``.
   :rtype: :class:`rasterio.windows.Window` or None


.. py:data:: MPC_STARTER_METHODS
   :value: ['spawn', 'fork', 'forkserver']
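
   The decision logic documented for
   :func:`~riogrande.helper.get_or_set_context` can be sketched with the
   standard :mod:`multiprocessing` API. This is a minimal, hypothetical
   re-implementation, not the package's code. ``context_sketch`` is an
   illustrative name, and ``MPC_STARTER_METHODS`` is copied from the value
   above and assumed to be the set of accepted method names. As a
   simplification, every ``RuntimeError`` from
   :func:`multiprocessing.set_start_method` is treated as a lost race, whereas
   the documented function re-raises non-race errors.

   .. code-block:: python

      import multiprocessing
      import warnings
      from typing import Optional

      # Assumption: copied from the documented value of MPC_STARTER_METHODS.
      MPC_STARTER_METHODS = ["spawn", "fork", "forkserver"]


      def context_sketch(method: Optional[str] = None):
          """Hypothetical sketch of the documented get_or_set_context logic."""
          if method is not None and method not in MPC_STARTER_METHODS:
              raise ValueError(f"unsupported start method: {method!r}")
          # allow_none=True returns None if no global start method is fixed yet.
          current = multiprocessing.get_start_method(allow_none=True)
          if method is None:
              if current is not None:
                  return multiprocessing.get_context(current)
              warnings.warn("no global start method set, using 'spawn'", RuntimeWarning)
              return multiprocessing.get_context("spawn")
          if current is None:
              try:
                  multiprocessing.set_start_method(method)
              except RuntimeError:
                  # Simplification: assume another caller set it first.
                  pass
          elif current != method:
              warnings.warn(
                  f"global start method is {current!r}; returning a {method!r} context",
                  RuntimeWarning,
              )
          # A concrete context carries its own start method, whatever the global one is.
          return multiprocessing.get_context(method)


      ctx = context_sketch("spawn")
      print(ctx.get_start_method())  # prints spawn

   :func:`multiprocessing.get_context` returns a concrete context for the named
   method. That context therefore keeps its own start method even when the
   global setting differs, which is the property the notes of
   :func:`~riogrande.helper.get_or_set_context` rely on.
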