convster.processing
===================

.. py:module:: convster.processing

.. autoapi-nested-parse::

   Data preparation and derived metric computation for categorical raster maps.

   This module provides functions for extracting, filtering, and transforming
   data from categorical raster maps, as well as for computing per-cell derived
   metrics. It is the primary processing layer of the ``convster`` package.

   Key functionality includes:

   - **Category handling**: Selecting pixels by category value, listing
     available categories, and extracting per-category data arrays
     (:func:`select_category`, :func:`get_categories`,
     :func:`get_category_data`).
   - **Filter application**: Applying arbitrary image filters per category with
     optional output rescaling (:func:`get_filtered_categories`,
     :func:`get_category_data`).
   - **Entropy computation**: Computing per-cell Shannon entropy across category
     probability layers (:func:`compute_entropy`, :func:`get_max_entropy`,
     :func:`get_entropy_view`).
   - **Interaction metrics**: Computing per-cell multi-layer interaction values
     normalised by the theoretical maximum (:func:`compute_interaction`,
     :func:`view_interaction`).
   - **Visualisation helpers**: Generating entropy and interaction arrays ready
     for display or export (:func:`view_entropy`, :func:`view_interaction`).


Functions
---------

.. autoapisummary::

   convster.processing.compute_entropy
   convster.processing.compute_interaction
   convster.processing.get_categories
   convster.processing.get_category_data
   convster.processing.get_entropy_view
   convster.processing.get_filtered_categories
   convster.processing.get_max_entropy
   convster.processing.select_category
   convster.processing.view_blurred
   convster.processing.view_entropy
   convster.processing.view_interaction


Module Contents
---------------

.. py:function:: compute_entropy(data_arrays, normed = True, max_entropy_categories = None, as_dtype = None, output_range = None, **entropy_params)

   Compute per-cell entropy over a series of data arrays.

   The input arrays are stacked along a new axis, and entropy is calculated for
   each cell. The resulting array can be normalized, converted to a different
   dtype, and rescaled to a specified output range.

   :param data_arrays: Sequence of arrays to compute per-cell entropy over. All arrays must have the same shape.
   :type data_arrays: Sequence[NDArray]
   :param normed: If True (default), entropy values are normalized according to the maximum possible entropy. If False, the raw entropy is returned without rescaling.
   :type normed: bool
   :param max_entropy_categories: Maximum number of categories to use for normalization when `normed=True`. Ignored if `normed=False`.
   :type max_entropy_categories: int or None
   :param as_dtype: Data type for the output array. Useful to reduce memory usage when `normed=True`.
   :type as_dtype: type or str or None
   :param output_range: Range to rescale normalized entropy values. Ignored if `normed=False`.
   :type output_range: tuple or None
   :param \*\*entropy_params: Additional keyword arguments passed to :func:`scipy.stats.entropy`.
   :type \*\*entropy_params: dict

   :returns: Array of the same shape as the input arrays, containing the per-cell entropy.
   :rtype: NDArray

   .. rubric:: Notes

   - When `normed=True`, the entropy is mapped to [0, 1] for float outputs by default, or to the full range of the specified integer type if `as_dtype` is integer.
   - Converting to a different dtype without normalization may produce unbounded results.
   - For large arrays, using a smaller `as_dtype` (e.g., 'uint8') can save memory.
   - Normalization uses :func:`get_max_entropy` to determine the maximum entropy given the number of input arrays, and :func:`~riogrande.helper.convert_to_dtype` for rescaling.

   .. seealso::

      :func:`get_max_entropy`
          Compute the maximum entropy for a given number of categories.

      :func:`_get_entropy`
          Internal wrapper combining blurring and entropy computation.

   .. rubric:: Examples

   >>> data1 = np.array([[10, 5],
   ...                   [4, 1]])
   >>> data2 = np.array([[1, 5],
   ...                   [2, 9]])
   >>> compute_entropy([data1, data2], normed=True, as_dtype='float32')
   array([[0.439497  , 1.        ],
          [0.91829586, 0.4689956 ]], dtype=float32)


.. py:function:: compute_interaction(data_arrays, input_dtype = None, standardize = False, normed = True, output_dtype = None, output_range = None)

   Compute per-cell interaction (inspired by the Simpson Index) across a series of data arrays.

   The interaction is calculated as the element-wise product of the input
   arrays. Optionally, the interaction can be standardized, normalized, and
   converted to a specified output data type.

   For float inputs:

   .. math::

      \text{interaction} = LC_1 \times LC_2 \times \dots \times LC_n

   For integer (e.g., uint8) inputs:

   .. math::

      \text{interaction} = \frac{\frac{LC_1}{\text{max}} \times \frac{LC_2}{\text{max}} \times \dots \times \frac{LC_n}{\text{max}}}{(1/n)^n} \times \text{max}

   :param data_arrays: Sequence of arrays to compute per-cell interaction over. All arrays must have the same shape.
   :type data_arrays: Sequence[NDArray]
   :param input_dtype: Expected data type of the input arrays. Raises an error if the actual dtype does not match.
   :type input_dtype: type or str or None, default=None
   :param standardize: If True, interaction is standardized by the sum of the layers: :math:`\text{interaction} = \frac{A \cdot B \cdot \dots}{A + B + \dots}`.
   :type standardize: bool, default=False
   :param normed: If True, interaction values are normalized to the theoretical maximum interaction: :math:`(1/n)^n` for n arrays.
   :type normed: bool, default=True
   :param output_dtype: Data type for the output array. Values are rescaled appropriately for integer outputs.
   :type output_dtype: type or str or None, default=None
   :param output_range: Target range for output values (currently used only for integer outputs; reserved for future use).
   :type output_range: tuple or None, default=None

   :returns: Array of the same shape as the input arrays, containing the per-cell interaction.
   :rtype: NDArray

   .. rubric:: Notes

   - Standardization (`standardize=True`) divides the interaction by the sum of the input layers.
   - Conversion to integer types uses scaling and :func:`numpy.ceil` to avoid rounding artifacts.
   - `normed=True` ensures the maximum possible interaction corresponds to 1 (float) or the maximum value of the integer type.
   - Input/output range detection uses :func:`~riogrande.helper.dtype_range`.

   .. seealso::

      :func:`compute_entropy`
          Compute per-cell entropy over a series of data arrays.

   .. rubric:: Examples

   Example 1: float inputs, 2 arrays

   >>> a = np.array([[0.5, 0.25], [0.0, 0.05]])
   >>> b = np.array([[0.5, 0.25], [0.0, 0.3]])
   >>> compute_interaction([a, b], standardize=True, normed=True)
   array([[1.        , 0.5       ],
          [0.        , 0.17142857]])
   >>> compute_interaction([a, b], standardize=True, normed=False)
   array([[0.25      , 0.125     ],
          [0.        , 0.04285714]])

   Example 2: integer inputs (uint8), 3 arrays, float output

   >>> a = np.array([[85, 100], [50, 60]], dtype=np.uint8)
   >>> b = np.array([[85, 50], [100, 80]], dtype=np.uint8)
   >>> c = np.array([[85, 105], [100, 10]], dtype=np.uint8)
   >>> compute_interaction([a, b, c], standardize=True, normed=True, output_dtype=np.float64)
   array([[1.        , 0.85487482],
          [0.83044983, 0.13287197]])


.. py:function:: get_categories(data)

   Return the sorted list of categories present in the data.

   :param data: Array of integers indicating the category of each pixel.
   :type data: NDArray

   :returns: Sorted list of unique categories present in the data. Uses :func:`numpy.unique` to determine unique values.
   :rtype: list of int

   .. seealso::

      :func:`select_category`
          Filter an array for a specific category.

      :func:`get_filtered_categories`
          Extract all categories into separate arrays.

   .. rubric:: Examples

   >>> a = np.array([[0, 1, 2],
   ...               [2, 1, 0],
   ...               [1, 0, 2]])
   >>> get_categories(a)
   [0, 1, 2]


.. py:function:: get_category_data(data, category, img_filter = None, filter_params = None, filter_output_range = None, as_dtype = None, output_range = None, data_as_dtype = 'uint8')

   Return the data of one or more categories, optionally after applying a filter.

   :param data: Matrix indicating the per-cell category.
   :type data: NDArray
   :param category: The category (or categories) to extract.
   :type category: int or list[int]
   :param img_filter: Filter function applied to the selected category data (e.g., ``skimage.filters.gaussian``).
   :type img_filter: Callable or None
   :param filter_params: Parameters passed to ``img_filter``.
   :type filter_params: dict or None
   :param filter_output_range: Output value range to apply after filtering, if applicable.
   :type filter_output_range: tuple or None
   :param as_dtype: Desired data type of the output array.
   :type as_dtype: type or str or None
   :param output_range: Custom value range ``(min, max)`` to which the output will be scaled. Useful when filters produce floating-point values. For example, a Gaussian filter returns a ``float64`` array with values in ``[0, 1]``. With ``as_dtype="uint8"``, these values are mapped to ``[0, 255]``, reducing memory usage.
   :type output_range: tuple or None
   :param data_as_dtype: Data type of the array used to encode the category mask before filtering. Default is ``"uint8"``. For datasets with more than 255 categories, ``"uint16"`` may be more appropriate.
   :type data_as_dtype: type or str

   :returns: Filtered or unfiltered array of the selected category, converted and scaled according to the specified options.

   .. rubric:: Notes

   - If no image filter is provided, either ``as_dtype`` or ``output_range`` must be set to define the data type or range of the output array.
   - If an image filter is provided, ``as_dtype`` converts the data before the filter is applied.

   .. seealso::

      :func:`select_category`
          Create a binary indicator array for a category.

      :func:`get_filtered_categories`
          Apply this function across all categories.

      :func:`_filter_data`
          Apply a filter and rescale the resulting data.


.. py:function:: get_entropy_view(source, view, inner_view, categories, img_filter, filter_params = dict(), max_entropy_categories = None, blur_as_int = None, filter_output_range = None, blur_output_dtype = None, output_dtype = None, output_range = None, normed = True, **tags)

   Return the entropy for a set of categories over a view of a TIFF file.

   .. warning::

      This function is deprecated and should not be used.

   :param source: Path to the TIFF file to load.
   :type source: str
   :param view: A 4-tuple defining the view of the data array to update: `(start_row, end_row, start_col, end_col)`.
   :type view: tuple of int
   :param inner_view: A 4-tuple defining the inner part of the view, excluding border effects.
   :type inner_view: tuple of int
   :param categories: A collection of category values to extract. If `None`, all categories are processed.
   :type categories: Collection, optional
   :param img_filter: A function that will be applied to each category indicator array.
   :type img_filter: Callable
   :param filter_params: Keyword arguments to pass to `img_filter`. Default is an empty dictionary.
   :type filter_params: dict, optional
   :param filter_output_range: Output range for the filtered arrays. If `None`, no explicit rescaling is applied.
   :type filter_output_range: tuple, optional
   :param max_entropy_categories: If `normed=True`, the number of categories used to calculate the maximum entropy for normalization. Ignored if `normed=False`.
   :type max_entropy_categories: int or None
   :param output_dtype: Data type for the returned entropy array. If None, the dtype is inferred.
   :type output_dtype: type or str or None
   :param output_range: The data range to use for the returned array.

      .. note::

         This argument is only taken into account if `normed=True`.
   :type output_range: tuple or None
   :param normed: If True, normalize the entropy values to the range [0, 1] using the maximum possible entropy determined by `max_entropy_categories`.
   :type normed: bool
   :param \*\*tags: Arbitrary number of keyword arguments to describe the band to select. See :func:`~riogrande.io.load_block` for further details.
   :type \*\*tags: dict

   .. seealso::

      :func:`view_blurred`
          Compute blurred binary arrays per category.

      :func:`view_entropy`
          Compute per-cell entropy for a set of category arrays.


.. py:function:: get_filtered_categories(data, categories = None, img_filter = None, output_dtype = 'uint8', output_range = None, filter_output_range = None, filter_params = None)

   Extract each category from a data array into separate arrays and optionally apply a filter.

   :param data: Array containing integer categories, e.g., a land-cover type matrix.
   :type data: NDArray
   :param categories: Collection of categories to extract. If None, all categories in `data` are extracted.
   :type categories: Collection or None
   :param img_filter: Callable to apply as a filter to each category array (e.g., `skimage.filters.gaussian`).
   :type img_filter: Callable or None
   :param output_dtype: Data type for the returned arrays (default: "uint8").
   :type output_dtype: type or str or None
   :param output_range: Range to rescale the filtered arrays.
   :type output_range: tuple or None
   :param filter_output_range: Expected output range of the filter for proper scaling.
   :type filter_output_range: tuple or None
   :param filter_params: Dictionary of parameters to pass to the filter function.
   :type filter_params: dict or None

   :returns: A dictionary mapping each category to its filtered and optionally rescaled array.
   :rtype: dict

   .. rubric:: Notes

   - See :func:`get_category_data` for details on extracting category-specific data.

   .. seealso::

      :func:`get_category_data`
          Extract and optionally filter data for one category.

      :func:`get_categories`
          Infer the list of categories from an array.


.. py:function:: get_max_entropy(nbr)

   Maximum entropy value for a given number of categories.

   The maximum Shannon entropy occurs when the distribution is uniform across
   `nbr` categories, i.e. all categories have equal probability.

   :param nbr: The number of categories.
   :type nbr: int

   :returns: The maximal entropy for a uniform distribution with `nbr` categories. Computed using :func:`scipy.stats.entropy` with a uniform distribution.
   :rtype: float

   .. seealso::

      :func:`compute_entropy`
          Compute per-cell entropy over a series of data arrays.

   .. rubric:: Examples

   >>> get_max_entropy(2)
   0.6931471805599453
   >>> get_max_entropy(10)
   2.302585092994046


.. py:function:: select_category(data, category, as_dtype = 'uint8', limits = None)

   Filter an array for a particular category or categories.

   :param data: Input matrix of integers indicating the category of each pixel.
   :type data: NDArray
   :param category: The category (or list of categories) to select.
   :type category: int or list[int]
   :param as_dtype: Data type of the output matrix.

      .. note::

         The output matrix will contain the maximal value possible for this data type in cells that match `category`, and the minimal value in all other cells.
   :type as_dtype: type or str
   :param limits: Custom limits for output values. Must be a pair `(is_value, is_not_value)`. If provided, these override the default min/max values inferred from `as_dtype`.
   :type limits: tuple or None

   :returns: Matrix of type `as_dtype` with the same shape as `data`.
   :rtype: NDArray

   .. seealso::

      :func:`get_categories`
          Infer the list of categories from an array.

      :func:`get_category_data`
          Extract and optionally filter data for one or more categories.

   .. rubric:: Examples

   >>> data = np.array([
   ...     [0, 1, 2],
   ...     [2, 1, 0],
   ...     [1, 0, 2]
   ... ])
   >>> select_category(data, category=1)
   array([[  0, 255,   0],
          [  0, 255,   0],
          [255,   0,   0]], dtype=uint8)
   >>> select_category(data, category=[1, 2], as_dtype="int16", limits=(1000, -1000))
   array([[-1000,  1000,  1000],
          [ 1000,  1000, -1000],
          [ 1000, -1000,  1000]], dtype=int16)


.. py:function:: view_blurred(source, view, inner_view, categories, img_filter, filter_params = dict(), filter_output_range = None, output_dtype = 'uint8', output_range = None, **tags)

   Compute blurred binary arrays for each category in a categorical TIFF file.

   The provided TIFF file must contain at least one band with categorical data
   (e.g., of type `uint`). For each specified category, an indicator array is
   created (a dichotomous array marking presence/absence of that category),
   which is then filtered using the provided `img_filter` function.

   .. note::

      This function will be moved to the `parallel` sub-module in a future release.

   :param source: Path to the TIFF file to load.
   :type source: str
   :param view: A 4-tuple defining the view of the data array to update: `(start_row, end_row, start_col, end_col)`.
   :type view: tuple of int
   :param inner_view: A 4-tuple defining the inner part of the view, excluding border effects.
   :type inner_view: tuple of int
   :param categories: A collection of category values to extract. If `None`, all categories are processed.
   :type categories: Collection, optional
   :param img_filter: A function that will be applied to each category indicator array.
   :type img_filter: Callable
   :param filter_params: Keyword arguments to pass to `img_filter`. Default is an empty dictionary.
   :type filter_params: dict, optional
   :param filter_output_range: Output range for the filtered arrays. If `None`, no explicit rescaling is applied.
   :type filter_output_range: tuple, optional
   :param output_dtype: Data type for the returned arrays. Default is `"uint8"`.

      .. note::

         If provided, the output of the filter function will be rescaled to the range of this data type. See :func:`get_category_data` for details.
   :type output_dtype: type or str, optional
   :param output_range: Explicit output range for the filtered arrays. If not provided and the filter produces float-type data, the range `[0, 1]` is assumed, with values clipped to this range.
   :type output_range: tuple, optional
   :param \*\*tags: Arbitrary keyword arguments to specify which band to read from the TIFF file. See :func:`~riogrande.io.load_block` for further details.
   :type \*\*tags: keyword arguments

   :returns: Dictionary with keys:

      - ``'data'``: a dictionary mapping each category to its blurred array
      - ``'view'``: the `inner_view` defining the effective area of the returned arrays
   :rtype: dict

   .. seealso::

      :func:`view_entropy`
          Compute per-cell entropy for a set of category arrays.

      :func:`view_interaction`
          Compute per-cell interaction for a set of category arrays.

      :func:`get_filtered_categories`
          Extract all categories with optional filtering.


.. py:function:: view_entropy(category_arrays, view, normed = True, max_entropy_categories = None, output_dtype = None, output_range = None)

   Compute the per-cell entropy for a set of category arrays within a specified view.

   :param category_arrays: Dictionary mapping category indices to their corresponding arrays.
   :type category_arrays: dict[int, NDArray]
   :param view: A tuple defining the subregion of the arrays to process (e.g., (x_start, x_end, y_start, y_end)).
   :type view: tuple[int, int, int, int]
   :param normed: If True, normalize the entropy values to the range [0, 1] using the maximum possible entropy determined by `max_entropy_categories`.
   :type normed: bool
   :param max_entropy_categories: The maximum number of categories used for normalization. Ignored if `normed=False`.
   :type max_entropy_categories: int or None
   :param output_dtype: Data type for the returned entropy array. If None, the dtype is inferred.
   :type output_dtype: type or str or None
   :param output_range: Range to scale the output values to, e.g., (0, 1).
   :type output_range: tuple or None

   :returns: Dictionary with keys:

      - ``'data'``: NDArray of computed entropy values for the specified view.
      - ``'view'``: tuple defining the original view of the data arrays.
   :rtype: dict

   .. seealso::

      :func:`view_blurred`
          Compute blurred binary arrays per category.

      :func:`view_interaction`
          Compute per-cell interaction for a set of category arrays.

      :func:`compute_entropy`
          Underlying entropy computation function.


.. py:function:: view_interaction(category_arrays, view, input_dtype = np.uint8, standardize = False, normed = True, output_dtype = None, output_range = None)

   Compute the per-cell interaction metric for a set of category arrays within a specified view.

   The function returns a dictionary containing the computed interaction array
   and the original view. Interaction values can be standardized, normalized,
   and returned in a specific data type or range.

   :param category_arrays: Dictionary mapping category indices to their corresponding arrays.
   :type category_arrays: dict[int, NDArray]
   :param view: A tuple defining the subregion of the arrays to process (e.g., (x_start, x_end, y_start, y_end)).
   :type view: tuple[int, int, int, int]
   :param input_dtype: Data type for input conversion before computing interactions.
   :type input_dtype: type or str or None
   :param standardize: If True, standardize the input arrays before computing interaction.
   :type standardize: bool
   :param normed: If True, normalize the computed interaction values.
   :type normed: bool
   :param output_dtype: Data type for the returned interaction array. If None, the dtype is inferred.
   :type output_dtype: type or str or None
   :param output_range: Range to scale the output values to, e.g., (0, 1).
   :type output_range: tuple or None

   :returns: Dictionary with keys:

      - ``'data'``: NDArray of computed interaction values for the specified view.
      - ``'view'``: tuple defining the original view of the data arrays.
   :rtype: dict

   .. seealso::

      :func:`view_blurred`
          Compute blurred binary arrays per category.

      :func:`view_entropy`
          Compute per-cell entropy for a set of category arrays.

      :func:`compute_interaction`
          Underlying interaction computation function.
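
The normalisation arithmetic described for ``compute_entropy`` and ``compute_interaction`` can be illustrated with plain NumPy. The following is a minimal sketch, not the package's implementation: the helper names (``max_entropy_sketch``, ``normalized_entropy_sketch``, ``interaction_sketch``) and the ``max_value`` parameter are hypothetical, and the sketch assumes that raw layer values are converted to per-cell probabilities by dividing by the per-cell sum (as :func:`scipy.stats.entropy` does for its input) and that a zero per-cell sum yields zero. Dtype conversion and output rescaling are left out.

```python
import numpy as np


def max_entropy_sketch(n):
    """Shannon entropy (natural log) of a uniform distribution over n categories."""
    return float(np.log(n))


def normalized_entropy_sketch(layers):
    """Per-cell Shannon entropy across layers, divided by the uniform maximum."""
    stack = np.stack([np.asarray(a, dtype=float) for a in layers], axis=0)
    total = stack.sum(axis=0)
    # Turn raw layer values into per-cell probabilities; all-zero cells stay 0.
    probs = np.divide(stack, total, out=np.zeros_like(stack), where=total != 0)
    # Convention 0 * log(0) = 0: only take the log of strictly positive entries.
    logs = np.log(probs, out=np.zeros_like(probs), where=probs > 0)
    raw = -(probs * logs).sum(axis=0)
    return raw / max_entropy_sketch(len(layers))


def interaction_sketch(layers, max_value=1.0, standardize=False, normed=True):
    """Per-cell product of layers, each first scaled to [0, 1] by max_value."""
    scaled = [np.asarray(a, dtype=float) / max_value for a in layers]
    result = np.prod(np.stack(scaled, axis=0), axis=0)
    if standardize:
        # Divide the product by the per-cell sum of the scaled layers.
        total = np.sum(scaled, axis=0)
        result = np.divide(result, total, out=np.zeros_like(result), where=total != 0)
    if normed:
        # Theoretical maximum of the product of n shares summing to 1 is (1/n)^n.
        n = len(layers)
        result = result / (1.0 / n) ** n
    return result


# Inputs taken from the documented examples above.
data1 = np.array([[10, 5], [4, 1]])
data2 = np.array([[1, 5], [2, 9]])
entropy = normalized_entropy_sketch([data1, data2])

a = np.array([[85, 100], [50, 60]], dtype=np.uint8)
b = np.array([[85, 50], [100, 80]], dtype=np.uint8)
c = np.array([[85, 105], [100, 10]], dtype=np.uint8)
interaction = interaction_sketch([a, b, c], max_value=255, standardize=True)
```

Under these assumptions the sketch agrees with the values shown in the ``compute_entropy`` example and in Example 2 of ``compute_interaction`` to the printed precision. Casting the ``uint8`` layers to float before multiplying avoids integer overflow in the product.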