convster.processing
Data preparation and derived metric computation for categorical raster maps.
This module provides functions for extracting, filtering, and transforming
data from categorical raster maps, as well as for computing per-cell derived
metrics. It is the primary processing layer of the convster package.
Key functionality includes:
- Category handling: Selecting pixels by category value, listing available categories, and extracting per-category data arrays (select_category(), get_categories(), get_category_data()).
- Filter application: Applying arbitrary image filters per category with optional output rescaling (get_filtered_categories(), get_category_data()).
- Entropy computation: Computing per-cell Shannon entropy across category probability layers (compute_entropy(), get_max_entropy(), get_entropy_view()).
- Interaction metrics: Computing per-cell multi-layer interaction values normalised by the theoretical maximum (view_interaction()).
- Visualisation helpers: Generating entropy and interaction arrays ready for display or export (view_entropy(), view_interaction()).
Functions
| compute_entropy() | Compute per-cell entropy over a series of data arrays. |
| compute_interaction() | Compute per-cell interaction (inspired by the Simpson Index) across a series of data arrays. |
| get_categories() | Return the sorted list of categories present in the data. |
| get_category_data() | Return the data of one or more categories, optionally after applying a filter. |
| get_entropy_view() | Return the entropy for some categories over a view from a TIFF file. |
| get_filtered_categories() | Extract each category from a data array into separate arrays and optionally apply a filter. |
| get_max_entropy() | Maximum entropy value for a given number of categories. |
| select_category() | Filter an array for a particular category or categories. |
| view_blurred() | Compute blurred binary arrays for each category in a categorical TIFF file. |
| view_entropy() | Compute the per-cell entropy for a set of category arrays within a specified view. |
| view_interaction() | Compute the per-cell interaction metric for a set of category arrays within a specified view. |
Module Contents
- convster.processing.compute_entropy(data_arrays, normed=True, max_entropy_categories=None, as_dtype=None, output_range=None, **entropy_params)
Compute per-cell entropy over a series of data arrays.
The input arrays are stacked along a new axis, and entropy is calculated for each cell. The resulting array can be normalized, converted to a different dtype, and rescaled to a specified output range.
- Parameters:
data_arrays (Sequence[NDArray]) – Sequence of arrays to compute per-cell entropy over. All arrays must have the same shape.
normed (bool) – If True (default), entropy values are normalized according to the maximum possible entropy. If False, the raw entropy is returned without rescaling.
max_entropy_categories (int or None) – Maximum number of categories to use for normalization when normed=True. Ignored if normed=False.
as_dtype (type or str or None) – Data type for the output array. Useful to reduce memory usage when normed=True.
output_range (tuple or None) – Range to rescale normalized entropy values. Ignored if normed=False.
**entropy_params (dict) – Additional keyword arguments passed to
scipy.stats.entropy().
- Returns:
Array of the same shape as the input arrays, containing the per-cell entropy.
- Return type:
NDArray
Notes
When normed=True, the entropy is mapped to [0, 1] for float outputs by default, or to the full range of the specified integer type if as_dtype is integer.
Converting to a different dtype without normalization may produce unbounded results.
For large arrays, using a smaller as_dtype (e.g., ‘uint8’) can save memory.
Normalization uses get_max_entropy() to determine the maximum entropy given the number of input arrays, and convert_to_dtype() for rescaling.
See also
get_max_entropy(): Compute the maximum entropy for a given number of categories.
_get_entropy(): Internal wrapper combining blurring and entropy computation.
Examples
>>> data1 = np.array([[10, 5],
...                   [4, 1]])
>>> data2 = np.array([[1, 5],
...                   [2, 9]])
>>> compute_entropy([data1, data2], normed=True, as_dtype='float32')
array([[0.439497  , 1.        ],
       [0.91829586, 0.4689956 ]], dtype=float32)
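The values in this example can be checked without the package. The following is a minimal NumPy-only sketch, not convster's implementation: the helper name `normalized_entropy_sketch` is hypothetical, and it assumes that each cell's values are turned into proportions across the input arrays, that natural-log Shannon entropy is used, and that normalization divides by ln(n) for n input arrays. It does not cover `as_dtype`, `output_range`, or `max_entropy_categories`.

```python
import numpy as np


def normalized_entropy_sketch(data_arrays):
    """Per-cell Shannon entropy across layers, divided by ln(n) (hypothetical helper)."""
    stacked = np.stack(data_arrays, axis=0).astype(np.float64)
    total = stacked.sum(axis=0)
    # Per-cell proportions; cells whose layers sum to zero keep proportion 0.
    p = np.divide(stacked, total, out=np.zeros_like(stacked), where=total > 0)
    # Treat 0 * log(0) as 0 by only taking the log where p > 0.
    log_p = np.log(p, out=np.zeros_like(p), where=p > 0)
    entropy = -(p * log_p).sum(axis=0)
    # Maximum entropy of a uniform distribution over n layers is ln(n); needs n >= 2.
    return entropy / np.log(len(data_arrays))


data1 = np.array([[10, 5], [4, 1]])
data2 = np.array([[1, 5], [2, 9]])
result = normalized_entropy_sketch([data1, data2])
print(result)
```

A cell where both layers hold the same value (here the top-right cell, 5 and 5) reaches the maximum of 1, while strongly unequal cells fall toward 0.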
- convster.processing.compute_interaction(data_arrays, input_dtype=None, standardize=False, normed=True, output_dtype=None, output_range=None)
Compute per-cell interaction (inspired by the Simpson Index) across a series of data arrays.
The interaction is calculated as the element-wise product of the input arrays. Optionally, the interaction can be standardized, normalized, and converted to a specified output data type.
- For float inputs:
- \[interaction = LC_1 \times LC_2 \times \dots \times LC_n\]
- For integer (e.g., uint8) inputs:
- \[interaction = \frac{\left(\frac{LC_1}{\text{max}} \times \frac{LC_2}{\text{max}} \times \dots \right)}{(1/n^n)} \times \text{max}\]
- Parameters:
data_arrays (Sequence[NDArray]) – Sequence of arrays to compute per-cell interaction over. All arrays must have the same shape.
input_dtype (type or str or None, default=None) – Expected data type of the input arrays. Raises an error if actual dtype does not match.
standardize (bool, default=False) – If True, interaction is standardized by the sum of the layers: \(interaction = \frac{A \cdot B \cdot ...}{A + B + ...}\).
normed (bool, default=True) – If True, interaction values are normalized to the theoretical maximum interaction: \((1/n)^n\) for n arrays.
output_dtype (type or str or None, default=None) – Data type for the output array. Values are rescaled appropriately for integer outputs.
output_range (tuple or None, default=None) – Target range for output values (currently used only for integer outputs; reserved for future use).
- Returns:
Array of the same shape as the input arrays, containing the per-cell interaction.
- Return type:
NDArray
Notes
Standardization (standardize=True) scales the interaction by the sum of input layers.
Conversion to integer types uses scaling and numpy.ceil() to avoid rounding artifacts.
normed=True ensures the maximum possible interaction corresponds to 1 (float) or the maximum of the integer type.
Input/output range detection uses dtype_range().
See also
compute_entropy(): Compute per-cell entropy over a series of data arrays.
Examples
Example 1: float inputs, 2 arrays
>>> a = np.array([[0.5, 0.25], [0.0, 0.05]])
>>> b = np.array([[0.5, 0.25], [0.0, 0.3]])
>>> compute_interaction([a, b], standardize=True, normed=True)
array([[1.        , 0.5       ],
       [0.        , 0.17142857]])
>>> compute_interaction([a, b], standardize=True, normed=False)
array([[0.25      , 0.125     ],
       [0.        , 0.04285714]])
Example 2: integer inputs (uint8), 3 arrays, float output
>>> a = np.array([[85, 100], [50, 60]], dtype=np.uint8)
>>> b = np.array([[85, 50], [100, 80]], dtype=np.uint8)
>>> c = np.array([[85, 105], [100, 10]], dtype=np.uint8)
>>> compute_interaction([a, b, c], standardize=True, normed=True, output_dtype=np.float64)
array([[1.        , 0.85487482],
       [0.83044983, 0.13287197]])
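To make the arithmetic behind these examples explicit, here is a minimal NumPy sketch under stated assumptions; it is not the package's code. The helper name `interaction_sketch` is hypothetical. It assumes integer layers are first scaled to [0, 1] by the maximum of their dtype, that standardization divides the product by the per-cell sum (with 0/0 treated as 0), and that normalization divides by (1/n)^n. Integer output conversion with numpy.ceil() is left out.

```python
import numpy as np


def interaction_sketch(data_arrays, standardize=False, normed=True):
    """Float-valued per-cell interaction (hypothetical helper)."""
    layers = []
    for arr in data_arrays:
        arr = np.asarray(arr)
        if np.issubdtype(arr.dtype, np.integer):
            # Scale integer layers to [0, 1]; cast first to avoid integer overflow.
            layers.append(arr.astype(np.float64) / np.iinfo(arr.dtype).max)
        else:
            layers.append(arr.astype(np.float64))
    stacked = np.stack(layers, axis=0)
    result = stacked.prod(axis=0)
    if standardize:
        total = stacked.sum(axis=0)
        # Divide by the sum of the layers; cells with a zero sum stay at 0.
        result = np.divide(result, total, out=np.zeros_like(result), where=total > 0)
    if normed:
        n = len(data_arrays)
        # (1/n)**n is the largest product of n values that sum to 1.
        result = result / (1.0 / n) ** n
    return result


a = np.array([[0.5, 0.25], [0.0, 0.05]])
b = np.array([[0.5, 0.25], [0.0, 0.3]])
print(interaction_sketch([a, b], standardize=True, normed=True))
```

For two layers of 0.5 each, the product is 0.25 and the sum is 1, so dividing by (1/2)^2 = 0.25 yields exactly 1, which matches the top-left cell of Example 1.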
- convster.processing.get_categories(data)
Return the sorted list of categories present in the data.
- Parameters:
data (NDArray) – Array of integers indicating the category of each pixel.
- Returns:
Sorted list of unique categories present in the data. Uses numpy.unique() to determine unique values.
- Return type:
See also
select_category(): Filter an array for a specific category.
get_filtered_categories(): Extract all categories into separate arrays.
Examples
>>> a = np.array([[0, 1, 2],
...               [2, 1, 0],
...               [1, 0, 2]])
>>> get_categories(a)
[0, 1, 2]
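Since the result is described as coming from numpy.unique(), the same list can be obtained directly with NumPy. This is a small sketch with a hypothetical helper name, `categories_sketch`; it only illustrates the behaviour documented here and is not taken from the package.

```python
import numpy as np


def categories_sketch(data):
    """Sorted list of the unique values in an array (hypothetical helper)."""
    # numpy.unique returns the unique values in ascending order.
    return np.unique(data).tolist()


a = np.array([[0, 1, 2],
              [2, 1, 0],
              [1, 0, 2]])
print(categories_sketch(a))  # [0, 1, 2]
```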
- convster.processing.get_category_data(data, category, img_filter=None, filter_params=None, filter_output_range=None, as_dtype=None, output_range=None, data_as_dtype='uint8')
Return the data of one or more categories, optionally after applying a filter.
- Parameters:
data (NDArray) – Matrix indicating the per-cell category.
category (int or list[int]) – The category (or categories) to extract.
img_filter (Callable or None) – Filter function applied to the selected category data (e.g., skimage.filters.gaussian).
filter_params (dict or None) – Parameters passed to img_filter.
filter_output_range (tuple or None) – Output value range to apply after filtering, if applicable.
as_dtype (type or str or None) – Desired data type of the output array.
output_range (tuple or None) – Custom value range (min, max) to which the output will be scaled. Useful when filters produce floating-point values. For example, a Gaussian filter returns a float64 array with values in [0, 1]. With as_dtype="uint8", these values are mapped to [0, 255], reducing memory usage.
data_as_dtype (type or str) – Data type of the array used to encode the category mask before filtering. Default is "uint8". For datasets with more than 255 categories, "uint16" may be more appropriate.
- Returns:
Filtered or unfiltered array of the selected category, converted and
scaled according to the specified options.
- Return type:
numpy.typing.NDArray
Notes
If no image filter is provided, either as_dtype or output_range must be set to define the data type or range of the output array.
If an image filter is provided, as_dtype converts the data before the filter is applied.
See also
select_category(): Create a binary indicator array for a category.
get_filtered_categories(): Apply this function across all categories.
_filter_data(): Apply a filter and rescale the resulting data.
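The mapping from a float filter output in [0, 1] to the range of an integer type, as described for output_range and as_dtype, can be sketched as a linear rescale. The helper `rescale_sketch` is hypothetical, and rounding to the nearest integer is an assumption made for this illustration; the package may round differently.

```python
import numpy as np


def rescale_sketch(values, in_range=(0.0, 1.0), as_dtype="uint8"):
    """Linearly map values from in_range onto the full range of an integer dtype."""
    dtype = np.dtype(as_dtype)
    info = np.iinfo(dtype)  # assumes an integer target dtype
    low, high = in_range
    clipped = np.clip(np.asarray(values, dtype=np.float64), low, high)
    fraction = (clipped - low) / (high - low)
    scaled = info.min + fraction * (info.max - info.min)
    # Assumption: round to nearest integer before casting.
    return np.rint(scaled).astype(dtype)


filtered = np.array([[0.0, 0.2], [0.6, 1.0]])  # stand-in for a filter output
out = rescale_sketch(filtered)
print(out, out.dtype, filtered.nbytes, out.nbytes)
```

The uint8 result uses one byte per cell instead of eight for float64, which is the memory saving the parameter description refers to.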
- convster.processing.get_entropy_view(source, view, inner_view, categories, img_filter, filter_params=dict(), max_entropy_categories=None, blur_as_int=None, filter_output_range=None, blur_output_dtype=None, output_dtype=None, output_range=None, normed=True, **tags)
Return the entropy for some categories over a view from a TIFF file.
Warning
This function is deprecated and should not be used.
- Parameters:
source (str) – Path to the TIFF file to load.
view (tuple[int, int, int, int]) – A 4-tuple defining the view of the data array to update: (start_row, end_row, start_col, end_col).
inner_view (tuple of int) – A 4-tuple defining the inner part of the view, excluding border effects.
categories (Collection, optional) – A collection of category values to extract. If None, all categories are processed.
img_filter (Callable) – A function that will be applied to each category indicator array.
filter_params (dict, optional) – Keyword arguments to pass to img_filter. Default is an empty dictionary.
filter_output_range (tuple, optional) – Output range for the filtered arrays. If None, no explicit rescaling is applied.
view – A tuple defining the subregion of the arrays to process (e.g., (x_start, x_end, y_start, y_end)).
max_entropy_categories (int or None) – If normed=True, the number of categories n used to calculate the maximum entropy by which the values are normalized. This argument is ignored if normed=False.
output_dtype (type or str or None) – Data type for the returned entropy array. If None, the dtype is inferred.
output_range (tuple or None) –
The data-range to use for the returned array.
Note
This argument is only taken into account if normed=True.
normed (bool) – If True, normalize the entropy values to the range [0, 1] using the maximum possible entropy determined by max_entropy_categories.
**tags (dict) –
Arbitrary number of keyword arguments to describe the band to select.
See load_block() for further details.
blur_as_int (bool | None)
See also
view_blurred(): Compute blurred binary arrays per category.
view_entropy(): Compute per-cell entropy for a set of category arrays.
- convster.processing.get_filtered_categories(data, categories=None, img_filter=None, output_dtype='uint8', output_range=None, filter_output_range=None, filter_params=None)
Extract each category from a data array into separate arrays and optionally apply a filter.
- Parameters:
data (NDArray) – Array containing integer categories, e.g., a land-cover type matrix.
categories (Collection or None) – Collection of categories to extract. If None, all categories in data are extracted.
img_filter (Callable or None) – Callable to apply as a filter to each category array (e.g., skimage.filters.gaussian).
output_dtype (type or str or None) – Data type for the returned arrays (default: “uint8”).
output_range (tuple or None) – Range to rescale the filtered arrays.
filter_output_range (tuple or None) – Expected output range of the filter for proper scaling.
filter_params (dict or None) – Dictionary of parameters to pass to the filter function.
- Returns:
A dictionary mapping each category to its filtered and optionally rescaled array.
- Return type:
Notes
See get_category_data() for details on extracting category-specific data.
See also
get_category_data(): Extract and optionally filter data for one category.
get_categories(): Infer the list of categories from an array.
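The dictionary returned by this function can be pictured with a small NumPy sketch. Everything here is illustrative: `filtered_categories_sketch` and `scale_filter` are hypothetical names, the filter output is assumed to lie in [0, 1], and the result is always mapped to uint8, whereas the real function offers output_dtype, output_range, and filter_output_range for this.

```python
import numpy as np


def filtered_categories_sketch(data, categories=None, img_filter=None, filter_params=None):
    """Map each category to its (optionally filtered) indicator array as uint8."""
    if categories is None:
        categories = np.unique(data).tolist()
    params = filter_params or {}
    result = {}
    for cat in categories:
        # Indicator array: 1.0 where the cell belongs to the category, else 0.0.
        layer = (data == cat).astype(np.float64)
        if img_filter is not None:
            layer = img_filter(layer, **params)
        # Assumption: filter output lies in [0, 1]; map it to [0, 255].
        result[cat] = np.rint(np.clip(layer, 0.0, 1.0) * 255).astype(np.uint8)
    return result


def scale_filter(arr, factor=1.0):
    """Hypothetical stand-in for a real image filter such as a Gaussian blur."""
    return arr * factor


data = np.array([[0, 1, 2],
                 [2, 1, 0],
                 [1, 0, 2]])
plain = filtered_categories_sketch(data)
scaled = filtered_categories_sketch(data, img_filter=scale_filter,
                                    filter_params={"factor": 0.2})
print(sorted(plain), plain[1], scaled[1], sep="\n")
```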
- convster.processing.get_max_entropy(nbr)
Maximum entropy value for a given number of categories.
The maximum Shannon entropy occurs when the distribution is uniform across nbr categories, i.e. all categories have equal probability.
- Parameters:
nbr (int) – The number of categories.
- Returns:
The maximal entropy for a uniform distribution with nbr categories. Computed using scipy.stats.entropy() with a uniform distribution.
- Return type:
See also
compute_entropy(): Compute per-cell entropy over a series of data arrays.
Examples
>>> get_max_entropy(2)
0.6931471805599453
>>> get_max_entropy(10)
2.302585092994046
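For a uniform distribution over nbr categories, each probability is 1/nbr, so the Shannon entropy reduces to ln(nbr). The standard-library sketch below checks this identity; `max_entropy_sketch` is a hypothetical name, and the natural logarithm is assumed because the example values equal ln(2) and ln(10).

```python
import math


def max_entropy_sketch(nbr):
    """Shannon entropy (natural log) of a uniform distribution over nbr categories."""
    p = 1.0 / nbr
    # -sum(p * ln p) over nbr equal probabilities, which simplifies to ln(nbr).
    return -sum(p * math.log(p) for _ in range(nbr))


for n in (2, 10):
    print(n, max_entropy_sketch(n), math.log(n))
```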
- convster.processing.select_category(data, category, as_dtype='uint8', limits=None)
Filter an array for a particular category or categories.
- Parameters:
data (NDArray) – Input matrix of integers indicating the category of each pixel.
category (int or list[int]) – The category (or list of categories) to select.
as_dtype (type or str) – Data type of the output matrix.
Note
The output matrix will contain the maximal value possible for this data type in cells that match category, and the minimal value in all other cells.
limits (tuple or None) – Custom limits for output values. Must be a pair (is_value, is_not_value). If provided, these override the default min/max values inferred from as_dtype.
- Returns:
Matrix of type as_dtype with the same shape as data.
- Return type:
NDArray
See also
get_categories(): Infer the list of categories from an array.
get_category_data(): Extract and optionally filter data for one or more categories.
Examples
>>> data = np.array([
...     [0, 1, 2],
...     [2, 1, 0],
...     [1, 0, 2]
... ])
>>> select_category(data, category=1)
array([[  0, 255,   0],
       [  0, 255,   0],
       [255,   0,   0]], dtype=uint8)
>>> select_category(data, category=[1, 2], as_dtype="int16", limits=(1000, -1000))
array([[-1000,  1000,  1000],
       [ 1000,  1000, -1000],
       [ 1000, -1000,  1000]], dtype=int16)
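The behaviour shown in these examples can be approximated with numpy.isin() and numpy.where(). This is a sketch under assumptions, not the package's implementation: `select_category_sketch` is a hypothetical name, and it only handles integer output dtypes because it reads the default limits from numpy.iinfo().

```python
import numpy as np


def select_category_sketch(data, category, as_dtype="uint8", limits=None):
    """Indicator array for one or more categories (hypothetical helper)."""
    dtype = np.dtype(as_dtype)
    if limits is None:
        info = np.iinfo(dtype)  # assumes an integer dtype
        limits = (info.max, info.min)
    is_value, is_not_value = limits
    # True where the cell matches any of the requested categories.
    mask = np.isin(data, np.atleast_1d(category))
    return np.where(mask, is_value, is_not_value).astype(dtype)


data = np.array([[0, 1, 2],
                 [2, 1, 0],
                 [1, 0, 2]])
single = select_category_sketch(data, category=1)
multi = select_category_sketch(data, category=[1, 2], as_dtype="int16",
                               limits=(1000, -1000))
print(single)
print(multi)
```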
- convster.processing.view_blurred(source, view, inner_view, categories, img_filter, filter_params=dict(), filter_output_range=None, output_dtype='uint8', output_range=None, **tags)
Compute blurred binary arrays for each category in a categorical TIFF file.
The provided TIFF file must contain at least one band with categorical data (e.g., of type uint). For each specified category, an indicator array is created (dichotomous array marking presence/absence of that category), which is then filtered using the provided img_filter function.
Note
This method will be moved to the parallel sub-module in a future release.
- Parameters:
source (str) – Path to the TIFF file to load.
view (tuple of int) – A 4-tuple defining the view of the data array to update: (start_row, end_row, start_col, end_col).
inner_view (tuple of int) – A 4-tuple defining the inner part of the view, excluding border effects.
categories (Collection, optional) – A collection of category values to extract. If None, all categories are processed.
img_filter (Callable) – A function that will be applied to each category indicator array.
filter_params (dict, optional) – Keyword arguments to pass to img_filter. Default is an empty dictionary.
filter_output_range (tuple, optional) – Output range for the filtered arrays. If None, no explicit rescaling is applied.
output_dtype (type or str, optional) –
Data type for the returned arrays. Default is “uint8”.
Note
If provided, the output of the filter function will be rescaled to the range of this data type. See get_category_data for details.
output_range (tuple, optional) – Explicit output range for the filtered arrays. If not provided and the filter produces float-type data, the range [0, 1] is assumed, with values clipped to this range.
**tags (keyword arguments) – Arbitrary keyword arguments to specify which band to read from the TIFF file. See load_block() for further details.
- Returns:
Dictionary with keys:
- 'data': a dictionary mapping each category to its blurred array
- 'view': the inner_view defining the effective area of the returned arrays
- Return type:
See also
view_entropy(): Compute per-cell entropy for a set of category arrays.
view_interaction(): Compute per-cell interaction for a set of category arrays.
get_filtered_categories(): Extract all categories with optional filtering.
- convster.processing.view_entropy(category_arrays, view, normed=True, max_entropy_categories=None, output_dtype=None, output_range=None)
Compute the per-cell entropy for a set of category arrays within a specified view.
- Parameters:
category_arrays (dict[int, NDArray]) – Dictionary mapping category indices to their corresponding arrays.
view (tuple[int, int, int, int]) – A tuple defining the subregion of the arrays to process (e.g., (x_start, x_end, y_start, y_end)).
normed (bool) – If True, normalize the entropy values to the range [0, 1] using the maximum possible entropy determined by max_entropy_categories.
max_entropy_categories (int or None) – The maximum number of categories used for normalization. Ignored if normed=False.
output_dtype (type or str or None) – Data type for the returned entropy array. If None, the dtype is inferred.
output_range (tuple or None) – Range to scale the output values to, e.g., (0, 1).
- Returns:
Dictionary with keys:
- 'data': NDArray of computed entropy values for the specified view.
- 'view': tuple defining the original view of the data arrays.
- Return type:
See also
view_blurred(): Compute blurred binary arrays per category.
view_interaction(): Compute per-cell interaction for a set of category arrays.
compute_entropy(): Underlying entropy computation function.
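As a rough picture of the returned structure, the sketch below crops each category array to a view, computes a normalized per-cell entropy, and returns a dictionary with 'data' and 'view' keys. Several points are assumptions for illustration only: the name `view_entropy_sketch` is hypothetical, the view is interpreted as (start_row, end_row, start_col, end_col) with exclusive end indices, and normalization divides by ln(n) for n category arrays.

```python
import numpy as np


def view_entropy_sketch(category_arrays, view):
    """Normalized per-cell entropy of the category arrays inside a view."""
    # Assumption: view = (start_row, end_row, start_col, end_col), ends exclusive.
    r0, r1, c0, c1 = view
    layers = [np.asarray(arr, dtype=np.float64)[r0:r1, c0:c1]
              for arr in category_arrays.values()]
    stacked = np.stack(layers, axis=0)
    total = stacked.sum(axis=0)
    # Per-cell proportions; cells with a zero total keep proportion 0.
    p = np.divide(stacked, total, out=np.zeros_like(stacked), where=total > 0)
    log_p = np.log(p, out=np.zeros_like(p), where=p > 0)
    entropy = -(p * log_p).sum(axis=0) / np.log(len(layers))
    return {"data": entropy, "view": view}


arrays = {
    1: np.array([[10, 5], [4, 1]]),
    2: np.array([[1, 5], [2, 9]]),
}
block = view_entropy_sketch(arrays, view=(0, 1, 0, 2))
print(block["data"], block["view"])
```

Returning the view alongside the data lets a caller write each computed block back to the matching region of a larger output array.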
- convster.processing.view_interaction(category_arrays, view, input_dtype=np.uint8, standardize=False, normed=True, output_dtype=None, output_range=None)
Compute the per-cell interaction metric for a set of category arrays within a specified view.
The function returns a dictionary containing the computed interaction array and the original view. Interaction values can be standardized, normalized, and returned in a specific data type or range.
- Parameters:
category_arrays (dict[int, NDArray]) – Dictionary mapping category indices to their corresponding arrays.
view (tuple[int, int, int, int]) – A tuple defining the subregion of the arrays to process (e.g., (x_start, x_end, y_start, y_end)).
input_dtype (type or str or None) – Data type for input conversion before computing interactions.
standardize (bool) – If True, standardize the input arrays before computing interaction.
normed (bool) – If True, normalize the computed interaction values.
output_dtype (type or str or None) – Data type for the returned interaction array. If None, the dtype is inferred.
output_range (tuple or None) – Range to scale the output values to, e.g., (0, 1).
- Returns:
Dictionary with keys:
- 'data': NDArray of computed interaction values for the specified view.
- 'view': tuple defining the original view of the data arrays.
- Return type:
See also
view_blurred(): Compute blurred binary arrays per category.
view_entropy(): Compute per-cell entropy for a set of category arrays.
compute_interaction(): Underlying interaction computation function.