convster.parallel

This module contains helper functions that parallelize the application of filters to a .tif raster file.

Functions

apply_filter(source, output_file, block_size[, bands, ...])

Apply a filter to one or more bands of a raster and export the result.

compute_entropy(source, output_file, block_size, ...)

Compute entropy-based heterogeneity from categorical raster bands.

compute_interaction(source, output_file, block_size, ...)

Compute interaction strength between categorical raster bands.

extract_categories(source, categories, output_file, ...)

Extract per-category maps from a raster, optionally apply a filter, and export to a file.

Module Contents

convster.parallel.apply_filter(source, output_file, block_size, bands=None, data_in_range=None, data_as_dtype=np.uint8, data_output_range=None, replace_nan_with=None, img_filter=None, filter_params=None, filter_output_range=(0.0, 1.0), output_dtype=np.uint8, output_range=None, selector_band=None, verbose=False, output_params=None, **params)

Apply a filter to one or more bands of a raster and export the result.

This function processes the raster in blocks to allow for memory-efficient and parallelized computation. Each block is filtered and then combined into a single output raster. Optionally, categorical masking can be used with selector_band.
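The block-wise processing described above can be pictured with a short sketch. It is not the library's implementation: the helper name iter_blocks is hypothetical, and it only assumes that block_size is given as (width, height) and that edge blocks are clipped to the raster extent.

```python
def iter_blocks(raster_width, raster_height, block_size):
    """Yield (x_offset, y_offset, width, height) windows covering the raster.

    Hypothetical helper for illustration only; not part of convster.
    """
    block_width, block_height = block_size
    for y in range(0, raster_height, block_height):
        for x in range(0, raster_width, block_width):
            # Blocks on the right and bottom edges are clipped to the raster extent.
            yield (
                x,
                y,
                min(block_width, raster_width - x),
                min(block_height, raster_height - y),
            )


# A 1000 x 600 pixel raster split into blocks of at most 400 x 400 pixels.
windows = list(iter_blocks(1000, 600, (400, 400)))
```

Each window can then be read, filtered, and written independently, which is what keeps memory use bounded by the block size rather than by the raster size.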

Parameters:
  • source (str or Source) – Path to the input raster file or a Source object.

  • output_file (str) – Path where the filtered raster will be written.

  • block_size (tuple of int) – Size of the processing blocks in pixels as (width, height).

  • bands (list of Band, optional) – Specific bands to process. If None, all bands in the raster are used.

  • data_in_range (array-like, optional) – Input range used for rescaling loaded data before filtering.

  • data_as_dtype (type, optional, default=np.uint8) – Data type to which input data is converted before filtering.

  • data_output_range (array-like, optional) – Output range for data rescaling after conversion to data_as_dtype.

  • replace_nan_with (int or float, optional) – Value used to replace NaNs in the input before filtering.

  • img_filter (callable, optional) – Filter function to apply to the data (e.g., skimage.filters.gaussian).

  • filter_params (dict, optional) – Parameters to pass to img_filter.

  • filter_output_range (collection, default=(0., 1.)) – Expected output range of the applied filter function.

  • output_dtype (type, optional, default=np.uint8) – Data type of the filtered output raster.

  • output_range (tuple, optional) – Value range to rescale the final output raster.

  • selector_band (Band, optional) – Categorical band used as a mask to apply the filter selectively across categories.

  • output_params (dict, optional) –

    Additional output configuration:

    • as_dtype : data type for filtered output (overrides output_dtype)

    • nodata : value for missing data (default: None)

    • bigtiff : bool, whether to create a BIGTIFF for >4GB files

      Note

      BigTIFF output may become the default at a later point.

  • verbose (bool, default=False) – Print progress and debug information.

  • **params

    Additional optional arguments:

    • nbrcpu : int, number of CPU cores for multiprocessing.

    • start_method : str, multiprocessing start method.

    • compress : bool, whether to compress the final output with LZW.
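The rescaling parameters above (data_in_range, data_output_range, replace_nan_with) suggest a linear mapping between value ranges. The sketch below shows one plausible reading of that step; the function name rescale and the clipping of out-of-range values are assumptions made for illustration, not documented behavior of apply_filter.

```python
import math


def rescale(values, in_range, out_range, replace_nan_with=None):
    """Linearly map values from in_range to out_range (hypothetical helper)."""
    in_min, in_max = in_range
    out_min, out_max = out_range
    scale = (out_max - out_min) / (in_max - in_min)
    rescaled = []
    for value in values:
        # Replace NaNs first so they do not propagate through the mapping.
        if math.isnan(value) and replace_nan_with is not None:
            value = replace_nan_with
        # Assumption: values outside in_range are clipped before mapping.
        value = min(max(value, in_min), in_max)
        rescaled.append(out_min + (value - in_min) * scale)
    return rescaled


scaled = rescale(
    [0.0, 0.5, float("nan"), 2.0], (0.0, 1.0), (0, 255), replace_nan_with=0.0
)
```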

Returns:

output_file – Path to the resulting filtered raster file.

Return type:

str

See also

extract_categories()

Extract per-category maps with optional filtering.

compute_entropy()

Compute spatial entropy of categorical bands.

convster.parallel.compute_entropy(source, output_file, block_size, blur_params, categories=None, output_dtype=None, output_range=None, normed=True, max_entropy_categories=None, verbose=False, **params)

Compute entropy-based heterogeneity from categorical raster bands.

This function orchestrates a parallel computation of entropy-based heterogeneity across multiple raster blocks, combining per-block results into a single output GeoTIFF. Each block is processed independently via multiprocessing workers that read subsets of the input raster, compute entropy over selected categorical bands, and write results to a queue for aggregation.

Parameters:
  • source (str or Source) – Path to the categorical raster file (e.g., land-cover map) or an initialized Source object providing access to the dataset.

  • output_file (str) – Path where the final entropy GeoTIFF will be written.

  • block_size (tuple of int) – Block dimensions (width, height) in pixels. Determines the size of the chunks processed by each worker.

  • blur_params (dict) – Dictionary of parameters for Gaussian blur or other preprocessing. Typically includes at least one of 'sigma' or 'diameter' in spatial units (e.g., meters). Used primarily for output file naming.

  • categories (list, optional) – List of category names or identifiers to include in entropy computation. If not provided, all available categories in the source raster are used.

  • output_dtype (str, type, or None, optional) – Desired data type for the output entropy raster. If not provided, defaults to numpy.uint8.

  • output_range (tuple of float, optional) – Minimum and maximum output values. If not set, the range is inferred from output_dtype: [0, 1] for floating-point types, or the valid range for integer types (e.g., [0, 255] for uint8).

  • normed (bool, default=True) – Whether to normalize the entropy values. When True, entropy is scaled relative to the maximum possible entropy determined by the number of categories.

  • max_entropy_categories (int, optional) – Specifies the number of categories to assume for maximum entropy normalization. Ignored if normed=False.

  • verbose (bool, default=False) – If True, prints progress updates and diagnostic information.

  • **params

    Additional optional parameters controlling multiprocessing and output:

    • nbrcpu : int, number of CPU cores to use. Defaults to the number of available cores minus one.

    • start_method : str, multiprocessing start method (e.g., "spawn" or "fork").

    • entropy_as_ubyte : bool, optional. Deprecated; use output_dtype="uint8" instead.

    • compress : bool, default=False. If True, compresses the final GeoTIFF using LZW compression.
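To make the normed and max_entropy_categories parameters concrete, the sketch below normalizes an entropy value by its maximum for k categories. It assumes Shannon entropy with a maximum of log(k), which the text above does not state explicitly, and the function name shannon_entropy is hypothetical.

```python
import math


def shannon_entropy(proportions, normed=True, max_entropy_categories=None):
    """Shannon entropy of per-category proportions (illustrative sketch)."""
    total = sum(proportions)
    if total <= 0:
        return 0.0
    probabilities = [p / total for p in proportions if p > 0]
    entropy = -sum(p * math.log(p) for p in probabilities)
    if not normed:
        return entropy
    # Assumption: the maximum entropy for k categories is log(k).
    k = max_entropy_categories or len(proportions)
    if k < 2:
        return 0.0
    return entropy / math.log(k)


uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])
single = shannon_entropy([1.0, 0.0, 0.0, 0.0])
two_of_four = shannon_entropy([0.5, 0.5], max_entropy_categories=4)
```

Under these assumptions a uniform mix of all categories gives 1, a single category gives 0, and an even mix of two categories normalized against four gives 0.5.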

Returns:

output_file – Path to the resulting entropy raster file.

Return type:

str

Notes

  • Each processing block computes entropy over the given categories using the internal _block_entropy() function and sends results to a queue.

  • The function _combine_entropy_blocks() merges these intermediate results into a single raster file.

  • The operation can be parallelized across multiple CPUs to handle large raster datasets efficiently.

  • Block decomposition uses create_views().

  • Worker pool is created with get_or_set_context() and sized using get_nbr_workers().

  • Output filename is constructed with output_filename().
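The worker-and-queue pattern in the notes above can be sketched with the standard library alone. This is not the code behind _block_entropy() or _combine_entropy_blocks(): the sketch uses threads instead of processes to stay short and self-contained, and the per-block computation is a stand-in (a block mean).

```python
import queue
import threading


def worker(block_id, values, results):
    # Stand-in for the per-block computation: here simply the block mean.
    results.put((block_id, sum(values) / len(values)))


def combine(results, n_blocks):
    # Collect one result per block; results may arrive in any order.
    combined = [None] * n_blocks
    for _ in range(n_blocks):
        block_id, value = results.get()
        combined[block_id] = value
    return combined


blocks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
results = queue.Queue()
threads = [
    threading.Thread(target=worker, args=(i, block, results))
    for i, block in enumerate(blocks)
]
for thread in threads:
    thread.start()
combined = combine(results, len(blocks))
for thread in threads:
    thread.join()
```

Tagging each result with its block identifier is what lets the combining step place results correctly regardless of the order in which workers finish.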

See also

compute_interaction()

Compute spatial interaction between categorical bands.

extract_categories()

Extract per-category maps with optional filtering.

_block_entropy()

Worker function computing entropy for a single block.

convster.parallel.compute_interaction(source, output_file, block_size, blur_params, categories=None, output_dtype=None, output_range=None, standardize=False, normed=True, verbose=False, **params)

Compute interaction strength between categorical raster bands.

This function computes pairwise, three-way, or higher-order interaction measures among categorical raster bands (e.g., land-cover classes). The computation is performed block-wise using multiprocessing, where each worker processes a subset of the raster and pushes intermediate results to a queue. The results are then combined into a single output raster representing spatial interaction strength.

Parameters:
  • source (str or Source) – Path to the categorical raster file (e.g., a land-cover map) or an initialized Source object providing access to the dataset.

  • output_file (str) – Path where the final interaction raster will be saved.

  • block_size (tuple of int) – Block dimensions (width, height) in pixels that define the size of the chunks processed by each multiprocessing worker.

  • blur_params (dict) – Dictionary containing parameters for Gaussian blur or other preprocessing. Used primarily for formatting the output filename. Should include either 'sigma' or 'diameter' in spatial units.

  • categories (list, optional) – List of categories (e.g., land-cover types) to include in the interaction computation. If not provided, all categories available in the source raster are used.

  • output_dtype (str, type, or None, optional) – Desired data type for the output raster. If not provided, defaults to numpy.uint8.

  • output_range (tuple of float, optional) – Minimum and maximum values for scaling the output. If not specified, inferred automatically from output_dtype: [0, 1] for floats, or the valid integer range for integer types (e.g., [0, 255] for uint8).

  • standardize (bool, default=False) – Whether to standardize the data prior to computing interactions. This can improve comparability across categories with different magnitudes or distributions.

  • normed (bool, default=True) – If True, normalizes the computed interaction strengths relative to their theoretical maximum values.

  • verbose (bool, default=False) – If True, prints detailed progress and diagnostic messages during processing.

  • **params

    Additional optional parameters controlling multiprocessing and output:

    • nbrcpu : int, number of CPU cores to use. Defaults to the number of available cores minus one.

    • start_method : str, multiprocessing start method (e.g., "spawn" or "fork").

    • interaction_as_ubyte : bool, optional. Deprecated; use output_dtype="uint8" instead.

    • compress : bool, default=False. If True, compresses the final GeoTIFF using LZW compression.
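The default described for output_range can be illustrated with a small lookup on the dtype name. The helper default_output_range is hypothetical, works on dtype names given as strings rather than NumPy types, and treats the full signed range as the "valid range" for signed integers, which is an assumption.

```python
import re


def default_output_range(dtype_name="uint8"):
    """Infer a default (min, max) output range from a dtype name (sketch)."""
    match = re.fullmatch(r"(u?int|float)(\d+)", dtype_name)
    if match is None:
        raise ValueError(f"unsupported dtype name: {dtype_name!r}")
    kind, bits = match.group(1), int(match.group(2))
    if kind == "float":
        # Floating-point outputs default to the unit interval.
        return (0.0, 1.0)
    if kind == "uint":
        return (0, 2 ** bits - 1)
    # Assumption: signed integers use their full representable range.
    return (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)


uint8_range = default_output_range()
float_range = default_output_range("float32")
```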

Returns:

output_file – Path to the resulting interaction raster file.

Return type:

str

Notes

  • The computation proceeds in parallel by dividing the raster into independent processing blocks.

  • Each worker executes an internal _block_interaction() task and writes its results to a multiprocessing queue.

  • The _combine_interaction_blocks() function merges all block results into a single output file.

  • Designed for efficient large-scale raster analysis on multicore systems.

  • Block decomposition uses create_views().

  • Worker pool is created with get_or_set_context() and sized using get_nbr_workers().

  • Output filename is constructed with output_filename().
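The nbrcpu default and the start_method option map naturally onto the standard multiprocessing module. The sketch below shows that mapping with two hypothetical helpers; it is not the implementation of get_nbr_workers() or get_or_set_context(), whose exact behavior is not described here.

```python
import multiprocessing as mp
import os


def default_worker_count(nbrcpu=None):
    """Return nbrcpu if given, else available cores minus one (at least 1)."""
    if nbrcpu is not None:
        return max(1, int(nbrcpu))
    return max(1, (os.cpu_count() or 1) - 1)


def make_context(start_method=None):
    """Return a multiprocessing context for the requested start method."""
    if start_method is None:
        # Fall back to the platform default start method.
        return mp.get_context()
    if start_method not in mp.get_all_start_methods():
        raise ValueError(f"start method not available: {start_method!r}")
    return mp.get_context(start_method)


workers = default_worker_count()
context = make_context("spawn")
```

A pool created as context.Pool(processes=workers) would then use the chosen start method without changing the global default.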

See also

compute_entropy()

Compute spatial entropy of categorical bands.

extract_categories()

Extract per-category maps with optional filtering.

_block_interaction()

Worker function computing interaction for a single block.

convster.parallel.extract_categories(source, categories, output_file, block_size, img_filter=None, filter_params=None, output_dtype=None, output_params=None, filter_output_range=None, verbose=False, **params)

Extract per-category maps from a raster, optionally apply a filter, and export to a file.

This function processes a categorical raster by separating specified categories into individual bands. Optionally, a filter function (e.g., Gaussian smoothing) can be applied to each category band. The processing is performed block-wise with multiprocessing, and the results are combined into a single output raster.
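Separating categories into individual bands amounts to building one binary mask per category. The sketch below shows the idea on a nested list; split_categories is a hypothetical name, and the 0/1 encoding of the masks is an assumption rather than the documented output format.

```python
def split_categories(raster, categories):
    """Return one binary band per category (1 where the pixel matches)."""
    return [
        [[1 if value == category else 0 for value in row] for row in raster]
        for category in categories
    ]


# A tiny 2 x 3 categorical raster with classes 1, 2 and 3.
land_cover = [
    [1, 1, 2],
    [3, 2, 2],
]
bands = split_categories(land_cover, [1, 2])
```

A filter such as Gaussian smoothing applied to each mask then yields a per-category density surface, which matches the optional filtering step described above.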

Parameters:
  • source (str or Source) – Path to the input .tif file or a Source object providing access to it.

  • categories (list) – List of categorical values to separate into individual bands.

  • output_file (str) – Path where the resulting raster will be written.

  • block_size (tuple of int) – Size of processing blocks in pixels as (width, height).

  • img_filter (callable, optional) – A function to apply to each category band (e.g., skimage.filters.gaussian). If None, no filtering is applied and filter_params is ignored.

  • filter_params (dict, optional) – Parameters to pass to img_filter. Required if img_filter is provided.

  • output_dtype (str or type, optional) – Data type of the output bands. Deprecated; use output_params['as_dtype'] instead.

  • output_params (dict, optional) –

    Dictionary of output settings:

    • as_dtype : data type for the filtered output (overrides output_dtype)

    • nodata : value used for missing data (default: None)

    • bigtiff : bool, whether to create a BIGTIFF for >4GB files

    • output_range : tuple, output value range for data conversion

  • filter_output_range (tuple, optional) – Expected value range of the filtered data. Recommended when img_filter is used.

  • verbose (bool, default=False) – Print processing information and progress.

  • **params

    Additional optional arguments:

    • nbrcpu : int, number of CPU cores to use (default: available cores minus one)

    • start_method : str, multiprocessing start method (e.g., "spawn" or "fork")

    • compress : bool, if True, compress the final output with LZW

Returns:

output_file – Path to the resulting raster file containing extracted and optionally filtered category bands.

Return type:

str

See also

compute_entropy()

Compute spatial entropy of categorical bands.

compute_interaction()

Compute spatial interaction between categorical bands.

apply_filter()

Apply a filter to one or more bands of a raster.