coonfit.inference
This module contains functions to facilitate carrying out various inference methods.
In particular, it implements a multiple linear regression approach that allows categorical maps and any other type of map to be used as predictors for a response variable that is also provided as a map.
A typical use case is using land-cover types and various derivatives thereof as predictors for NDVI or land surface temperature.
Functions
- get_approx_weights() – Estimate regression weights using numerical optimization.
- get_optimal_weights() – Compute the optimal regression weights for a multiple linear regression.
- get_optimal_weights_source() – Compute optimal regression weights directly from predictors and a precomputed inverse transposed product.
- init_X() – Initialize the predictor matrix \(X\) with appropriate dimensions.
- partial_X() – Generate a (partial) predictor matrix \(X\).
- partial_response() – Extract and return the response values within a window after applying a selector.
- populate_X() – Populate predictor matrix \(X\) with data from predictor bands.
- prepare_predictors() – Generate the predictor matrix and response vector for multiple linear regression.
- prepare_selector() – Create a boolean selector based on the masks of response and predictors.
- transposed_product() – Compute the transposed product \(X^T X\) for a set of predictors.
Module Contents
- coonfit.inference.get_approx_weights(X, y, fit_intercept=False)
Estimate regression weights using numerical optimization.
This function fits a multiple linear regression model of the form
\[y = X\beta + \epsilon\]
where \(X\) is the predictor matrix, \(y\) the response vector, \(\beta\) the vector of regression weights, and \(\epsilon\) a random error term.
The weights are estimated using scikit-learn’s sklearn.linear_model.LinearRegression estimator, which computes a least-squares solution using numerical linear algebra routines.
- Parameters:
X (NDArray of shape (n_samples, n_features)) – Predictor matrix where each row corresponds to an observation and each column to a predictor variable.
y (NDArray of shape (n_samples,)) – Response vector.
fit_intercept (bool, default=False) – Whether to fit an intercept term.
Notes
If X was constructed using prepare_predictors() with include_intercept=True, then fit_intercept should be set to False to avoid fitting the intercept twice.
- Returns:
regression – Fitted linear regression model.
- Return type:
See also
get_optimal_weights() – Analytic least-squares solution (normal equations).
prepare_predictors() – Build X and y from raster bands.
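The note on fit_intercept can be illustrated with a minimal sketch that calls scikit-learn directly rather than get_approx_weights(). It assumes a synthetic X whose last column is a column of ones, mimicking what prepare_predictors() is described to produce with include_intercept=True; the data and variable names are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_samples = 100

# Two synthetic predictors plus a trailing column of ones for the intercept.
predictors = rng.normal(size=(n_samples, 2))
X = np.column_stack([predictors, np.ones(n_samples)])

# Noise-free response with known weights (2.0, -1.0) and intercept 0.5.
y = predictors @ np.array([2.0, -1.0]) + 0.5

# X already carries the intercept column, so the estimator must not add its own.
model = LinearRegression(fit_intercept=False).fit(X, y)

slope_weights = model.coef_[:2]      # weights of the two predictors
intercept_weight = model.coef_[-1]   # weight of the ones column
```

With fit_intercept=False the intercept is reported as the weight of the ones column, and the estimator's own intercept_ attribute stays at zero.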
- coonfit.inference.get_optimal_weights(X, y)
Compute the optimal regression weights for a multiple linear regression.
The multiple linear regression model is defined as
\[y = X\beta + \epsilon\]
where \(X\) is the predictor matrix, \(y\) the response vector, \(\beta\) the vector of regression weights, and \(\epsilon\) a random error term.
The optimal least-squares solution for \(\beta\) is given by
\[\beta = (X^T X)^{-1} X^T y\]
which is computed directly using NumPy linear algebra routines.
Notes
The matrix \(X^T X\) may be singular or ill-conditioned, in which case the inverse does not exist or the solution may be numerically unstable.
In such cases, alternative approaches such as singular value decomposition (SVD) or numpy.linalg.lstsq() are recommended.
- Parameters:
X (NDArray of shape (n_samples, n_features)) – Predictor matrix where each row corresponds to an observation and each column to a predictor variable.
y (NDArray of shape (n_samples,)) – Response vector.
- Returns:
beta – Optimal regression weights minimizing the least-squares error.
- Return type:
NDArray of shape (n_features,)
See also
get_approx_weights() – Estimate weights via scikit-learn’s LinearRegression.
transposed_product() – Compute X.T @ X from spatial band data.
prepare_predictors() – Build X and y from raster bands.
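The normal-equation solution and the SVD-based alternative mentioned in the Notes can be compared with plain NumPy. This is a sketch on synthetic data, not the implementation of get_optimal_weights(); all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200

# Synthetic predictor matrix: two predictors plus an intercept column of ones.
X = np.column_stack([
    rng.normal(size=n_samples),
    rng.normal(size=n_samples),
    np.ones(n_samples),
])
true_beta = np.array([2.0, -1.0, 0.5])
y = X @ true_beta + rng.normal(scale=0.01, size=n_samples)

# Normal equations: solve (X^T X) beta = X^T y.
# np.linalg.solve avoids forming the explicit inverse.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# SVD-based least squares, which also copes with rank-deficient X.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

For a well-conditioned X both routes agree to numerical precision; they diverge only when \(X^T X\) is close to singular.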
- coonfit.inference.get_optimal_weights_source(Y, response, predictors, view, selector, include_intercept=False, as_dtype='float64')
Compute optimal regression weights directly from predictors and a precomputed inverse transposed product.
The multiple linear regression model is defined as
\[y = X\beta + \epsilon\]
The least-squares solution for the regression weights is
\[\hat{\beta} = (X^T X)^{-1} X^T y\]
Defining
\[Y = (X^T X)^{-1}\]
this function computes
\[\hat{\beta} = Y X^T y\]
directly from the predictor data and the response values, without explicitly recomputing \(X^T X\).
- Parameters:
Y (NDArray of shape (n_features, n_features)) – Inverse of the transposed product of the predictor matrix, \((X^T X)^{-1}\).
Notes
Typically, Y is obtained by inverting the output of transposed_product() via numpy.linalg.inv().
response (str or Band) – Response variable specified either as a Band object or as the path to a raster file (.tif). If a string is provided, the raster must contain a single band.
predictors (Collection[Band]) – Collection of Band objects defining the predictor variables.
view (tuple of int or None) – Spatial subset specified as (x, y, width, height). If None, the full spatial extent is used.
selector (NDArray of bool) – Boolean array indicating which pixels are valid and should be included in the regression.
include_intercept (bool, default=False) – If True, include an intercept term in the regression. The intercept weight is returned under the key 'intercept'.
as_dtype (str or type, default="float64") – Data type used for constructing the predictor matrix.
- Returns:
weights – Dictionary mapping each predictor to its fitted regression weight. If include_intercept is True, an additional key 'intercept' is included.
- Return type:
See also
transposed_product() – Compute X.T @ X, whose inverse is Y.
get_optimal_weights() – Direct computation from X and y.
partial_X() – Generate the partial predictor matrix used here.
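The identity \(\hat{\beta} = Y X^T y\) means that, once \(Y\) is known, only the vector \(X^T y\) has to be assembled. The sketch below accumulates \(X^T y\) over row chunks of a synthetic in-memory matrix. Whether get_optimal_weights_source() splits its work this way is an assumption, and the chunk size and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_features = 1000, 3
X = rng.normal(size=(n_samples, n_features))
y = X @ np.array([1.5, -0.5, 3.0]) + rng.normal(scale=0.1, size=n_samples)

# Precomputed once: Y = (X^T X)^{-1}.
Y = np.linalg.inv(X.T @ X)

# Accumulate X^T y chunk by chunk; each chunk only needs its own rows.
Xty = np.zeros(n_features)
chunk_size = 250
for start in range(0, n_samples, chunk_size):
    stop = start + chunk_size
    Xty += X[start:stop].T @ y[start:stop]

# beta_hat = Y X^T y
beta_hat = Y @ Xty

# Reference solution computed from the full matrix.
beta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)
```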
- coonfit.inference.init_X(predictors, selector, window, include_intercept, as_dtype)
Initialize the predictor matrix \(X\) with appropriate dimensions.
Creates an empty predictor matrix with rows corresponding to usable pixels (as determined by the selector) and columns for each predictor band, plus an optional intercept column.
- Parameters:
predictors (Collection of Band) – Collection of Band objects, each specifying one or more predictor variables. The number of predictors determines the number of columns in the output matrix (excluding the intercept).
selector (NDArray of bool) – Boolean array to select usable cells. Only pixels where selector is True will be included in the matrix rows.
window (Window or None) – Limits the data array to a specific spatial window. If provided, the window is converted to slices using rasterio.windows.Window.toslices(). If None, the entire selector array is used.
include_intercept (bool) – If True, adds an extra column of ones at the end of the matrix for fitting intercept terms in regression models.
as_dtype (type or str) – Data type for the output array. Can be a numpy dtype or string specification (e.g., ‘float64’, np.float32).
- Returns:
Zero-initialized array of shape (n_rows, n_cols) where:
n_rows is the count of usable pixels in the (windowed) selector
n_cols is len(predictors) + 1 if include_intercept else len(predictors)
- Return type:
NDArray
Notes
The window parameter is intended for parallelized processing where different processes handle different spatial subsets of the data. If the window slicing results in an IndexError (e.g., window is completely outside the selector bounds), the function returns an array with 0 rows.
See also
populate_X() – Fill the initialized predictor matrix with data.
partial_X() – Initialize and populate a predictor matrix in one step.
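The shape rule described above can be sketched in a few lines of NumPy. The helper init_X_sketch below is hypothetical: it takes a predictor count and a tuple of plain slices instead of Band objects and a rasterio Window, so it only mirrors the documented dimensions, not the actual signature.

```python
import numpy as np

def init_X_sketch(n_predictors, selector, slices=None,
                  include_intercept=False, as_dtype="float64"):
    """Allocate a zero matrix: one row per usable pixel, one column per predictor."""
    # Restrict the selector to the window, if one is given.
    sub_selector = selector if slices is None else selector[slices]
    n_rows = int(np.count_nonzero(sub_selector))
    n_cols = n_predictors + 1 if include_intercept else n_predictors
    return np.zeros((n_rows, n_cols), dtype=as_dtype)

selector = np.array([[True, False, True],
                     [True, True, False]])

# Full extent, with intercept column: 4 usable pixels, 2 + 1 columns.
X_full = init_X_sketch(2, selector, include_intercept=True)

# First row only, no intercept: 2 usable pixels, 2 columns.
X_part = init_X_sketch(2, selector, slices=(slice(0, 1), slice(0, 3)))
```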
- coonfit.inference.partial_X(predictors, window, selector, include_intercept, as_dtype)
Generate a (partial) predictor matrix \(X\).
This function constructs the predictor matrix from a collection of raster bands, optionally restricted to a spatial window. A boolean selector is applied to include only valid pixels. The function is intended to support partial or chunked processing of predictors, for example in parallelized workflows.
- Parameters:
predictors (Collection[Band]) – Collection of Band objects defining the predictor variables. Individual bands may represent continuous or categorical predictors.
window (rasterio.windows.Window or None) – Spatial window defining the subset of predictor data to read. If None, the full spatial extent is used.
Notes
This argument is primarily intended to enable partial processing of predictors, e.g. in parallel or tiled computations.
selector (NDArray of bool) – Boolean array used to select valid pixels from the predictor data.
include_intercept (bool) – If True, an additional column of ones is appended to the predictor matrix to model an intercept term.
as_dtype (str or type) – Data type used for the resulting predictor matrix. This is also the type used to represent categorical predictors, if present.
- Returns:
X – Predictor matrix containing the selected predictor values. The number of features corresponds to the number of predictors, plus one if include_intercept is True.
- Return type:
NDArray of shape (n_samples, n_features)
See also
init_X() – Allocate the empty predictor matrix.
populate_X() – Fill the predictor matrix with data.
transposed_product() – Compute X.T @ X using this function internally.
- coonfit.inference.partial_response(response, window, selector)
Extract and return the response values within a window after applying a selector.
This function reads the response raster data, optionally restricts it to a spatial window, and applies a boolean selector to return only the valid response values. The resulting array is suitable for use as the response vector in regression analyses.
- Parameters:
response (str or Band) – Response variable specified either as a Band object or as the path to a raster file (.tif). If a string is provided, the raster must contain a single band.
window (rasterio.windows.Window or None) – Spatial window defining the subset of the response raster to read. If None, the full raster extent is used.
selector (NDArray of bool) – Boolean array used to select valid pixels. If window is provided, the selector is sliced accordingly before being applied.
- Returns:
y – One-dimensional array containing the selected response values.
- Return type:
NDArray of shape (n_samples,)
See also
partial_X() – Generate the corresponding predictor matrix.
prepare_predictors() – Build X and y together from raster bands.
- coonfit.inference.populate_X(X, predictors, as_dtype, window, selector, include_intercept)
Populate predictor matrix \(X\) with data from predictor bands.
Reads data from each predictor band, applies the selector mask within the specified window, and fills the columns of matrix \(X\). Optionally adds an intercept column of ones.
- Parameters:
X (NDArray) – Pre-initialized array to be populated with predictor data. Modified in-place. Should have shape (n_usable_pixels, n_predictors) or (n_usable_pixels, n_predictors + 1) if include_intercept is True.
predictors (Collection of Band) – Collection of Band objects, each specifying one predictor variable. Data from each band will be read and placed in the corresponding column of \(X\).
as_dtype (type or str) – Target data type for predictor values. Converted using convert_to_dtype() without rescaling (in_range and out_range are None).
window (Window or None) – Limits data reading to a specific spatial window. If provided, the window is converted to slices using rasterio.windows.Window.toslices(). If None, the entire data array is used.
selector (NDArray of bool) – Boolean array to select usable pixels. Applied after windowing to extract only valid data points for the predictor matrix.
include_intercept (bool) – If True, the last column of X is filled with ones to represent the intercept term in regression models.
- Returns:
X is modified in-place.
- Return type:
None
See also
init_X() – Allocate the empty predictor matrix.
partial_X() – Initialize and populate in one step.
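The in-place fill can be sketched with NumPy boolean indexing. The helper populate_X_sketch is hypothetical: it works on in-memory 2D arrays rather than Band objects and omits the window and dtype conversion steps.

```python
import numpy as np

def populate_X_sketch(X, band_arrays, selector, include_intercept):
    """Fill X column by column with the selected pixels of each band (in place)."""
    for col, data in enumerate(band_arrays):
        # Boolean indexing flattens the selected pixels in row-major order.
        X[:, col] = data[selector]
    if include_intercept:
        # The last column represents the intercept term.
        X[:, -1] = 1

selector = np.array([[True, False],
                     [True, True]])
bands = [
    np.array([[1.0, 2.0], [3.0, 4.0]]),
    np.array([[10.0, 20.0], [30.0, 40.0]]),
]

# Three usable pixels, two predictors plus one intercept column.
X = np.zeros((3, 3))
populate_X_sketch(X, bands, selector, include_intercept=True)
```

Because every band is indexed with the same selector, row i of X refers to the same pixel in all columns.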
- coonfit.inference.prepare_predictors(response, *predictors, view=None, include_intercept=True, verbose=False)
Generate the predictor matrix and response vector for multiple linear regression.
This function constructs the predictor matrix \(X\) and the response vector \(y\) used in a multiple linear regression model of the form
\[y = X\beta + \epsilon\]
where \(y\) represents the response data and \(X\) the predictor matrix.
The response data are used as a reference to determine valid pixels. A mask is extracted from the response (e.g., nodata values or an 8-bit mask) and applied to all predictors. Predictor-specific missing values are also added to the mask, ensuring that only pixels with complete information across all variables are included in the regression.
This masking procedure reduces the number of pixels used in the analysis, yielding a smaller but denser predictor matrix.
Notes
An InferenceError is raised if an invalid predictor is detected. A predictor is considered invalid in the following cases:
- After masking, the predictor contains only zeros.
- The predictor represents categorical data where all selected pixels belong to a category and include_intercept is True.
No general test for linear dependence between predictors is performed.
- Parameters:
response (str or Band) – Response variable. Either a Band object or a string specifying the path to a raster file (.tif) containing the response data. If a string is provided, the raster must contain exactly one band.
*predictors (Band or str) – One or more predictors specified as Band objects or file paths. If a string is provided, it is interpreted as the path to a raster file and all bands in that file are added as individual predictors. Predictor data are cast to the same data type as the response. No rescaling is performed.
view (tuple of int, optional) – Spatial subset of the data specified as (x, y, width, height). If not provided, the entire response raster is used.
include_intercept (bool, default=True) – If True, an additional column of ones is appended to the predictor matrix to model an intercept term.
verbose (bool, default=False) – If True, print processing information.
- Returns:
X (NDArray of shape (n_samples, n_features)) – Predictor matrix. The number of features corresponds to the total number of predictors, plus one if include_intercept is True.
y (NDArray of shape (n_samples,)) – Response vector containing the response values corresponding to the selected pixels.
See also
prepare_selector() – Build the boolean selector used internally.
init_X() – Allocate the predictor matrix.
transposed_product() – Compute X.T @ X for large spatial datasets.
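One reading of the second invalid case in the Notes is that indicator columns covering every selected pixel become linearly dependent on the intercept column. The sketch below demonstrates that dependence on synthetic data. It assumes a one-hot encoding of categories, which the documentation above does not confirm, and it does not reproduce the library's own check.

```python
import numpy as np

# Six selected pixels, each belonging to exactly one of three categories.
categories = np.array([0, 1, 2, 1, 0, 2])

# One indicator (one-hot) column per category; every row sums to one.
one_hot = (categories[:, None] == np.arange(3)).astype("float64")

# Appending an intercept column of ones duplicates that row sum.
X_with_intercept = np.column_stack([one_hot, np.ones(len(categories))])

rank_without = np.linalg.matrix_rank(one_hot)           # full column rank
rank_with = np.linalg.matrix_rank(X_with_intercept)     # one less than the column count
```

With the intercept included, the matrix has four columns but rank three, so \(X^T X\) is singular and the normal equations have no unique solution.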
- coonfit.inference.prepare_selector(response, *predictors, extra_masking_band=None, verbose=False)
Create a boolean selector based on the masks of response and predictors.
The selector is a boolean array indicating which pixels can be used (True) and which cannot (False) based on the combined masks of all input bands.
- Parameters:
response (str or Band) – A Band object or path string describing the response data. If a string is provided, it will be converted to a Band object with bidx=1.
*predictors (Band) – Variable number of Band objects, each specifying one or more predictor variables. Their masks will be combined with the response mask.
extra_masking_band (Band, optional) – Additional Band object treated as a rasterio mask, where values equal to 0 will be masked out. Default is None.
verbose (bool, optional) – If True, prints runtime information about usable pixels at each masking stage. Default is False.
- Returns:
Boolean array of the same shape as the response band, where True indicates usable pixels and False indicates masked pixels.
- Return type:
NDArray of bool
Notes
The selector combines masks using logical AND operations, meaning a pixel is only usable (True) if it is valid across all input bands.
The mask reader can be configured using set_mask_reader() with use='band' or use='source'.
See also
_enrich_selector() – Refine a selector using predictor masks.
init_X() – Initialize the predictor matrix using the selector.
prepare_predictors() – High-level function that calls this internally.
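The logical AND combination described in the Notes can be sketched with plain NumPy masks. The arrays below are synthetic validity masks (True means valid), not masks read from Band objects.

```python
import numpy as np

# Per-band validity masks: True where the band holds usable data.
response_valid = np.array([[True, True, False],
                           [True, True, True]])
predictor1_valid = np.array([[True, False, True],
                             [True, True, True]])
predictor2_valid = np.array([[True, True, True],
                             [False, True, True]])

# A pixel is usable only if it is valid in every band.
selector = np.logical_and.reduce(
    [response_valid, predictor1_valid, predictor2_valid]
)

# Applying the selector yields a flat array of the usable response values.
response_data = np.arange(6, dtype="float64").reshape(2, 3)
valid_data = response_data[selector]
```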
Examples
>>> selector = prepare_selector(response_band, predictor1, predictor2, verbose=True)
>>> valid_data = response_data[selector]
- coonfit.inference.transposed_product(predictors, view, selector, include_intercept=False, as_dtype='float64')
Compute the transposed product \(X^T X\) for a set of predictors.
This function extracts predictor values within a specified spatial view, applies a boolean selector to filter valid pixels, constructs the predictor matrix \(X\), and returns its transposed product. The result is commonly used in linear regression for computing normal equations.
- Parameters:
predictors (Collection[Band]) – Collection of Band objects defining the predictor variables.
view (tuple of int or None) – Spatial subset specified as (x, y, width, height). If None, the full spatial extent is used.
selector (NDArray of bool) – Boolean array indicating which pixels are valid and should be included in the computation.
include_intercept (bool, default=False) – If True, include an additional column of ones in the predictor matrix to model an intercept term.
as_dtype (str or type, default="float64") – Data type of the resulting array.
- Returns:
transprodX – The transposed product of the predictor matrix, X.T @ X. The number of features corresponds to the number of predictors, plus one if include_intercept is True.
- Return type:
NDArray of shape (n_features, n_features)
See also
get_optimal_weights() – Compute regression weights from X and y.
get_optimal_weights_source() – Compute weights using precomputed inverse.
partial_X() – Generate the predictor matrix X.
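Because \(X^T X\) is a sum over rows, it can be assembled from partial predictor matrices without holding the full \(X\) in memory. The sketch below shows this on synthetic data. Whether transposed_product() accumulates its result this way is an assumption, and the three-way split is illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples, n_features = 600, 4
X = rng.normal(size=(n_samples, n_features))

# Reference: transposed product of the full matrix.
XtX_full = X.T @ X

# Sum of the transposed products of row chunks (e.g. one chunk per window).
XtX_sum = np.zeros((n_features, n_features))
for chunk in np.array_split(X, 3):
    XtX_sum += chunk.T @ chunk
```

The result is an (n_features, n_features) matrix regardless of the number of pixels, so its inverse can be computed once and reused.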