coonfit.inference ================= .. py:module:: coonfit.inference .. autoapi-nested-parse:: This module contains functions to facilitate carrying out various inference methods. In particular, it implements a multiple linear regression approach that allows to use categorical and any other type of maps as predictors for some response variable that is also provided as a map. An exemplary use case is the usage of land-cover types and various derivatives thereof as predictors for NDVI or land surface temperature. Functions --------- .. autoapisummary:: coonfit.inference.get_approx_weights coonfit.inference.get_optimal_weights coonfit.inference.get_optimal_weights_source coonfit.inference.init_X coonfit.inference.partial_X coonfit.inference.partial_response coonfit.inference.populate_X coonfit.inference.prepare_predictors coonfit.inference.prepare_selector coonfit.inference.transposed_product Module Contents --------------- .. py:function:: get_approx_weights(X, y, fit_intercept = False) Estimate regression weights using numerical optimization. This function fits a multiple linear regression model of the form .. math:: y = X\beta + \epsilon where :math:`X` is the predictor matrix, :math:`y` the response vector, :math:`\beta` the vector of regression weights, and :math:`\epsilon` a random error term. The weights are estimated using scikit-learn’s :class:`sklearn.linear_model.LinearRegression` estimator, which computes a least-squares solution using numerical linear algebra routines. :param X: Predictor matrix where each row corresponds to an observation and each column to a predictor variable. :type X: NDArray of shape (n_samples, n_features) :param y: Response vector. :type y: NDArray of shape (n_samples,) :param fit_intercept: Whether to fit an intercept term. .. rubric:: Notes If ``X`` was constructed using :func:`prepare_predictors` with ``include_intercept=True``, then ``fit_intercept`` should be set to ``False`` to avoid fitting the intercept twice. :type fit_intercept: bool, default=False :returns: **regression** -- Fitted linear regression model. :rtype: :class:`sklearn.linear_model.LinearRegression` .. seealso:: :func:`get_optimal_weights` Analytic least-squares solution (normal equations). :func:`prepare_predictors` Build X and y from raster bands. .. py:function:: get_optimal_weights(X, y) Compute the optimal regression weights for a multiple linear regression. The multiple linear regression model is defined as .. math:: y = X\beta + \epsilon where :math:`X` is the predictor matrix, :math:`y` the response vector, :math:`\beta` the vector of regression weights, and :math:`\epsilon` a random error term. The optimal least-squares solution for :math:`\beta` is given by .. math:: \beta = (X^T X)^{-1} X^T y which is computed directly using NumPy linear algebra routines. .. rubric:: Notes * The matrix :math:`X^T X` may be singular or ill-conditioned, in which case the inverse does not exist or the solution may be numerically unstable. * In such cases, alternative approaches such as singular value decomposition (SVD) or :func:`numpy.linalg.lstsq` are recommended. :param X: Predictor matrix where each row corresponds to an observation and each column to a predictor variable. :type X: NDArray of shape (n_samples, n_features) :param y: Response vector. :type y: NDArray of shape (n_samples,) :returns: **beta** -- Optimal regression weights minimizing the least-squares error. :rtype: NDArray of shape (n_features,) .. seealso:: :func:`get_approx_weights` Estimate weights via scikit-learn's LinearRegression. :func:`transposed_product` Compute X.T @ X from spatial band data. :func:`prepare_predictors` Build X and y from raster bands. .. py:function:: get_optimal_weights_source(Y, response, predictors, view, selector, include_intercept = False, as_dtype='float64') Compute optimal regression weights directly from predictors and a precomputed inverse transposed product. The multiple linear regression model is defined as .. math:: y = X\beta + \epsilon The least-squares solution for the regression weights is .. math:: \hat{\beta} = (X^T X)^{-1} X^T y Defining .. math:: Y = (X^T X)^{-1} this function computes .. math:: \hat{\beta} = Y X^T y directly from the predictor data and the response values, without explicitly recomputing :math:`X^T X`. :param Y: Inverse of the transposed product of the predictor matrix, :math:`(X^T X)^{-1}`. .. rubric:: Notes Typically, ``Y`` is obtained by inverting the output of :func:`transposed_product` via :func:`numpy.linalg.inv`. :type Y: NDArray of shape (n_features, n_features) :param response: Response variable specified either as a ``Band`` object or as the path to a raster file (``.tif``). If a string is provided, the raster must contain a single band. :type response: str or Band :param predictors: Collection of ``Band`` objects defining the predictor variables. :type predictors: Collection[Band] :param view: Spatial subset specified as ``(x, y, width, height)``. If ``None``, the full spatial extent is used. :type view: tuple of int or None :param selector: Boolean array indicating which pixels are valid and should be included in the regression. :type selector: NDArray of bool :param include_intercept: If ``True``, include an intercept term in the regression. The intercept weight is returned under the key ``'intercept'``. :type include_intercept: bool, default=False :param as_dtype: Data type used for constructing the predictor matrix. :type as_dtype: str or type, default="float64" :returns: **weights** -- Dictionary mapping each predictor to its fitted regression weight. If ``include_intercept`` is ``True``, an additional key ``'intercept'`` is included. :rtype: dict .. seealso:: :func:`transposed_product` Compute X.T @ X, whose inverse is ``Y``. :func:`get_optimal_weights` Direct computation from X and y. :func:`partial_X` Generate the partial predictor matrix used here. .. py:function:: init_X(predictors, selector, window, include_intercept, as_dtype) Initialize the predictor matrix :math:`X` with appropriate dimensions. Creates an empty predictor matrix with rows corresponding to usable pixels (as determined by the selector) and columns for each predictor band, plus an optional intercept column. :param predictors: Collection of Band objects, each specifying one or more predictor variables. The number of predictors determines the number of columns in the output matrix (excluding the intercept). :type predictors: Collection of Band :param selector: Boolean array to select usable cells. Only pixels where selector is True will be included in the matrix rows. :type selector: NDArray of bool :param window: Limits the data array to a specific spatial window. If provided, the window is converted to slices using :meth:`rasterio.windows.Window.toslices`. If None, the entire selector array is used. :type window: Window or None :param include_intercept: If True, adds an extra column of ones at the end of the matrix for fitting intercept terms in regression models. :type include_intercept: bool :param as_dtype: Data type for the output array. Can be a numpy dtype or string specification (e.g., 'float64', np.float32). :type as_dtype: type or str :returns: Zero-initialized array of shape (n_rows, n_cols) where: - n_rows is the count of usable pixels in the (windowed) selector - n_cols is ``len(predictors) + 1`` if include_intercept else ``len(predictors)`` :rtype: NDArray .. rubric:: Notes The ``window`` parameter is intended for parallelized processing where different processes handle different spatial subsets of the data. If the window slicing results in an IndexError (e.g., window is completely outside the selector bounds), the function returns an array with 0 rows. .. seealso:: :func:`populate_X` Fill the initialized predictor matrix with data. :func:`partial_X` Initialize and populate a predictor matrix in one step. .. py:function:: partial_X(predictors, window, selector, include_intercept, as_dtype) Generate a (partial) predictor matrix :math:`X`. This function constructs the predictor matrix from a collection of raster bands, optionally restricted to a spatial window. A boolean selector is applied to include only valid pixels. The function is intended to support partial or chunked processing of predictors, for example in parallelized workflows. :param predictors: Collection of ``Band`` objects defining the predictor variables. Individual bands may represent continuous or categorical predictors. :type predictors: Collection[Band] :param window: Spatial window defining the subset of predictor data to read. If ``None``, the full spatial extent is used. .. rubric:: Notes This argument is primarily intended to enable partial processing of predictors, e.g. in parallel or tiled computations. :type window: rasterio.windows.Window or None :param selector: Boolean array used to select valid pixels from the predictor data. :type selector: NDArray of bool :param include_intercept: If ``True``, an additional column of ones is appended to the predictor matrix to model an intercept term. :type include_intercept: bool :param as_dtype: Data type used for the resulting predictor matrix. This is also the type used to represent categorical predictors, if present. :type as_dtype: str or type :returns: **X** -- Predictor matrix containing the selected predictor values. The number of features corresponds to the number of predictors, plus one if ``include_intercept`` is ``True``. :rtype: NDArray of shape (n_samples, n_features) .. seealso:: :func:`init_X` Allocate the empty predictor matrix. :func:`populate_X` Fill the predictor matrix with data. :func:`transposed_product` Compute X.T @ X using this function internally. .. py:function:: partial_response(response, window, selector) Extract and return the response values within a window after applying a selector. This function reads the response raster data, optionally restricts it to a spatial window, and applies a boolean selector to return only the valid response values. The resulting array is suitable for use as the response vector in regression analyses. :param response: Response variable specified either as a ``Band`` object or as the path to a raster file (``.tif``). If a string is provided, the raster must contain a single band. :type response: str or Band :param window: Spatial window defining the subset of the response raster to read. If ``None``, the full raster extent is used. :type window: rasterio.windows.Window or None :param selector: Boolean array used to select valid pixels. If ``window`` is provided, the selector is sliced accordingly before being applied. :type selector: NDArray of bool :returns: **y** -- One-dimensional array containing the selected response values. :rtype: NDArray of shape (n_samples,) .. seealso:: :func:`partial_X` Generate the corresponding predictor matrix. :func:`prepare_predictors` Build X and y together from raster bands. .. py:function:: populate_X(X, predictors, as_dtype, window, selector, include_intercept) Populate predictor matrix :math:`X` with data from predictor bands. Reads data from each predictor band, applies the selector mask within the specified window, and fills the columns of matrix :math:`X`. Optionally adds an intercept column of ones. :param X: Pre-initialized array to be populated with predictor data. Modified in-place. Should have shape (n_usable_pixels, n_predictors) or (n_usable_pixels, n_predictors + 1) if include_intercept is True. :type X: NDArray :param predictors: Collection of Band objects, each specifying one predictor variable. Data from each band will be read and placed in the corresponding column of :math:`X`. :type predictors: Collection of Band :param as_dtype: Target data type for predictor values. Converted using :func:`~riogrande.helper.convert_to_dtype` without rescaling (``in_range`` and ``out_range`` are ``None``). :type as_dtype: type or str :param window: Limits data reading to a specific spatial window. If provided, the window is converted to slices using :meth:`rasterio.windows.Window.toslices`. If None, the entire data array is used. :type window: Window or None :param selector: Boolean array to select usable pixels. Applied after windowing to extract only valid data points for the predictor matrix. :type selector: NDArray of bool :param include_intercept: If True, the last column of X is filled with ones to represent the intercept term in regression models. :type include_intercept: bool :returns: X is modified in-place. :rtype: None .. seealso:: :func:`init_X` Allocate the empty predictor matrix. :func:`partial_X` Initialize and populate in one step. .. py:function:: prepare_predictors(response, *predictors, view = None, include_intercept=True, verbose = False) Generate the predictor matrix and response vector for multiple linear regression. This function constructs the predictor matrix :math:`X` and the response vector :math:`y` used in a multiple linear regression model of the form .. math:: y = X\beta + \epsilon where :math:`y` represents the response data and :math:`X` the predictor matrix. The response data are used as a reference to determine valid pixels. A mask is extracted from the response (e.g., nodata values or an 8-bit mask) and applied to all predictors. Predictor-specific missing values are also added to the mask, ensuring that only pixels with complete information across all variables are included in the regression. This masking procedure reduces the number of pixels used in the analysis, yielding a smaller but denser predictor matrix. .. rubric:: Notes An :exc:`~coonfit.exceptions.InferenceError` is raised if an invalid predictor is detected. A predictor is considered invalid in the following cases: * After masking, the predictor contains only zeros. * The predictor represents categorical data where all selected pixels belong to a category and ``include_intercept`` is ``True``. No general test for linear dependence between predictors is performed. :param response: Response variable. Either a ``Band`` object or a string specifying the path to a raster file (``.tif``) containing the response data. If a string is provided, the raster must contain exactly one band. :type response: str or Band :param \*predictors: One or more predictors specified as ``Band`` objects or file paths. If a string is provided, it is interpreted as the path to a raster file and **all bands** in that file are added as individual predictors. Predictor data are cast to the same data type as the response. No rescaling is performed. :type \*predictors: Band or str :param view: Spatial subset of the data specified as ``(x, y, width, height)``. If not provided, the entire response raster is used. :type view: tuple of int, optional :param include_intercept: If ``True``, an additional column of ones is appended to the predictor matrix to model an intercept term. :type include_intercept: bool, default=True :param verbose: If ``True``, print processing information. :type verbose: bool, default=False :returns: * **X** (*NDArray of shape (n_samples, n_features)*) -- Predictor matrix. The number of features corresponds to the total number of predictors, plus one if ``include_intercept`` is ``True``. * **y** (*NDArray of shape (n_samples,)*) -- Response vector containing the response values corresponding to the selected pixels. .. seealso:: :func:`prepare_selector` Build the boolean selector used internally. :func:`init_X` Allocate the predictor matrix. :func:`transposed_product` Compute X.T @ X for large spatial datasets. .. py:function:: prepare_selector(response, *predictors, extra_masking_band = None, verbose=False) Create a boolean selector based on the masks of response and predictors. The selector is a boolean array indicating which pixels can be used (True) and which cannot (False) based on the combined masks of all input bands. :param response: A Band object or path string describing the response data. If a string is provided, it will be converted to a Band object with bidx=1. :type response: str or Band :param \*predictors: Variable number of Band objects, each specifying one or more predictor variables. Their masks will be combined with the response mask. :type \*predictors: Band :param extra_masking_band: Additional Band object treated as a rasterio mask, where values equal to 0 will be masked out. Default is None. :type extra_masking_band: Band, optional :param verbose: If True, prints runtime information about usable pixels at each masking stage. Default is False. :type verbose: bool, optional :returns: Boolean array of the same shape as the response band, where True indicates usable pixels and False indicates masked pixels. :rtype: NDArray of bool .. rubric:: Notes The selector combines masks using logical AND operations, meaning a pixel is only usable (True) if it is valid across all input bands. The mask reader can be configured using :meth:`~riogrande.io.models.Band.set_mask_reader` with ``use='band'`` or ``use='source'``. .. seealso:: :func:`_enrich_selector` Refine a selector using predictor masks. :func:`init_X` Initialize the predictor matrix using the selector. :func:`prepare_predictors` High-level function that calls this internally. .. rubric:: Examples >>> selector = prepare_selector(response_band, predictor1, predictor2, verbose=True) >>> valid_data = response_data[selector] .. py:function:: transposed_product(predictors, view, selector, include_intercept = False, as_dtype = 'float64') Compute the transposed product :math:`X^T X` for a set of predictors. This function extracts predictor values within a specified spatial view, applies a boolean selector to filter valid pixels, constructs the predictor matrix :math:`X`, and returns its transposed product. The result is commonly used in linear regression for computing normal equations. :param predictors: Collection of ``Band`` objects defining the predictor variables. :type predictors: Collection[Band] :param view: Spatial subset specified as ``(x, y, width, height)``. If ``None``, the full spatial extent is used. :type view: tuple of int or None :param selector: Boolean array indicating which pixels are valid and should be included in the computation. :type selector: NDArray of bool :param include_intercept: If ``True``, include an additional column of ones in the predictor matrix to model an intercept term. :type include_intercept: bool, default=False :param as_dtype: Data type of the resulting array. :type as_dtype: str or type, default="float64" :returns: **transprodX** -- The transposed product of the predictor matrix, ``X.T @ X``. The number of features corresponds to the number of predictors, plus one if ``include_intercept`` is ``True``. :rtype: NDArray of shape (n_features, n_features) .. seealso:: :func:`get_optimal_weights` Compute regression weights from X and y. :func:`get_optimal_weights_source` Compute weights using precomputed inverse. :func:`partial_X` Generate the predictor matrix ``X``.