PiecewiseDirectStandardization

class chemotools.adaptation.PiecewiseDirectStandardization(window_length: int = 25, n_components: int = 2, scale: bool = True, storage: str = 'dense')[source]

Bases: DocLinkMixin, OneToOneFeatureMixin, TransformerMixin, BaseEstimator

Piecewise Direct Standardization (PDS) is a transformer for calibration transfer, a form of domain adaptation between instruments. For each feature, it fits a local PLS regression over a window of neighboring features to estimate a linear map from the target instrument space to the source instrument space, following the implementations in [1] and [2].
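The windowed mapping described above can be illustrated with a minimal NumPy sketch. It is not the chemotools implementation: it swaps the local PLS models for ordinary least squares (np.linalg.lstsq), omits centering and scaling, and the helper name fit_pds_lstsq is hypothetical.

```python
import numpy as np


def fit_pds_lstsq(X_target, X_source, w):
    """Fit one local least-squares model per feature (hypothetical helper).

    Column j of the returned matrix maps the target window around
    feature j onto source feature j.
    """
    n_samples, n_features = X_target.shape
    T = np.zeros((n_features, n_features))
    for j in range(n_features):
        # Window of half-width w, truncated at the spectrum edges.
        lo, hi = max(0, j - w), min(n_features, j + w + 1)
        coef, *_ = np.linalg.lstsq(X_target[:, lo:hi], X_source[:, j], rcond=None)
        T[lo:hi, j] = coef
    return T


rng = np.random.default_rng(0)
X_source = rng.normal(size=(40, 30))
X_target = 0.8 * X_source  # target instrument differs by a simple gain

T = fit_pds_lstsq(X_target, X_source, w=3)
X_mapped = X_target @ T  # target spectra expressed in the source space
```

Because each column of T only has nonzero entries within w features of the diagonal, the resulting matrix is banded.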

Parameters:
  • window_length (int, default=25) -- Half-width (w) of the local spectral window used in PDS. A full interior window spans 2 * window_length + 1 features.

  • n_components (int, default=2) -- Number of components to keep in each local PLS model.

  • scale (bool, default=True) -- Whether to scale X and Y in the PLS model.

  • storage (str {"dense", "band"}, default="dense") --

    Storage format for the regression coefficients.

    • "dense" stores the full (n_features, n_features) matrix. Fastest when n_features is small enough that the matrix fits in CPU cache (roughly n_features ≤ 1500 at float64).

    • "band" stores coefficients in a compact (n_features, 2 * window_length + 1) array and exploits the band structure in transform() via a strided sliding-window view and einsum. Fastest when n_features is large (roughly n_features > 1500), and uses ~2–5 % of the memory of "dense".
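To make the two storage layouts concrete, the following NumPy sketch applies the same banded coefficients once as a dense matrix product and once through a sliding-window view with einsum. The indexing convention (row j of the band array holds the coefficients for output feature j), the zero padding at the edges, and all variable names are assumptions chosen for illustration; the library's internal layout may differ.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
n_samples, n_features, w = 4, 12, 2
width = 2 * w + 1

# Band layout: band[j, k] weights input feature j - w + k for output feature j.
band = rng.normal(size=(n_features, width))

# Dense layout: scatter the same coefficients into a full matrix.
T = np.zeros((n_features, n_features))
for j in range(n_features):
    for k in range(width):
        i = j - w + k
        if 0 <= i < n_features:
            T[i, j] = band[j, k]
        else:
            band[j, k] = 0.0  # positions outside the spectrum carry no weight

X = rng.normal(size=(n_samples, n_features))

# Dense transform: one full matrix product.
dense_out = X @ T

# Band transform: zero-pad the edges, view every window, contract per feature.
X_pad = np.pad(X, ((0, 0), (w, w)))
windows = sliding_window_view(X_pad, width, axis=1)  # (n_samples, n_features, width)
band_out = np.einsum("sjk,jk->sj", windows, band)

# Fraction of the dense memory used by the band layout.
memory_fraction = band.size / T.size
```

The band array holds n_features * (2 * window_length + 1) values instead of n_features ** 2, so with the default window_length=25 and n_features=1500 the ratio is 51 / 1500, about 3.4 %.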

Variables:
  • n_features_in_ (int) -- Number of features seen during fit (set automatically by sklearn).

  • T_ (np.ndarray of shape (n_features, n_features), or None) -- Banded transformation matrix. Dense ndarray when storage="dense". None when storage="band" or when fitted without X_source.

  • coef_band_ (np.ndarray of shape (n_features, 2 * window_length + 1), or None) -- Packed band of per-feature PLS coefficients. Set only when storage="band"; None otherwise.

  • interior_start_ (int) -- Index of the first feature whose local window is fully interior (equal to window_length). Set only when storage="band".

  • interior_end_ (int) -- One past the last interior feature (equal to n_features - window_length). Set only when storage="band".

  • bias_ (np.ndarray of shape (n_features,), or None) -- Precomputed per-feature bias that absorbs local PLS centering, allowing transform() to avoid per-sample intermediate allocations. None if fitted with X_source=None.

  • x_source_provided_ (bool) -- Boolean flag indicating if X_source was provided during fitting.
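The idea of a precomputed bias that absorbs the centering can be shown for a single local window. This is a sketch only: it uses ordinary least squares in place of PLS, and the names X_win, coef, and bias are illustrative rather than taken from the library.

```python
import numpy as np

rng = np.random.default_rng(1)
X_win = rng.normal(loc=5.0, size=(30, 7))  # one local window of target data
true_coef = rng.normal(size=7)
y = X_win @ true_coef + 2.0                # matching source feature

# Fit on centered data, as a centered regression model would.
x_mean = X_win.mean(axis=0)
y_mean = y.mean()
coef, *_ = np.linalg.lstsq(X_win - x_mean, y - y_mean, rcond=None)

# Prediction with explicit centering (allocates X_win - x_mean each time).
pred_centered = (X_win - x_mean) @ coef + y_mean

# Fold the centering into a single scalar bias, computed once after fitting.
bias = y_mean - x_mean @ coef
pred_folded = X_win @ coef + bias
```

Both predictions are algebraically identical, since (x - x_mean) @ coef + y_mean = x @ coef + (y_mean - x_mean @ coef). Storing one such bias per feature lets the transform work on the raw input directly.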

Raises:
  • ValueError -- If X and X_source do not have the same shape.

  • ValueError -- If n_components exceeds n_samples.

  • ValueError -- If n_components exceeds the minimum window size at the boundaries (window_length + 1).
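The three conditions above can be sketched as a standalone check. The function names and error messages below are hypothetical, and the window truncation at the spectrum edges is an assumption consistent with the stated minimum window size of window_length + 1.

```python
import numpy as np


def local_window(j, n_features, window_length):
    """Return the [lo, hi) feature range used for feature j (illustrative)."""
    lo = max(0, j - window_length)
    hi = min(n_features, j + window_length + 1)
    return lo, hi


def check_pds_inputs(X, X_source, window_length, n_components):
    """Mirror the documented ValueError conditions (hypothetical helper)."""
    if X.shape != X_source.shape:
        raise ValueError("X and X_source must have the same shape.")
    n_samples, _ = X.shape
    if n_components > n_samples:
        raise ValueError("n_components exceeds n_samples.")
    # The first and last features only see window_length + 1 neighbors.
    if n_components > window_length + 1:
        raise ValueError("n_components exceeds the smallest boundary window.")


n_features, window_length = 100, 5
sizes = [
    hi - lo
    for lo, hi in (local_window(j, n_features, window_length) for j in range(n_features))
]

# Features in [interior_start, interior_end) have a full-width window.
interior_start, interior_end = window_length, n_features - window_length
```

Here the edge features have windows of window_length + 1 = 6 features, while every feature from index 5 up to, but not including, index 95 has the full 11-feature window.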

See also

DirectStandardization

Global linear transformation without local windows.

References

Examples

>>> import numpy as np
>>> from chemotools.adaptation import PiecewiseDirectStandardization
>>> rng = np.random.default_rng(42)
>>> X = rng.normal(size=(50, 100))
>>> X_source = X * 1.2 + rng.normal(0, 0.1, size=(50, 100))
>>> pds = PiecewiseDirectStandardization(window_length=5, n_components=2)
>>> pds.fit(X, X_source=X_source)
PiecewiseDirectStandardization(window_length=5)
>>> X_transformed = pds.transform(X)
>>> X_transformed.shape
(50, 100)

Attributes

n_features_in_: int
T_: ndarray | None
coef_band_: ndarray | None
interior_start_: int
interior_end_: int
bias_: ndarray | None
x_source_provided_: bool

fit(X: ndarray, y=None, *, X_source: ndarray | None = None) → PiecewiseDirectStandardization[source]

Fit the PiecewiseDirectStandardization to the input data.

Parameters:
  • X (np.ndarray of shape (n_samples, n_features)) -- Data from the target instrument.

  • y (None) -- Ignored. Present only for scikit-learn API compatibility.

  • X_source (np.ndarray of shape (n_samples, n_features), optional) -- Data from the source instrument. If None, the transformer defaults to an identity transformation.

Returns:

self -- The fitted transformer.

Return type:

PiecewiseDirectStandardization

transform(X) → ndarray[source]

Transform data from the target instrument into the source instrument space using the fitted model.

Parameters:

X (np.ndarray of shape (n_samples, n_features)) -- Data from the target instrument to transform.

Returns:

X_transformed -- The transformed data.

Return type:

np.ndarray of shape (n_samples, n_features)

set_fit_request(*, X_source: bool | None | str = '$UNCHANGED$') → PiecewiseDirectStandardization

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

X_source (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for X_source parameter in fit.

Returns:

self -- The updated object.

Return type:

object