PiecewiseDirectStandardization#

class chemotools.adaptation.PiecewiseDirectStandardization(window_length: int = 25, n_components: int = 2, scale: bool = True, storage: str = 'dense')[source]

Bases: DocLinkMixin, OneToOneFeatureMixin, TransformerMixin, BaseEstimator

Piecewise Direct Standardization (PDS) is a transformer for domain adaptation (calibration transfer) between instruments. For each feature, it fits a local PLS regression over a window of neighbouring features, which yields a linear map from the target instrument space to the source instrument space, following the implementations in [1] and [2].

Parameters:

window_length (int, default=25) -- Half-width (w) of the local spectral window used in PDS. A full interior window spans 2 * window_length + 1 features.
n_components (int, default=2) -- Number of components to keep in each local PLS model.
scale (bool, default=True) -- Whether to scale X and Y in the PLS model.
storage (str {"dense", "band"}, default="dense") -- Storage format for the regression coefficients.
- "dense" stores the full (n_features, n_features) matrix. Fastest when n_features is small enough that the matrix fits in CPU cache (roughly n_features ≤ 1500 at float64).
- "band" stores coefficients in a compact (n_features, 2 * window_length + 1) array and exploits the band structure in transform() via a strided sliding-window view and einsum. Fastest when n_features is large (roughly n_features > 1500), and uses about 2–5% of the memory of "dense".
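The memory trade-off between the two storage options follows directly from the array shapes quoted above. The short calculation below is only an illustration: `coefficient_counts` is a hypothetical helper, not part of chemotools, and it assumes 8-byte float64 coefficients.

```python
def coefficient_counts(n_features: int, window_length: int) -> tuple[int, int]:
    """Return (dense, band) coefficient counts for the two storage layouts."""
    dense = n_features * n_features                # full (n_features, n_features) matrix
    band = n_features * (2 * window_length + 1)    # packed (n_features, 2w + 1) array
    return dense, band


for n_features in (1_000, 1_500, 2_500):
    dense, band = coefficient_counts(n_features, window_length=25)
    # 8 bytes per float64 coefficient
    print(
        f"n_features={n_features}: dense={dense * 8 / 1e6:.1f} MB, "
        f"band={band * 8 / 1e6:.2f} MB, ratio={band / dense:.1%}"
    )
```

With the default window_length=25, the band layout holds 51 coefficients per feature, so its share of the dense footprint shrinks as n_features grows.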

Variables:

n_features_in_ (int) -- Number of features seen during fit (set automatically by sklearn).
T_ (np.ndarray of shape (n_features, n_features), or None) -- Transformation matrix with banded structure, stored as a dense ndarray when storage="dense". None when storage="band" or when fitted without X_source.
coef_band_ (np.ndarray of shape (n_features, 2 * window_length + 1), or None) -- Packed band of per-feature PLS coefficients. Set only when storage="band"; None otherwise.
interior_start_ (int) -- Index of the first feature whose local window is fully interior (window_length). Set only when storage="band".
interior_end_ (int) -- One past the last interior feature (n_features - window_length). Set only when storage="band".
bias_ (np.ndarray of shape (n_features,), or None) -- Precomputed per-feature bias that absorbs local PLS centering, allowing transform() to avoid per-sample intermediate allocations. None if fitted with X_source=None.
x_source_provided_ (bool) -- Whether X_source was provided during fitting.
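The interior range described above can be pictured with a small index calculation. This is a sketch under one assumption: windows are simply truncated at the spectrum edges, which is consistent with the minimum boundary window size of window_length + 1 mentioned under the raised errors. `local_window` is a hypothetical helper, not a chemotools function.

```python
def local_window(j: int, n_features: int, window_length: int) -> tuple[int, int]:
    """Return the half-open [lo, hi) feature range used to predict feature j."""
    lo = max(0, j - window_length)                   # truncated at the left edge
    hi = min(n_features, j + window_length + 1)      # truncated at the right edge
    return lo, hi


n_features, w = 100, 5

# Features whose window has the full 2w + 1 width are "interior".
interior = [
    j for j in range(n_features)
    if local_window(j, n_features, w)[1] - local_window(j, n_features, w)[0] == 2 * w + 1
]
interior_start, interior_end = interior[0], interior[-1] + 1

print(local_window(0, n_features, w))    # edge window, only w + 1 features wide
print(interior_start, interior_end)      # window_length and n_features - window_length
```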

Raises:

ValueError -- If X and X_source do not have the same shape.
ValueError -- If n_components exceeds n_samples.
ValueError -- If n_components exceeds the minimum window size at the boundaries (window_length + 1).
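The three conditions above can be summarised as a standalone check. The function below is a hypothetical illustration of the documented rules, not the validation code used by chemotools, and its error messages are invented for the example.

```python
def check_pds_inputs(
    x_shape: tuple[int, int],
    x_source_shape: tuple[int, int],
    n_components: int,
    window_length: int,
) -> None:
    """Raise ValueError if the documented PDS preconditions are violated."""
    if x_shape != x_source_shape:
        raise ValueError("X and X_source must have the same shape.")
    n_samples = x_shape[0]
    if n_components > n_samples:
        raise ValueError("n_components exceeds n_samples.")
    # Edge features only see window_length + 1 neighbours (including themselves).
    if n_components > window_length + 1:
        raise ValueError("n_components exceeds the minimum boundary window size.")


# Passes silently: 2 components, 50 samples, boundary windows of 6 features.
check_pds_inputs((50, 100), (50, 100), n_components=2, window_length=5)
```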

See also

DirectStandardization: Global linear transformation without local windows.

References

Examples

>>> import numpy as np
>>> from chemotools.adaptation import PiecewiseDirectStandardization
>>> rng = np.random.default_rng(42)
>>> X = rng.normal(size=(50, 100))
>>> X_source = X * 1.2 + rng.normal(0, 0.1, size=(50, 100))
>>> pds = PiecewiseDirectStandardization(window_length=5, n_components=2)
>>> pds.fit(X, X_source=X_source)
PiecewiseDirectStandardization(window_length=5)
>>> X_transformed = pds.transform(X)
>>> X_transformed.shape
(50, 100)

Attributes

`n_features_in_`
`T_`
`coef_band_`
`interior_start_`
`interior_end_`
`bias_`
`x_source_provided_`

n_features_in_: int

T_: ndarray | None

coef_band_: ndarray | None

interior_start_: int

interior_end_: int

bias_: ndarray | None

x_source_provided_: bool

fit(X: ndarray, y=None, *, X_source: ndarray | None = None) → PiecewiseDirectStandardization[source]

Fit the PiecewiseDirectStandardization to the input data.

Parameters:

X (np.ndarray of shape (n_samples, n_features)) -- Data from the target instrument.
y (None) -- Ignored; present for API consistency.
X_source (np.ndarray of shape (n_samples, n_features), optional) -- Data from the source instrument. If None, the transformer defaults to an identity transformation.

Returns:

self

Return type:

PiecewiseDirectStandardization
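The kind of mapping that fit estimates can be illustrated with a toy version in NumPy. This sketch makes several assumptions that are not taken from the chemotools source: it swaps the local PLS models for ordinary least squares via np.linalg.lstsq, truncates windows at the edges, and stores the coefficients for feature j in column j of the matrix. `fit_pds_lstsq` is a hypothetical name.

```python
import numpy as np


def fit_pds_lstsq(X, X_source, window_length):
    """Toy PDS fit: one local least-squares model per feature (OLS, not PLS)."""
    n_samples, n_features = X.shape
    T = np.zeros((n_features, n_features))
    x_mean = X.mean(axis=0)
    s_mean = X_source.mean(axis=0)
    Xc = X - x_mean            # centred target spectra
    Sc = X_source - s_mean     # centred source spectra
    for j in range(n_features):
        lo = max(0, j - window_length)
        hi = min(n_features, j + window_length + 1)
        # Predict source feature j from the local window of target features.
        coef, *_ = np.linalg.lstsq(Xc[:, lo:hi], Sc[:, j], rcond=None)
        T[lo:hi, j] = coef     # only the band around the diagonal is non-zero
    # Fold the centring into a single per-feature offset.
    bias = s_mean - x_mean @ T
    return T, bias


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))
X_source = 1.2 * X + 0.5       # noiseless gain and offset between instruments
T, bias = fit_pds_lstsq(X, X_source, window_length=2)
X_mapped = X @ T + bias        # target spectra expressed in source space
print(np.allclose(X_mapped, X_source))
```

Folding the centring into a single offset vector mirrors the role the documentation gives to bias_, although the actual computation in the library may differ.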

transform(X) → ndarray[source]

Use the trained model to transform the target data.

Parameters: X (np.ndarray of shape (n_samples, n_features)) -- Input data to transform.
Returns: X_transformed -- Transformed data.
Return type: np.ndarray of shape (n_samples, n_features)
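For storage="band", the class description mentions a strided sliding-window view combined with einsum. The sketch below shows how such a computation can work for the interior features and checks it against an equivalent dense product. The packed layout (row j holds the coefficients for the window centred on feature j) is an assumption, boundary features are left out for brevity, and `band_transform_interior` is a hypothetical helper rather than the library's code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def band_transform_interior(X, coef_band, window_length):
    """Apply packed per-feature coefficients to the interior features only."""
    w = window_length
    # windows[s, i, k] == X[s, i + k]; window i is centred on feature i + w.
    windows = sliding_window_view(X, 2 * w + 1, axis=1)
    interior = coef_band[w : X.shape[1] - w]
    # One dot product per (sample, interior feature) pair, with no dense matrix.
    return np.einsum("sik,ik->si", windows, interior)


rng = np.random.default_rng(0)
n_samples, n_features, w = 4, 12, 2
X = rng.normal(size=(n_samples, n_features))
coef_band = rng.normal(size=(n_features, 2 * w + 1))

# Dense reference: scatter each packed row into a column of a full matrix.
T = np.zeros((n_features, n_features))
for j in range(w, n_features - w):
    T[j - w : j + w + 1, j] = coef_band[j]

out_band = band_transform_interior(X, coef_band, w)
out_dense = (X @ T)[:, w : n_features - w]
print(np.allclose(out_band, out_dense))
```

The sliding-window view does not copy X, so the band path only touches the 2 * window_length + 1 coefficients that are non-zero for each feature.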

set_fit_request(*, X_source: bool | None | str = '$UNCHANGED$') → PiecewiseDirectStandardization

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters: X_source (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for the X_source parameter in fit.
Returns: self -- The updated object.
Return type: object