OrthogonalSignalCorrection

class chemotools.projection.OrthogonalSignalCorrection(n_components: int = 2, method: Literal['wold', 'sjoblom', 'fearn'] = 'wold', max_iter: int = 500, tol: float = 1e-06, copy: bool = True)[source]

Bases: DocLinkMixin, TransformerMixin, BaseEstimator

Remove variation in X that is orthogonal to the target y using Orthogonal Signal Correction (OSC) [1] [2] [3].

OSC identifies and removes spectral components that carry substantial variance in X yet have no linear relationship with y. Three algorithmic variants are available:

wold: the original iterative method by Wold et al. (1998), which alternates between constraining the scores to be orthogonal to y and re-estimating loadings until convergence [1].
sjoblom: a modified iterative scheme by Sjöblom et al. (1998) that uses the pseudo-inverse of y to enforce orthogonality, often improving convergence behaviour [2].
fearn: a direct, non-iterative formulation by Fearn (2000) that projects X onto the null space of y before extracting the dominant singular vectors. This avoids convergence issues entirely [3].

The transformer returns the corrected matrix with the same number of features as the input and is intended as a supervised signal correction step prior to calibration.
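
The direct idea behind the fearn variant can be illustrated with a few lines of NumPy. The following is a minimal sketch, not the chemotools implementation: the function name fearn_osc_sketch and the exact formulas for the loadings are assumptions chosen for illustration. It projects the centred X onto directions that do not covary with y, takes the dominant singular vectors of the result as weights, and subtracts the corresponding scores and loadings.

```python
import numpy as np


def fearn_osc_sketch(X, y, n_components=1):
    """Illustrative direct OSC in the spirit of Fearn (2000); not chemotools code."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(y, dtype=float).reshape(len(X), -1)  # (n_samples, n_targets)

    # Centre both blocks with the training means.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Feature-space directions that covary with y.
    A = Xc.T @ Yc  # (n_features, n_targets)

    # Projector onto the complement of span(A). Any weight vector w with
    # M @ w == w yields scores Xc @ w that are orthogonal to y.
    M = np.eye(X.shape[1]) - A @ np.linalg.pinv(A)

    # Dominant right singular vectors of Xc @ M: the y-orthogonal directions
    # that carry the most variance.
    _, _, Vt = np.linalg.svd(Xc @ M, full_matrices=False)
    W = Vt[:n_components].T  # weights, (n_features, n_components)

    T = Xc @ W  # scores, (n_samples, n_components)
    P = Xc.T @ T @ np.linalg.inv(T.T @ T)  # loadings, (n_features, n_components)

    # Remove the orthogonal components; the feature dimension is unchanged.
    X_corrected = Xc - T @ P.T
    return X_corrected, T, W, P


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))
y = np.linspace(0, 1, 8)
X_corrected, T, W, P = fearn_osc_sketch(X, y, n_components=2)
```

In this sketch the scores T are orthogonal to the centred y by construction, so no iteration or convergence check is needed. It assumes n_components does not exceed the rank of the projected matrix.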

Parameters:

n_components (int, default=2) -- Number of orthogonal components to remove. Must be a positive integer.
method ({'wold', 'sjoblom', 'fearn'}, default='wold') -- Method for calculating orthogonal components:
- 'wold': Original method by Wold et al. (1998) [1]
- 'sjoblom': Method by Sjöblom et al. (1998) [2]
- 'fearn': Method by Fearn (2000) [3]
max_iter (int, default=500) -- Maximum number of iterations for the component calculation algorithms.
tol (float, default=1e-06) -- Tolerance for convergence in the iterative algorithms.
copy (bool, default=True) -- Whether to copy X and y in fit before applying centering.

Variables:

mean_X (ndarray of shape (n_features,)) -- The mean of the features in the training data.
mean_y (float or ndarray of shape (n_targets,)) -- The mean of the target variable(s) in the training data.
scores (ndarray of shape (n_samples, n_components)) -- The scores of the orthogonal components.
weights (ndarray of shape (n_features, n_components)) -- The weights of the orthogonal components.
loadings (ndarray of shape (n_features, n_components)) -- The loadings of the orthogonal components.
retained_variance_ratio (float) -- The ratio of variance retained in X after removing the orthogonal components.
removed_variance_ratio (float) -- The ratio of variance removed from X by the orthogonal components.
n_iter (ndarray of shape (n_components,)) -- The number of iterations taken for each component to converge.

References

Examples

Fit and apply OSC to remove variation in X that is orthogonal to y.

>>> import numpy as np
>>> from chemotools.projection import OrthogonalSignalCorrection
>>> rng = np.random.default_rng(0)
>>> X = rng.normal(size=(8, 5))
>>> y = np.linspace(0, 1, 8)
>>> osc = OrthogonalSignalCorrection(n_components=1, method="wold")
>>> X_osc = osc.fit_transform(X, y)
>>> X_osc.shape
(8, 5)

Multivariate targets are also supported.

>>> y_multi = np.column_stack([y, y**2])
>>> osc = OrthogonalSignalCorrection(n_components=2, method="fearn")
>>> osc.fit(X, y_multi)
OrthogonalSignalCorrection(method='fearn')

The transformer can be used inside a scikit-learn pipeline.

>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.cross_decomposition import PLSRegression
>>> pipe = make_pipeline(
...     OrthogonalSignalCorrection(n_components=1, method="sjoblom"),
...     PLSRegression(n_components=2),
... )
>>> pipe.fit(X, y)
Pipeline(steps=[('orthogonalsignalcorrection',
                 OrthogonalSignalCorrection(
                     method='sjoblom', n_components=1
                 )),
                ('plsregression', PLSRegression())])

Notes

OSC is a supervised preprocessing method: the target used during fit() must be representative of the calibration problem.

The wold and sjoblom variants use iterative updates and may emit ConvergenceWarning if max_iter is reached before convergence. The fearn variant is non-iterative and does not require convergence tuning.
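
To show how max_iter and tol interact in an iterative scheme of this kind, here is a simplified sketch of one component computed by alternating two steps: orthogonalise the score against y, then re-express it as a linear combination of the columns of X. It is not the chemotools code and not a faithful reproduction of either published algorithm: a plain least-squares solve via the pseudo-inverse stands in for their weight-estimation step, the function name iterative_osc_component is hypothetical, and RuntimeWarning is used as a stand-in for ConvergenceWarning to keep the example NumPy-only.

```python
import warnings

import numpy as np


def iterative_osc_component(X, y, max_iter=500, tol=1e-6):
    """One OSC component by alternating projections (simplified illustration)."""
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)
    yc = np.asarray(y, dtype=float)
    yc = yc - yc.mean()

    X_pinv = np.linalg.pinv(Xc)

    # Start from the first principal-component score of X.
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    t = U[:, 0] * s[0]

    converged = False
    for n_iter in range(1, max_iter + 1):
        # Step 1: remove the part of the score that is correlated with y.
        t_orth = t - yc * (yc @ t) / (yc @ yc)
        # Step 2: least-squares weights so that Xc @ w approximates t_orth
        # (an assumption standing in for the published weight updates).
        w = X_pinv @ t_orth
        t_new = Xc @ w
        # Relative change of the score vector decides convergence.
        change = np.linalg.norm(t_new - t) / np.linalg.norm(t_new)
        t = t_new
        if change < tol:
            converged = True
            break

    if not converged:
        warnings.warn("max_iter reached before convergence", RuntimeWarning)

    p = Xc.T @ t / (t @ t)  # loading of the component
    return Xc - np.outer(t, p), t, w, p, n_iter


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))
y = np.linspace(0, 1, 8)
X_corrected, t, w, p, n_iter = iterative_osc_component(X, y)
```

In this sketch the final score is only approximately orthogonal to y, with the residual correlation governed by tol. A looser tol or a smaller max_iter trades orthogonality for speed, which is the tuning the non-iterative variant avoids.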

In practice, a small number of components is usually sufficient. Removing too many orthogonal components may discard structured variation that is still useful for the downstream model.

See also

chemotools.projection.ExternalParameterOrthogonalization: Remove variation linked to external nuisance parameters.
sklearn.pipeline.make_pipeline: Build preprocessing and modelling pipelines.

Initialize the Orthogonal Signal Correction (OSC) transformer.

Parameters:

n_components (int, default=2) -- Number of orthogonal components to remove. Must be a positive integer.
copy (bool, default=True) -- Whether to copy X and y in fit before applying centering.

fit(X: ndarray, y: ndarray) → OrthogonalSignalCorrection[source]

Fit the OSC model to calculate the orthogonal components to remove.

Parameters:

X (array-like of shape (n_samples, n_features)) -- Training vectors. Accepts NumPy arrays and pandas DataFrames.
y (array-like of shape (n_samples,) or (n_samples, n_targets)) -- Target vectors. Accepts 1D (univariate) or 2D (multivariate) targets.

Returns:

self -- Fitted OSC model with calculated orthogonal components.

Return type:

OrthogonalSignalCorrection

transform(X: ndarray, y=None)[source]

Apply orthogonal signal correction to X.

Removes from X the orthogonal components found during fitting. The output keeps the same number of features as the input.

Parameters:

X (array-like of shape (n_samples, n_features)) -- Samples to transform.
y (None) -- Ignored. Present only for scikit-learn API compatibility.

Returns:

X_transformed -- X transformed with removed orthogonal variation.

Return type:

ndarray of shape (n_samples, n_features)
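
As an illustration of what applying a fitted correction to new samples can look like, the sketch below centres new data with the training mean, computes scores from stored weights, and subtracts the score-loading product. This is an assumption about the general OSC prediction step, not a description of the chemotools internals: the helper apply_osc, the way W and P are built here, and the choice to return centred data are all hypothetical.

```python
import numpy as np


def apply_osc(X_new, mean_X, W, P):
    """Remove stored orthogonal components from new samples (illustrative)."""
    Xc = np.asarray(X_new, dtype=float) - mean_X  # centre with the training mean
    T_new = Xc @ W  # scores of the new samples on the orthogonal components
    return Xc - T_new @ P.T  # subtract the orthogonal part


rng = np.random.default_rng(0)
X_train = rng.normal(size=(8, 5))
y_train = np.linspace(0, 1, 8)
X_new = rng.normal(size=(3, 5))

# Hypothetical fitted quantities for a single component, built directly:
# the weight is the dominant direction of X_train that does not covary with y.
mean_X = X_train.mean(axis=0)
Xc = X_train - mean_X
a = Xc.T @ (y_train - y_train.mean())
M = np.eye(5) - np.outer(a, a) / (a @ a)
W = np.linalg.svd(Xc @ M, full_matrices=False)[2][:1].T  # (n_features, 1)
T = Xc @ W  # training scores, (n_samples, 1)
P = Xc.T @ T / (T.T @ T)  # loadings, (n_features, 1)

X_new_corrected = apply_osc(X_new, mean_X, W, P)
```

Only quantities learned from the training data (mean_X, W, P) are used on the new samples, which is what allows the step to sit inside a pipeline without access to y at prediction time.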