OrthogonalSignalCorrection

class chemotools.projection.OrthogonalSignalCorrection(n_components: int = 2, method: Literal['wold', 'sjoblom', 'fearn'] = 'wold', max_iter: int = 500, tol: float = 1e-06, copy: bool = True)

Bases: TransformerMixin, BaseEstimator

A transformer that removes variation in X that is orthogonal to the target y.

Parameters:
  • n_components (int, default=2) – Number of orthogonal components to remove. Must be a positive integer.

  • method ({'wold', 'sjoblom', 'fearn'}, default='wold') – Method for calculating orthogonal components:

      - ‘wold’: Original method by Wold et al. (1998) [1]
      - ‘sjoblom’: Method by Sjöblom et al. (1998) [2]
      - ‘fearn’: Method by Fearn (2000) [3]

  • max_iter (int, default=500) – Maximum number of iterations for the component calculation algorithms.

  • tol (float, default=1e-06) – Tolerance for convergence in the iterative algorithms.

  • copy (bool, default=True) – Whether to copy X and y in fit before centering them.
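To make the roles of max_iter and tol concrete, the following is a minimal NumPy sketch of a simplified Wold-style iteration for a single component. It is not the chemotools implementation: the function name wold_style_component is hypothetical, the inner PLS regression of the original method is replaced by an ordinary least-squares solve, and a plain RuntimeWarning stands in for whatever warning class the library emits.

```python
import warnings

import numpy as np


def wold_style_component(X, y, max_iter=500, tol=1e-6):
    """Estimate one OSC-like component iteratively (illustrative only)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    Yc = np.asarray(y, dtype=float).reshape(len(X), -1)
    Yc = Yc - Yc.mean(axis=0)
    X_pinv = np.linalg.pinv(Xc)

    # Start from the scores of the first principal component of X.
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    t = U[:, 0] * s[0]

    for n_iter in range(1, max_iter + 1):
        # Remove the part of the scores that y can explain.
        t_orth = t - Yc @ np.linalg.lstsq(Yc, t, rcond=None)[0]
        # Least-squares weights so that Xc @ w approximates t_orth
        # (assumption: replaces the inner PLS step of the original method).
        w = X_pinv @ t_orth
        t_new = Xc @ w
        # tol bounds the relative change of the scores between iterations.
        change = np.linalg.norm(t_new - t) / np.linalg.norm(t_new)
        t = t_new
        if change < tol:
            break
    else:
        # Reached only when max_iter iterations ran without meeting tol.
        warnings.warn(
            f"No convergence after {max_iter} iterations.", RuntimeWarning
        )

    p = Xc.T @ t / (t @ t)  # loadings of the component
    return w, p, n_iter


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))
y = np.linspace(0, 1, 8)
w, p, n_iter = wold_style_component(X, y, max_iter=500, tol=1e-6)
```

A tighter tol generally costs more iterations, and a small max_iter trades accuracy of the component for a bounded run time.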

Variables:
  • mean_X (ndarray of shape (n_features,)) – The mean of the features in the training data.

  • mean_y (float or ndarray of shape (n_targets,)) – The mean of the target variable(s) in the training data.

  • scores (ndarray of shape (n_samples, n_components)) – The scores of the orthogonal components.

  • weights (ndarray of shape (n_features, n_components)) – The weights of the orthogonal components.

  • loadings (ndarray of shape (n_features, n_components)) – The loadings of the orthogonal components.

  • n_iter (ndarray of shape (n_components,)) – The number of iterations taken for each component to converge.

References

Examples

Fit and apply OSC to remove variation in X that is orthogonal to y.

>>> import numpy as np
>>> from chemotools.projection import OrthogonalSignalCorrection
>>> rng = np.random.default_rng(0)
>>> X = rng.normal(size=(8, 5))
>>> y = np.linspace(0, 1, 8)
>>> osc = OrthogonalSignalCorrection(n_components=1, method="wold")
>>> X_osc = osc.fit_transform(X, y)
>>> X_osc.shape
(8, 5)

Multivariate targets are also supported.

>>> y_multi = np.column_stack([y, y**2])
>>> osc = OrthogonalSignalCorrection(n_components=2, method="fearn")
>>> osc.fit(X, y_multi)
OrthogonalSignalCorrection(method='fearn')

The transformer can be used inside a scikit-learn pipeline.

>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.cross_decomposition import PLSRegression
>>> pipe = make_pipeline(
...     OrthogonalSignalCorrection(n_components=1, method="sjoblom"),
...     PLSRegression(n_components=2),
... )
>>> pipe.fit(X, y)
Pipeline(steps=[('orthogonalsignalcorrection',
                 OrthogonalSignalCorrection(
                     method='sjoblom', n_components=1
                 )),
                ('plsregression', PLSRegression())])

Notes

OSC is a supervised preprocessing method: it removes components from X that are orthogonal to the provided target y. Because of this, the target used during fit() must be representative of the calibration problem.

The transformed data keep the same shape as the input data. This estimator is therefore intended for signal correction rather than classical dimension reduction.
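The difference between signal correction and dimension reduction can be illustrated with plain NumPy. In this sketch the weight vector w is a random placeholder standing in for one fitted orthogonal component; it is an assumption made for illustration, not a value produced by the estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))
Xc = X - X.mean(axis=0)

# Hypothetical unit weight vector standing in for one fitted component.
w = rng.normal(size=5)
w /= np.linalg.norm(w)

t = Xc @ w              # scores of the component, shape (8,)
p = Xc.T @ t / (t @ t)  # loadings of the component, shape (5,)

# Signal correction subtracts the component and keeps all features.
X_corrected = Xc - np.outer(t, p)

# Dimension reduction would instead keep only the scores.
X_reduced = t[:, None]
```

Here X_corrected has the same shape as X, while X_reduced has a single column.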

The available methods differ in how orthogonal components are estimated:

  • wold and sjoblom use iterative updates and may emit ConvergenceWarning if max_iter is reached before convergence.

  • fearn uses a direct SVD-based formulation and does not require an iterative loop.
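As a rough illustration of the direct approach, the sketch below follows the general idea described by Fearn: project X onto the subspace that has no covariance with y, then take the leading singular vectors of the projected matrix as weights. The function name fearn_style_osc and its return values are assumptions made for this example, and details such as centering and scaling may differ from the chemotools implementation.

```python
import numpy as np


def fearn_style_osc(X, y, n_components=1):
    """Direct, SVD-based OSC sketch (illustrative only)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(y, dtype=float).reshape(len(X), -1)
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    Yc = Y - Y.mean(axis=0)

    # Projector M removes the feature-space directions that covary with y.
    A = Xc.T @ Yc  # shape (n_features, n_targets)
    M = np.eye(X.shape[1]) - A @ np.linalg.pinv(A.T @ A) @ A.T

    # Z is orthogonal to y by construction; its leading right singular
    # vectors are used as weights for the orthogonal components.
    Z = Xc @ M
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    W = Vt[:n_components].T  # shape (n_features, n_components)

    T = Xc @ W  # scores of the orthogonal components
    P = Xc.T @ T @ np.linalg.pinv(T.T @ T)  # loadings
    return x_mean, W, P


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))
y = np.linspace(0, 1, 8)

x_mean, W, P = fearn_style_osc(X, y, n_components=2)
Xc = X - x_mean
T = Xc @ W
X_corrected = Xc - T @ P.T  # centered data with the components removed
```

No loop or tolerance is involved: the weights come from a single SVD, and the resulting scores T are orthogonal to the centered y up to floating-point error.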

In practice, a small number of components is usually preferred. Removing too many orthogonal components may discard structured variation that is still useful for the downstream model.

See also

chemotools.projection.ExternalParameterOrthogonalization

Remove variation linked to external nuisance parameters.

sklearn.pipeline.make_pipeline

Build preprocessing and modelling pipelines.

fit(X: ndarray, y: ndarray) → OrthogonalSignalCorrection

Fit the OSC model to calculate the orthogonal components to remove.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training vectors. Accepts NumPy arrays and pandas DataFrames.

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target vectors. Accepts 1D (univariate) or 2D (multivariate) targets.

Returns:

self – Fitted OSC model with calculated orthogonal components.

Return type:

OrthogonalSignalCorrection

transform(X: ndarray, y=None)

Apply orthogonal signal correction to X.

Removes from X the contribution of the orthogonal components found during fitting. The output has the same number of features as the input.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Samples to transform.

  • y (None) – Ignored. Present only for compatibility with the scikit-learn transformer API.

Returns:

X_transformed – X transformed with removed orthogonal variation.

Return type:

ndarray of shape (n_samples, n_features)