OrthogonalSignalCorrection

class chemotools.projection.OrthogonalSignalCorrection(n_components: int = 2, method: Literal['wold', 'sjoblom', 'fearn'] = 'wold', max_iter: int = 500, tol: float = 1e-06, copy: bool = True)

Bases: TransformerMixin, BaseEstimator

Remove variation in X that is orthogonal to the target y using Orthogonal Signal Correction (OSC) [1] [2] [3].

OSC identifies and removes spectral components that carry substantial variance in X yet have no linear relationship with y. Three algorithmic variants are available:

  • wold: the original iterative method by Wold et al. (1998), which alternates between constraining the scores to be orthogonal to y and re-estimating loadings until convergence [1].

  • sjoblom: a modified iterative scheme by Sjöblom et al. (1998) that uses the pseudo-inverse of y to enforce orthogonality, often improving convergence behaviour [2].

  • fearn: a direct, non-iterative formulation by Fearn (2000) that projects X onto the null space of y before extracting the dominant singular vectors. This avoids convergence issues entirely [3].
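
To make the direct formulation concrete, here is a minimal NumPy sketch in the spirit of the fearn variant. It is not the chemotools implementation: the helper name `direct_osc_sketch` and the synthetic data are hypothetical, and the library may differ in centering, scaling, or how it computes loadings. The sketch restricts the weights to directions whose scores are orthogonal to y, then takes the dominant singular vectors.

```python
import numpy as np


def direct_osc_sketch(X, y, n_components=1):
    """Illustrative direct (non-iterative) OSC; hypothetical helper, not chemotools code."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(y, dtype=float).reshape(len(X), -1)  # accept 1D or 2D targets

    # Center both blocks with the training means.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Projector onto weight directions w for which Xc @ w is orthogonal to Yc,
    # i.e. directions orthogonal to A = Xc.T @ Yc.
    A = Xc.T @ Yc
    M = np.eye(X.shape[1]) - A @ np.linalg.pinv(A)

    # Dominant right singular vectors of the constrained matrix give the weights.
    _, _, Vt = np.linalg.svd(Xc @ M, full_matrices=False)
    W = Vt[:n_components].T

    T = Xc @ W                             # scores, orthogonal to Yc
    P = Xc.T @ T @ np.linalg.inv(T.T @ T)  # loadings
    X_corrected = Xc - T @ P.T             # remove the orthogonal variation
    return X_corrected, T, W, P


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))
y = np.linspace(0, 1, 8)
X_corrected, T, W, P = direct_osc_sketch(X, y, n_components=2)
```

Because no loop is involved, there is nothing to tune with max_iter or tol; the cost is one singular value decomposition.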

The transformer returns the corrected matrix with the same number of features as the input and is intended as a supervised signal correction step prior to calibration.

Parameters:
  • n_components (int, default=2) – Number of orthogonal components to remove. Must be a positive integer.

  • method ({'wold', 'sjoblom', 'fearn'}, default='wold') – Algorithm used to compute the orthogonal components: 'wold' (Wold et al., 1998) [1], 'sjoblom' (Sjöblom et al., 1998) [2], or 'fearn' (Fearn, 2000) [3].

  • max_iter (int, default=500) – Maximum number of iterations per component for the iterative methods ('wold' and 'sjoblom').

  • tol (float, default=1e-06) – Tolerance for convergence in the iterative algorithms.

  • copy (bool, default=True) – Whether to copy X and y in fit before centering them.

Variables:
  • mean_X (ndarray of shape (n_features,)) – The mean of the features in the training data.

  • mean_y (float or ndarray of shape (n_targets,)) – The mean of the target variable(s) in the training data.

  • scores (ndarray of shape (n_samples, n_components)) – The scores of the orthogonal components.

  • weights (ndarray of shape (n_features, n_components)) – The weights of the orthogonal components.

  • loadings (ndarray of shape (n_features, n_components)) – The loadings of the orthogonal components.

  • retained_variance_ratio (float) – The ratio of variance retained in X after removing the orthogonal components.

  • removed_variance_ratio (float) – The ratio of variance removed from X by the orthogonal components.

  • n_iter (ndarray of shape (n_components,)) – The number of iterations taken for each component to converge.

References

Examples

Fit and apply OSC to remove variation in X that is orthogonal to y.

>>> import numpy as np
>>> from chemotools.projection import OrthogonalSignalCorrection
>>> rng = np.random.default_rng(0)
>>> X = rng.normal(size=(8, 5))
>>> y = np.linspace(0, 1, 8)
>>> osc = OrthogonalSignalCorrection(n_components=1, method="wold")
>>> X_osc = osc.fit_transform(X, y)
>>> X_osc.shape
(8, 5)

Multivariate targets are also supported.

>>> y_multi = np.column_stack([y, y**2])
>>> osc = OrthogonalSignalCorrection(n_components=2, method="fearn")
>>> osc.fit(X, y_multi)
OrthogonalSignalCorrection(method='fearn')

The transformer can be used inside a scikit-learn pipeline.

>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.cross_decomposition import PLSRegression
>>> pipe = make_pipeline(
...     OrthogonalSignalCorrection(n_components=1, method="sjoblom"),
...     PLSRegression(n_components=2),
... )
>>> pipe.fit(X, y)
Pipeline(steps=[('orthogonalsignalcorrection',
                 OrthogonalSignalCorrection(
                     method='sjoblom', n_components=1
                 )),
                ('plsregression', PLSRegression())])

Notes

OSC is a supervised preprocessing method: the target used during fit() must be representative of the calibration problem.

The wold and sjoblom variants use iterative updates and may emit ConvergenceWarning if max_iter is reached before convergence. The fearn variant is non-iterative and does not require convergence tuning.
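
The role of max_iter and tol can be illustrated with a simplified Wold-style iteration for a single component. This is a sketch under stated assumptions, not the chemotools code: the helper `iterative_osc_component` is hypothetical, and where the published algorithm estimates the weights with a PLS model, the sketch swaps in plain least-squares weights via the pseudo-inverse.

```python
import numpy as np


def iterative_osc_component(X, y, max_iter=500, tol=1e-6):
    """One OSC component by a simplified Wold-style iteration (hypothetical helper)."""
    Xc = X - X.mean(axis=0)
    yc = (y - y.mean()).reshape(-1, 1)

    # Start from the first principal-component score of X.
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    t = U[:, :1] * S[0]

    # Explicit cutoff: centered X is rank deficient when n_samples <= n_features.
    X_pinv = np.linalg.pinv(Xc, rcond=1e-10)
    converged = False
    for n_iter in range(1, max_iter + 1):
        # Remove the part of the score that is correlated with y.
        t_orth = t - yc @ (yc.T @ t) / (yc.T @ yc)
        # Least-squares weights so that Xc @ w approximates the orthogonalized score.
        w = X_pinv @ t_orth
        t_new = Xc @ w
        # Relative change of the score vector between iterations.
        change = np.linalg.norm(t_new - t) / np.linalg.norm(t_new)
        t = t_new
        if change < tol:
            converged = True
            break

    p = Xc.T @ t / (t.T @ t)  # loading of the final component
    return Xc - t @ p.T, t, n_iter, converged


rng = np.random.default_rng(0)
X = rng.normal(size=(10, 30))  # more features than samples, as with spectra
y = rng.normal(size=10)
X_osc, t, n_iter, converged = iterative_osc_component(X, y)
```

If the loop runs out of iterations, `converged` stays False; that is the situation in which the library is documented to emit ConvergenceWarning, and raising max_iter or loosening tol are the usual remedies.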

In practice, a small number of components is usually sufficient. Removing too many orthogonal components may discard structured variation that is still useful for the downstream model.

See also

chemotools.projection.ExternalParameterOrthogonalization

Remove variation linked to external nuisance parameters.

sklearn.pipeline.make_pipeline

Build preprocessing and modelling pipelines.

fit(X: ndarray, y: ndarray) OrthogonalSignalCorrection

Fit the OSC model to calculate the orthogonal components to remove.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training vectors. Accepts NumPy arrays and pandas DataFrames.

  • y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target vectors. Accepts 1D (univariate) or 2D (multivariate) targets.

Returns:

self – Fitted OSC model with calculated orthogonal components.

Return type:

OrthogonalSignalCorrection

transform(X: ndarray, y=None)

Apply orthogonal signal correction to X.

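Computes the scores of X on the orthogonal components found during fitting and subtracts the variation they explain, so the output has the same number of features as the input.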

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Samples to transform.

  • y (None) – Ignored; present only for scikit-learn API compatibility.

Returns:

X_transformed – X transformed with removed orthogonal variation.

Return type:

ndarray of shape (n_samples, n_features)
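
As a rough illustration of what applying the correction to new samples involves, the sketch below uses arrays named after the fitted attributes mean_X, weights and loadings. The helper `apply_osc_sketch` and the stand-in fit are hypothetical. Two things are assumptions rather than documented behaviour: that all components are removed jointly, and that the result is left centered (the documentation does not say whether the library adds the training mean back).

```python
import numpy as np


def apply_osc_sketch(X_new, mean_X, weights, loadings):
    """Remove fitted orthogonal components from new samples (hypothetical helper)."""
    Xc = np.asarray(X_new, dtype=float) - mean_X  # center with the training mean
    scores = Xc @ weights                         # (n_samples, n_components)
    return Xc - scores @ loadings.T               # same shape as X_new


rng = np.random.default_rng(1)
X_train = rng.normal(size=(10, 30))
y_train = rng.normal(size=10)

mean_X = X_train.mean(axis=0)
Xc = X_train - mean_X
yc = y_train - y_train.mean()

# Stand-in one-component fit: first principal-component score, made orthogonal to y.
U, S, _ = np.linalg.svd(Xc, full_matrices=False)
t = U[:, 0] * S[0]
t_orth = t - yc * (yc @ t) / (yc @ yc)
weights = (np.linalg.pinv(Xc, rcond=1e-10) @ t_orth).reshape(-1, 1)
scores_train = Xc @ weights
loadings = Xc.T @ scores_train / (scores_train.T @ scores_train)

X_test = rng.normal(size=(3, 30))
X_test_corrected = apply_osc_sketch(X_test, mean_X, weights, loadings)
X_train_corrected = apply_osc_sketch(X_train, mean_X, weights, loadings)
```

No target values are needed at this stage: the weights and loadings learned during fit() carry all the information about y, which is why transform() ignores its y argument.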