OrthogonalSignalCorrection
- class chemotools.projection.OrthogonalSignalCorrection(n_components: int = 2, method: Literal['wold', 'sjoblom', 'fearn'] = 'wold', max_iter: int = 500, tol: float = 1e-06, copy: bool = True)
Bases: TransformerMixin, BaseEstimator
Remove variation in X that is orthogonal to the target y using Orthogonal Signal Correction (OSC) [1] [2] [3].
OSC identifies and removes spectral components that carry substantial variance in X yet have no linear relationship with y. Three algorithmic variants are available:
wold: the original iterative method by Wold et al. (1998), which alternates between constraining the scores to be orthogonal to y and re-estimating loadings until convergence [1].
sjoblom: a modified iterative scheme by Sjöblom et al. (1998) that uses the pseudo-inverse of y to enforce orthogonality, often improving convergence behaviour [2].
fearn: a direct, non-iterative formulation by Fearn (2000) that projects X onto the null space of y before extracting the dominant singular vectors. This avoids convergence issues entirely [3].
The transformer returns the corrected matrix with the same number of features as the input and is intended as a supervised signal correction step prior to calibration.
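The null-space idea behind the fearn variant can be illustrated with a few lines of NumPy. This is a simplified sketch based only on the description above, not the chemotools implementation or Fearn's exact formulation; the function name `osc_null_space_sketch` and its return values are hypothetical.

```python
import numpy as np


def osc_null_space_sketch(X, y, n_components=1):
    """Illustrative OSC-style correction (hypothetical helper, not library code)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(y, dtype=float).reshape(X.shape[0], -1)

    # Center both blocks, as is conventional before OSC.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Projector onto the part of sample space that is orthogonal to y.
    P = np.eye(X.shape[0]) - Yc @ np.linalg.pinv(Yc)
    Z = P @ Xc

    # Dominant right singular vectors of Z serve as loadings.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_components].T
    scores = Z @ loadings

    # Subtract the y-orthogonal structure from the centered data.
    X_corrected = Xc - scores @ loadings.T
    return X_corrected, scores, loadings


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))
y = np.linspace(0, 1, 8)
X_corrected, scores, loadings = osc_null_space_sketch(X, y, n_components=1)
```

In this sketch the scores are orthogonal to the centered y by construction, so subtracting them leaves the covariance between X and y unchanged while reducing the total variance of X.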
- Parameters:
n_components (int, default=2) – Number of orthogonal components to remove. Must be a positive integer.
method ({'wold', 'sjoblom', 'fearn'}, default='wold') – Method for calculating orthogonal components:
- 'wold': Original method by Wold et al. (1998) [1]
- 'sjoblom': Method by Sjöblom et al. (1998) [2]
- 'fearn': Method by Fearn (2000) [3]
max_iter (int, default=500) – Maximum number of iterations per component for the iterative methods ('wold' and 'sjoblom').
tol (float, default=1e-06) – Convergence tolerance for the iterative methods ('wold' and 'sjoblom').
copy (bool, default=True) – Whether to copy X and y in fit before applying centering.
- Variables:
mean_X (ndarray of shape (n_features,)) – The mean of the features in the training data.
mean_y (float or ndarray of shape (n_targets,)) – The mean of the target variable(s) in the training data.
scores (ndarray of shape (n_samples, n_components)) – The scores of the orthogonal components.
weights (ndarray of shape (n_features, n_components)) – The weights of the orthogonal components.
loadings (ndarray of shape (n_features, n_components)) – The loadings of the orthogonal components.
retained_variance_ratio (float) – The ratio of variance retained in X after removing the orthogonal components.
removed_variance_ratio (float) – The ratio of variance removed from X by the orthogonal components.
n_iter (ndarray of shape (n_components,)) – The number of iterations taken for each component to converge.
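The two variance ratios can be read as complementary fractions of the total sum of squares of the centered training data. The sketch below shows one plausible way to compute such quantities with NumPy; the exact formula used by the library is not stated here, and the helper name `variance_ratios_sketch` is hypothetical.

```python
import numpy as np


def variance_ratios_sketch(X_centered, X_corrected):
    """Return (retained, removed) fractions of the total sum of squares."""
    total = np.sum(X_centered ** 2)
    retained = np.sum(X_corrected ** 2) / total
    return retained, 1.0 - retained


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))
Xc = X - X.mean(axis=0)

# Stand-in for a correction step: remove the dominant singular component.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
X_corrected = Xc - s[0] * np.outer(U[:, 0], Vt[0])

retained, removed = variance_ratios_sketch(Xc, X_corrected)
```

Under this reading, a removed ratio close to 1 would indicate that most of the variance in X was treated as unrelated to y.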
References
Examples
Fit and apply OSC to remove variation in X that is orthogonal to y.
>>> import numpy as np
>>> from chemotools.projection import OrthogonalSignalCorrection
>>> rng = np.random.default_rng(0)
>>> X = rng.normal(size=(8, 5))
>>> y = np.linspace(0, 1, 8)
>>> osc = OrthogonalSignalCorrection(n_components=1, method="wold")
>>> X_osc = osc.fit_transform(X, y)
>>> X_osc.shape
(8, 5)
Multivariate targets are also supported.
>>> y_multi = np.column_stack([y, y**2])
>>> osc = OrthogonalSignalCorrection(n_components=2, method="fearn")
>>> osc.fit(X, y_multi)
OrthogonalSignalCorrection(method='fearn')
The transformer can be used inside a scikit-learn pipeline.
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.cross_decomposition import PLSRegression
>>> pipe = make_pipeline(
...     OrthogonalSignalCorrection(n_components=1, method="sjoblom"),
...     PLSRegression(n_components=2),
... )
>>> pipe.fit(X, y)
Pipeline(steps=[('orthogonalsignalcorrection',
                 OrthogonalSignalCorrection(method='sjoblom', n_components=1)),
                ('plsregression', PLSRegression())])
Notes
OSC is a supervised preprocessing method: the target used during fit() must be representative of the calibration problem.
The wold and sjoblom variants use iterative updates and may emit ConvergenceWarning if max_iter is reached before convergence. The fearn variant is non-iterative and does not require convergence tuning.
In practice, a small number of components is usually sufficient. Removing too many orthogonal components may discard structured variation that is still useful for the downstream model.
See also
chemotools.projection.ExternalParameterOrthogonalization – Remove variation linked to external nuisance parameters.
sklearn.pipeline.make_pipeline – Build preprocessing and modelling pipelines.
Initialize the Orthogonal Signal Correction (OSC) transformer.
- fit(X: ndarray, y: ndarray) → OrthogonalSignalCorrection
Fit the OSC model to calculate the orthogonal components to remove.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training vectors. Accepts numpy arrays, pandas DataFrames.
y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target vectors. Accepts 1D (univariate) or 2D (multivariate) targets.
- Returns:
self – Fitted OSC model with calculated orthogonal components.
- Return type:
OrthogonalSignalCorrection
- transform(X: ndarray, y=None)
Apply orthogonal signal correction to X.
Projects X onto the latent components found during fitting and removes that orthogonal variation, so the output has the same number of features as the input.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Samples to transform.
y (None) – Ignored; present only for API consistency.
- Returns:
X_transformed – X transformed with removed orthogonal variation.
- Return type:
ndarray of shape (n_samples, n_features)
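In the conventional OSC formulation, new samples are corrected by centering with the training mean, computing scores from the weights, and subtracting the scores multiplied by the loadings. The following sketch illustrates that arithmetic with NumPy under those assumptions. `apply_osc_sketch` is a hypothetical helper, the stand-in weights and loadings come from a plain SVD rather than a fitted OSC model, and whether the library re-adds the mean or deflates component by component is not stated here.

```python
import numpy as np


def apply_osc_sketch(X_new, mean_X, weights, loadings):
    """Correct new samples with stored OSC quantities (illustrative only)."""
    Xc = np.asarray(X_new, dtype=float) - mean_X
    scores = Xc @ weights            # (n_samples, n_components)
    return Xc - scores @ loadings.T  # same shape as X_new


rng = np.random.default_rng(0)
X_train = rng.normal(size=(8, 5))
X_test = rng.normal(size=(3, 5))
mean_X = X_train.mean(axis=0)

# Stand-in components: one orthonormal direction from an SVD of the training data.
_, _, Vt = np.linalg.svd(X_train - mean_X, full_matrices=False)
weights = loadings = Vt[:1].T

X_test_corrected = apply_osc_sketch(X_test, mean_X, weights, loadings)
```

Because the correction only uses quantities stored at fit time, the same operation can be applied to held-out samples without access to their targets, which is why y is ignored in transform.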