OrthogonalSignalCorrection
- class chemotools.projection.OrthogonalSignalCorrection(n_components: int = 2, method: Literal['wold', 'sjoblom', 'fearn'] = 'wold', max_iter: int = 500, tol: float = 1e-06, copy: bool = True)
Bases: TransformerMixin, BaseEstimator
A transformer that removes variation in X that is orthogonal to the target y.
- Parameters:
n_components (int, default=2) – Number of orthogonal components to remove. Must be a positive integer.
method ({'wold', 'sjoblom', 'fearn'}, default='wold') – Method for calculating orthogonal components:
- ‘wold’: Original method by Wold et al. (1998) [1]
- ‘sjoblom’: Method by Sjöblom et al. (1998) [2]
- ‘fearn’: Method by Fearn (2000) [3]
max_iter (int, default=500) – Maximum number of iterations for the component calculation algorithms.
tol (float, default=1e-06) – Tolerance for convergence in the iterative algorithms.
copy (bool, default=True) – Whether to copy X and y in fit before centering them.
- Variables:
mean_X (ndarray of shape (n_features,)) – The mean of the features in the training data.
mean_y (float or ndarray of shape (n_targets,)) – The mean of the target variable(s) in the training data.
scores (ndarray of shape (n_samples, n_components)) – The scores of the orthogonal components.
weights (ndarray of shape (n_features, n_components)) – The weights of the orthogonal components.
loadings (ndarray of shape (n_features, n_components)) – The loadings of the orthogonal components.
n_iter (ndarray of shape (n_components,)) – The number of iterations taken for each component to converge.
References
Examples
Fit and apply OSC to remove variation in X that is orthogonal to y.
>>> import numpy as np
>>> from chemotools.projection import OrthogonalSignalCorrection
>>> rng = np.random.default_rng(0)
>>> X = rng.normal(size=(8, 5))
>>> y = np.linspace(0, 1, 8)
>>> osc = OrthogonalSignalCorrection(n_components=1, method="wold")
>>> X_osc = osc.fit_transform(X, y)
>>> X_osc.shape
(8, 5)
Multivariate targets are also supported.
>>> y_multi = np.column_stack([y, y**2])
>>> osc = OrthogonalSignalCorrection(n_components=2, method="fearn")
>>> osc.fit(X, y_multi)
OrthogonalSignalCorrection(method='fearn')
The transformer can be used inside a scikit-learn pipeline.
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.cross_decomposition import PLSRegression
>>> pipe = make_pipeline(
...     OrthogonalSignalCorrection(n_components=1, method="sjoblom"),
...     PLSRegression(n_components=2),
... )
>>> pipe.fit(X, y)
Pipeline(steps=[('orthogonalsignalcorrection',
                 OrthogonalSignalCorrection(method='sjoblom', n_components=1)),
                ('plsregression', PLSRegression())])
Notes
OSC is a supervised preprocessing method: it removes components from X that are orthogonal to the provided target y. Because of this, the target used during fit() must be representative of the calibration problem.
The transformed data keep the same shape as the input data. This estimator is therefore intended for signal correction rather than classical dimension reduction.
The available methods differ in how orthogonal components are estimated:
- wold and sjoblom use iterative updates and may emit ConvergenceWarning if max_iter is reached before convergence.
- fearn uses a direct SVD-based formulation and does not require an iterative loop.
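To make the roles of max_iter and tol more concrete, the following is a minimal sketch of a single-component iterative loop. It is not the chemotools implementation and does not reproduce the published wold or sjoblom algorithms exactly: it simply alternates between orthogonalising the score against y and re-expressing it through the columns of X, using a pseudo-inverse as an assumed stand-in for the regression step. The function name osc_component_iterative is hypothetical, the data are synthetic, and RuntimeWarning is used here in place of ConvergenceWarning to keep the example NumPy-only.

```python
import warnings

import numpy as np


def osc_component_iterative(Xc, yc, max_iter=500, tol=1e-6):
    """Estimate one orthogonal component (simplified, illustration only)."""
    X_pinv = np.linalg.pinv(Xc)
    # Start from the first principal-component score of the centred data.
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    t = U[:, 0] * s[0]

    for n_iter in range(1, max_iter + 1):
        # Remove the part of the score that is explained by y.
        t_orth = t - yc * (yc @ t) / (yc @ yc)
        # Weights that best reproduce the orthogonalised score from Xc.
        w = X_pinv @ t_orth
        t_new = Xc @ w
        # The relative change of the score vector is the convergence measure.
        change = np.linalg.norm(t_new - t) / np.linalg.norm(t_new)
        t = t_new
        if change < tol:
            break
    else:
        # Only reached when the loop finishes without hitting `break`.
        warnings.warn("max_iter reached before convergence", RuntimeWarning)

    p = Xc.T @ t / (t @ t)  # loading of the component
    return w, p, t, n_iter


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))
y = X[:, 0] + 0.1 * rng.normal(size=20)
Xc, yc = X - X.mean(axis=0), y - y.mean()

w, p, t, n_iter = osc_component_iterative(Xc, yc)
X_deflated = Xc - np.outer(t, p)  # remove the component from the centred X
print(n_iter, abs(t @ yc))
```

In this sketch, lowering tol or max_iter trades accuracy of the orthogonality condition against run time, and the returned n_iter plays the role of one entry of the n_iter attribute.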
In practice, a small number of components is usually preferred. Removing too many orthogonal components may discard structured variation that is still useful for the downstream model.
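As a rough illustration of this trade-off, the sketch below implements a direct, SVD-based correction in plain NumPy and reports how much of the centred variance in X remains as more components are removed. It is written in the spirit of a non-iterative formulation and is not the chemotools code; the helper name osc_direct and the synthetic data are assumptions made for the example.

```python
import numpy as np


def osc_direct(X, y, n_components):
    """Illustrative direct OSC; returns corrected (centred) X and the scores."""
    Xc = X - X.mean(axis=0)
    Yc = (y - y.mean(axis=0)).reshape(len(X), -1)

    # Projector onto the feature-space directions with zero covariance with y.
    A = Xc.T @ Yc
    M = np.eye(X.shape[1]) - A @ np.linalg.pinv(A)

    # Dominant directions of Xc restricted to that subspace.
    _, _, Vt = np.linalg.svd(Xc @ M, full_matrices=False)
    W = Vt[:n_components].T               # weights
    T = Xc @ W                            # scores, orthogonal to Yc
    P = Xc.T @ T / np.sum(T * T, axis=0)  # loadings
    return Xc - T @ P.T, T, Yc


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))
y = X[:, 0] + 0.1 * rng.normal(size=20)

total = np.sum((X - X.mean(axis=0)) ** 2)
for k in (1, 2, 3):
    X_osc, T, Yc = osc_direct(X, y, n_components=k)
    retained = np.sum(X_osc ** 2) / total
    # Shape is unchanged, the retained fraction shrinks with k, and the
    # removed scores have (numerically) zero covariance with y.
    print(k, X_osc.shape, round(retained, 3), np.abs(T.T @ Yc).max())
```

The output keeps the input shape for every k, while the retained fraction drops with each additional component. That drop is variance no longer available to the downstream model, which is why a small n_components is the usual starting point.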
See also
chemotools.projection.ExternalParameterOrthogonalization – Remove variation linked to external nuisance parameters.
sklearn.pipeline.make_pipeline – Build preprocessing and modelling pipelines.
Initialize the Orthogonal Signal Correction (OSC) transformer.
- fit(X: ndarray, y: ndarray) → OrthogonalSignalCorrection
Fit the OSC model to calculate the orthogonal components to remove.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training vectors. Accepts NumPy arrays and pandas DataFrames.
y (array-like of shape (n_samples,) or (n_samples, n_targets)) – Target vectors. Accepts 1D (univariate) or 2D (multivariate) targets.
- Returns:
self – Fitted OSC model with calculated orthogonal components.
- Return type:
OrthogonalSignalCorrection
- transform(X: ndarray, y=None)
Apply orthogonal signal correction to X.
Removes from X the variation captured by the orthogonal components found during fitting.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Samples to transform.
y (None) – Ignored. Present only for scikit-learn API compatibility.
- Returns:
X_transformed – X transformed with removed orthogonal variation.
- Return type:
ndarray of shape (n_samples, n_features)
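The correction applied to new samples can be sketched with the fitted quantities listed under Variables. The snippet assumes that transform centres X with mean_X, computes scores from weights, and subtracts the scores multiplied by the transposed loadings. Whether the library adds the mean back afterwards is not stated here, so this sketch returns centred data. The function osc_transform and the toy arrays are hypothetical.

```python
import numpy as np


def osc_transform(X_new, mean_X, weights, loadings):
    """Remove previously fitted orthogonal components from new samples."""
    # Centre with the training mean, never with the mean of the new data.
    Xc = np.asarray(X_new, dtype=float) - mean_X
    # Scores of the new samples on the orthogonal components.
    scores = Xc @ weights
    # Subtract the reconstructed orthogonal part; the shape is unchanged.
    return Xc - scores @ loadings.T


# Toy "fitted" quantities for a single component (made up for illustration).
mean_X = np.array([1.0, 2.0, 3.0])
weights = np.array([[1.0], [0.0], [0.0]])   # (n_features, n_components)
loadings = np.array([[1.0], [0.5], [0.0]])  # (n_features, n_components)

X_new = np.array([[2.0, 4.0, 6.0]])
X_corrected = osc_transform(X_new, mean_X, weights, loadings)
print(X_corrected)  # corrected row holds the values 0.0, 1.5, 3.0
```

Because only stored quantities are used, no target values are needed at this stage, which is consistent with y being ignored by transform.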