ExternalParameterOrthogonalization

class chemotools.projection.ExternalParameterOrthogonalization(n_components: int = 2, copy: bool = True)

Bases: DocLinkMixin, TransformerMixin, BaseEstimator

Remove variation linked to known external nuisance parameters using External Parameter Orthogonalization (EPO) [1].

EPO is designed for situations where spectral measurements are affected by controlled external factors (e.g., temperature, humidity, instrument differences) that are not related to the target property. The method estimates a nuisance subspace from an auxiliary dataset or from structured replicates in which the external parameter varies while the underlying sample composition is held constant.

A matrix capturing this external variation is constructed (e.g., from differences or deviations within replicate groups), and its dominant directions are obtained via SVD/PCA. These directions span a subspace associated with the external parameter. X is then projected onto the orthogonal complement of this subspace, which removes the variation it contains.
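
A minimal NumPy sketch of the SVD and projection steps, assuming the external-variation matrix D has already been built. It uses a plain SVD of D without centering (a PCA-style variant would center D first) and applies I - V V^T as X - (X V) V^T, so the n_features x n_features matrix is never formed. The function names epo_fit and epo_transform and the random D are hypothetical illustrations, not chemotools internals.

```python
import numpy as np


def epo_fit(D: np.ndarray, n_components: int) -> np.ndarray:
    """Return the first n_components right singular vectors of D, column-wise."""
    # Economy SVD: D = U @ diag(s) @ Vt; the rows of Vt are right singular vectors
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    return Vt[:n_components].T  # shape (n_features, n_components)


def epo_transform(X: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Project X onto the orthogonal complement of span(V)."""
    # Equivalent to X @ (I - V @ V.T) without building the projection matrix
    return X - (X @ V) @ V.T


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
# Hypothetical external-variation matrix; how D is built is an assumption here
D = rng.normal(size=(6, 4))

V = epo_fit(D, n_components=1)
X_epo = epo_transform(X, V)
print(X_epo.shape)                   # (6, 4)
print(np.allclose(X_epo @ V, 0.0))   # True: nothing left along V
```

Because V has orthonormal columns, applying epo_transform a second time leaves X_epo unchanged.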

When sample_ids are provided, the external-effect matrix is formed from within-sample deviations (i.e., each spectrum minus its sample mean), isolating variation due to the external parameter from chemical variation.
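
The within-sample deviation step can be sketched in NumPy as below. The helper name within_sample_deviations is hypothetical, and the sketch only illustrates the centering described above; it does not reproduce how chemotools handles the arrays internally.

```python
import numpy as np


def within_sample_deviations(X: np.ndarray, sample_ids: np.ndarray) -> np.ndarray:
    """Subtract from each spectrum the mean spectrum of its sample group."""
    # inverse maps every row of X to the index of its group in `groups`
    groups, inverse = np.unique(sample_ids, return_inverse=True)
    n_groups = groups.shape[0]
    counts = np.bincount(inverse, minlength=n_groups)
    sums = np.zeros((n_groups, X.shape[1]))
    np.add.at(sums, inverse, X)  # accumulate each row into its group's sum
    means = sums / counts[:, None]
    return X - means[inverse]    # broadcast each group mean back to its rows


X = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 10.0], [12.0, 14.0]])
sample_ids = np.array([0, 0, 1, 1])
D = within_sample_deviations(X, sample_ids)
print(D)  # rows: [-1, -1], [1, 1], [-1, -2], [1, 2]
```

Each group's deviations sum to zero, so any spectral feature shared by all replicates of a sample drops out of D.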

The transformer preserves the original number of features and is intended as a signal correction step rather than dimensionality reduction.

Parameters:

n_components (int, default=2) – Number of orthogonal components to remove. Must be a positive integer.
copy (bool, default=True) – Placeholder argument kept for API compatibility with scikit-learn-style estimators. Input validation currently relies on the default behavior of the underlying validation utilities.
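
When choosing n_components, one common heuristic is to check how much of the external variation the leading singular vectors capture. The sketch below uses a cumulative energy threshold (99%, chosen arbitrarily) on a hypothetical external-variation matrix D; it is not a feature of this class, and the helper name cumulative_energy is illustrative only.

```python
import numpy as np


def cumulative_energy(D: np.ndarray) -> np.ndarray:
    """Fraction of ||D||_F^2 captured by the first 1, 2, ... singular vectors."""
    s = np.linalg.svd(D, compute_uv=False)
    energy = s**2
    return np.cumsum(energy) / energy.sum()


rng = np.random.default_rng(1)
# Hypothetical external-variation matrix: one dominant direction plus weak noise
direction = rng.normal(size=(1, 8))
D = rng.normal(size=(20, 1)) @ direction + 0.01 * rng.normal(size=(20, 8))

frac = cumulative_energy(D)
# Smallest k whose leading components explain at least 99% of the energy
n_components = int(np.searchsorted(frac, 0.99)) + 1
print(n_components)
```

Removing more components than the nuisance actually occupies risks discarding chemically relevant variation, so a conservative threshold is usually preferable.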

Variables:

mean_X_ (ndarray of shape (n_features,)) – Mean spectrum computed from the calibration data passed to fit().
V_epo_ (ndarray of shape (n_features, n_components)) – Nuisance directions whose subspace is removed during transform(). These are the first n_components right singular vectors of the external variation matrix, stored column-wise. The implicit projection matrix is I - V_epo_ @ V_epo_.T, but it is never materialised.
n_features_in_ (int) – Number of features seen during fit().
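
Writing p for n_features and k for n_components, the projection described for V_epo_ can be spelled out as follows. This is a standard linear-algebra restatement, assuming the columns of V_epo_ are orthonormal (as right singular vectors are); it is not taken from the library's source.

```latex
V = \texttt{V\_epo\_} \in \mathbb{R}^{p \times k}, \qquad V^\top V = I_k \\
P = I_p - V V^\top, \qquad P^2 = P, \qquad P V = 0 \\
X_{\mathrm{epo}} = X P = X - (X V)\, V^\top, \qquad X_{\mathrm{epo}} V = 0
```

Evaluating X - (XV)V^T costs on the order of npk operations instead of np^2 for an explicit product with P, which is presumably why the p x p matrix is never formed.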

References

Examples

>>> import numpy as np
>>> from chemotools.projection import (
...     ExternalParameterOrthogonalization,
... )
>>> rng = np.random.default_rng(0)
>>> X = rng.normal(size=(6, 4))
>>> X_external = X + 0.2 * rng.normal(size=(6, 4))
>>> epo = ExternalParameterOrthogonalization(n_components=1)
>>> X_epo = epo.fit_transform(X, X_external=X_external)
>>> X_epo.shape
(6, 4)

Repeated measurements of the same physical sample can be grouped through sample_ids so that the nuisance subspace is estimated from within-sample differences only.

>>> sample_ids = np.array([0, 0, 1, 1, 2, 2])
>>> epo = ExternalParameterOrthogonalization(n_components=1)
>>> epo.fit(X, X_external=X_external, sample_ids=sample_ids)
ExternalParameterOrthogonalization(n_components=1)

Notes

EPO is commonly used when spectral measurements are affected by known nuisance sources such as temperature, instrument transfer, humidity, or acquisition conditions. The nuisance structure is estimated from X_external, then projected out from X.

If sample_ids are provided, the external-effect matrix is built from the deviations of each spectrum around the mean spectrum of its sample group. This isolates variation due to the external condition while suppressing the underlying chemical signal.
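
A small synthetic check of why within-sample deviations suppress the chemical signal: if every replicate of a sample shares the same chemical spectrum, that part cancels when the sample mean is subtracted, leaving only the external effect. The data (a single Gaussian band g scaled by a random temperature offset) are invented for illustration, and replicates are assumed to be stored as an (n_samples, n_reps, n_features) array rather than indexed through sample_ids.

```python
import numpy as np

rng = np.random.default_rng(7)
n_samples, n_reps, n_features = 5, 3, 40
grid = np.linspace(0.0, 1.0, n_features)

# Hypothetical chemical spectra: one per sample, shared by all of its replicates
chem = rng.normal(size=(n_samples, 1, n_features))
# Hypothetical temperature effect: a fixed band g scaled per measurement
g = np.exp(-((grid - 0.3) ** 2) / 0.005)
temps = rng.normal(size=(n_samples, n_reps, 1))
spectra = chem + temps * g  # shape (n_samples, n_reps, n_features)

# Subtract each sample's mean spectrum; the shared chemical part cancels exactly
D = (spectra - spectra.mean(axis=1, keepdims=True)).reshape(-1, n_features)

_, s, Vt = np.linalg.svd(D, full_matrices=False)
rank_ratio = s[1] / s[0]                        # near zero: D is effectively rank one
alignment = abs(Vt[0] @ g) / np.linalg.norm(g)  # near one: leading direction is g
print(rank_ratio, alignment)
```

Only the temperature band survives in D, so a single component would be enough to remove it from X in this idealised case.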

The transformer preserves the original number of features. It performs signal correction, not dimensionality reduction.