ExternalParameterOrthogonalization
- class chemotools.projection.ExternalParameterOrthogonalization(n_components: int = 2, copy: bool = True)
Bases: TransformerMixin, BaseEstimator
A transformer that removes variation linked to external nuisance parameters.
- Parameters:
n_components (int, default=2) – Number of orthogonal components to remove. Must be a positive integer.
copy (bool, default=True) – Placeholder argument kept for API compatibility with scikit-learn-style estimators. It is not currently used: input validation relies on the default behavior of the underlying validation utilities.
- Variables:
mean_X_ (ndarray of shape (n_features,)) – Mean spectrum computed from the calibration data passed to fit().
P_epo_ (ndarray of shape (n_features, n_features)) – Orthogonal projection matrix used to suppress nuisance variation. Right-multiplying the centered spectra by it (X_centered @ P_epo_) removes the subspace spanned by the first n_components right singular vectors of the nuisance variation matrix built from X_external.
n_features_in_ (int) – Number of features seen during fit().
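The relationship between the fitted attributes can be checked with plain NumPy. The sketch below does not call chemotools. It builds a projector of the form used for P_epo_ from a random orthonormal basis V, which is a hypothetical stand-in for the nuisance directions, and centers data with a stand-in for mean_X_. It then confirms that the projector is symmetric and idempotent, that the corrected data has no component along V, and that the number of features is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_components = 8, 5, 2

# Hypothetical nuisance directions: orthonormal columns, shape (n_features, n_components)
V, _ = np.linalg.qr(rng.normal(size=(n_features, n_components)))

# Projector of the form P = I - V V^T (stand-in for P_epo_)
P = np.eye(n_features) - V @ V.T

X = rng.normal(size=(n_samples, n_features))
mean_X = X.mean(axis=0)            # stand-in for mean_X_
X_corrected = (X - mean_X) @ P     # centering followed by projection

print(np.allclose(P, P.T))                # symmetric
print(np.allclose(P @ P, P))              # idempotent
print(np.allclose(X_corrected @ V, 0.0))  # nothing left along the nuisance directions
print(X_corrected.shape)                  # (8, 5): feature count preserved
print(np.linalg.matrix_rank(P))           # 3 = n_features - n_components
```

The rank check shows the sense in which EPO corrects rather than reduces. The output keeps all n_features columns, but it lies in an (n_features - n_components)-dimensional subspace.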
Examples
>>> import numpy as np
>>> from chemotools.projection import (
...     ExternalParameterOrthogonalization,
... )
>>> rng = np.random.default_rng(0)
>>> X = rng.normal(size=(6, 4))
>>> X_external = X + 0.2 * rng.normal(size=(6, 4))
>>> epo = ExternalParameterOrthogonalization(n_components=1)
>>> X_epo = epo.fit_transform(X, X_external=X_external)
>>> X_epo.shape
(6, 4)
Repeated measurements of the same physical sample can be grouped through sample_ids so that the nuisance subspace is estimated from within-sample differences only.
>>> sample_ids = np.array([0, 0, 1, 1, 2, 2])
>>> epo = ExternalParameterOrthogonalization(n_components=1)
>>> epo.fit(X, X_external=X_external, sample_ids=sample_ids)
ExternalParameterOrthogonalization(n_components=1)
Notes
EPO is commonly used when spectral measurements are affected by known nuisance sources such as temperature, instrument transfer, humidity, or acquisition conditions. The nuisance structure is estimated from X_external, then projected out from X.
If sample_ids are provided, the difference matrix is built from deviations around the mean spectrum of each repeated sample. This isolates variation due to the external condition while suppressing the underlying chemical signal.
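One way to build this within-group difference matrix is to subtract each spectrum's group mean, where groups are defined by sample_ids. The NumPy sketch below assumes that reading. The helper within_group_deviations is hypothetical and is not part of chemotools, and the spectra are synthetic. Every measurement in a group shares the same chemical spectrum, so that signal cancels and only the condition-dependent part remains. Plain centering of the whole matrix, by contrast, lets chemical differences between samples leak in.

```python
import numpy as np

def within_group_deviations(X_external, sample_ids):
    """Subtract each group's mean spectrum from its members (hypothetical helper)."""
    X_external = np.asarray(X_external, dtype=float)
    sample_ids = np.asarray(sample_ids)
    D = np.empty_like(X_external)
    for group in np.unique(sample_ids):
        mask = sample_ids == group
        D[mask] = X_external[mask] - X_external[mask].mean(axis=0)
    return D

rng = np.random.default_rng(0)
n_features = 4
chemistry = rng.normal(size=(3, n_features))   # one synthetic "true" spectrum per physical sample
nuisance = rng.normal(size=n_features)         # a single synthetic nuisance direction
sample_ids = np.array([0, 0, 1, 1, 2, 2])
levels = np.array([-1.0, 1.0, -0.5, 0.5, -2.0, 2.0])  # external condition of each measurement

X_external = chemistry[sample_ids] + levels[:, None] * nuisance
D = within_group_deviations(X_external, sample_ids)
D_plain = X_external - X_external.mean(axis=0)  # centering without grouping

# The levels average to zero within each group, so only the nuisance part survives
print(np.allclose(D, levels[:, None] * nuisance))  # True
print(np.linalg.matrix_rank(D, tol=1e-10))         # 1
print(np.linalg.matrix_rank(D_plain, tol=1e-10))   # > 1: chemical differences leak in
```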
The transformer preserves the original number of features. It performs signal correction, not dimensionality reduction.
See also
chemotools.projection.OrthogonalSignalCorrection – Remove variation orthogonal to a supervised target.
sklearn.pipeline.make_pipeline – Compose EPO with downstream estimators.
Initialize the External Parameter Orthogonalization (EPO) transformer.
- fit(X: ndarray, y=None, X_external: ndarray | None = None, sample_ids: ndarray | None = None)
Fit the EPO projection from calibration and nuisance spectra.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Calibration spectra that will later be corrected by the learned EPO projection.
y (None, default=None) – Ignored. Present for scikit-learn API compatibility.
X_external (array-like of shape (n_samples, n_features)) – Spectra describing the nuisance variation to remove. These may be the same samples measured under perturbed external conditions, transfer standards, or any dataset representative of the unwanted subspace.
sample_ids (array-like of shape (n_samples,), default=None) – Optional identifiers linking repeated measurements of the same sample. When provided, the nuisance difference matrix is computed within each sample group, which helps isolate external variation from chemical differences between samples.
- Returns:
self – Fitted estimator storing the projection matrix in P_epo_.
- Return type:
ExternalParameterOrthogonalization
Notes
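The nuisance variation matrix $D$ is either the centered X_external or, when sample_ids are available, the matrix of within-group deviations. A singular value decomposition of $D$ yields the dominant nuisance directions. $V$ (shape n_features × n_components) collects the first n_components right singular vectors. The projection matrix is then defined as $P = I - VV^T$, so transform() returns $(X - \bar{x})P$, where $\bar{x}$ is mean_X_.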
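The SVD step can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the chemotools source. The helper epo_projection is hypothetical, and D is taken as already built (here, centered synthetic X_external). V is formed from the first n_components right singular vectors, which are the leading rows of the Vt array returned by numpy.linalg.svd. The final check uses a property of the truncated SVD: after projection, D retains exactly the squared singular values that were not removed.

```python
import numpy as np

def epo_projection(D, n_components):
    """Return P = I - V V^T from the leading right singular vectors of D (hypothetical helper)."""
    # Singular values come back in descending order; Vt has shape (min(n_samples, n_features), n_features)
    _, s, Vt = np.linalg.svd(D, full_matrices=False)
    V = Vt[:n_components].T                      # shape (n_features, n_components)
    P = np.eye(D.shape[1]) - V @ V.T
    return P, s

rng = np.random.default_rng(1)
X_external = rng.normal(size=(10, 6))
D = X_external - X_external.mean(axis=0)         # centered nuisance matrix
n_components = 2
P, s = epo_projection(D, n_components)

# Energy left in D after projection equals the sum of the discarded squared singular values
residual = np.linalg.norm(D @ P, "fro") ** 2
print(np.isclose(residual, np.sum(s[n_components:] ** 2)))  # True
```

Increasing n_components removes more of the nuisance energy. It also shrinks the subspace left for the chemical signal, so the value is a trade-off rather than "more is better".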
- transform(X: ndarray)
Project spectra onto the subspace orthogonal to nuisance variation.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Spectra to correct.
- Returns:
Corrected spectra after centering with mean_X_ and projection with P_epo_.
- Return type:
ndarray of shape (n_samples, n_features)
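A practical consequence of this projection is that any perturbation lying in the removed subspace vanishes after correction. The NumPy sketch below illustrates this with hypothetical stand-ins for mean_X_ and P_epo_: a calibration mean and a projector built from a random orthonormal basis, not a fitted chemotools model. The helper epo_transform is also hypothetical. It shifts a new spectrum along the nuisance directions, shows that the corrected outputs for the original and shifted spectra are identical, and reuses the calibration mean for centering the new data.

```python
import numpy as np

rng = np.random.default_rng(2)
n_features, n_components = 6, 2

# Stand-ins for the fitted attributes mean_X_ and P_epo_
X_cal = rng.normal(size=(12, n_features))
mean_X = X_cal.mean(axis=0)
V, _ = np.linalg.qr(rng.normal(size=(n_features, n_components)))
P_epo = np.eye(n_features) - V @ V.T

def epo_transform(X, mean_X, P_epo):
    """Center with the calibration mean, then project (hypothetical helper)."""
    return (np.atleast_2d(X) - mean_X) @ P_epo

x_new = rng.normal(size=(1, n_features))
# Same spectrum, shifted by an arbitrary combination of the nuisance directions
x_shifted = x_new + (V @ rng.normal(size=n_components))[None, :]

print(np.allclose(epo_transform(x_new, mean_X, P_epo),
                  epo_transform(x_shifted, mean_X, P_epo)))  # True
```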
- set_fit_request(*, X_external: bool | None | str = '$UNCHANGED$', sample_ids: bool | None | str = '$UNCHANGED$') → ExternalParameterOrthogonalization
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in scikit-learn version 1.3.
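- Parameters:
X_external (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the X_external parameter in fit.
sample_ids (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the sample_ids parameter in fit.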
- Returns:
self – The updated object.
- Return type:
ExternalParameterOrthogonalization