ExtendedMultiplicativeScatterCorrection

class chemotools.scatter.ExtendedMultiplicativeScatterCorrection(method: Literal['mean', 'median'] = 'mean', order: int = 2, reference: ndarray | None = None, interferences: ndarray | None = None, weights: ndarray | None = None)

Bases: OneToOneFeatureMixin, TransformerMixin, BaseEstimator

Extended Multiplicative Scatter Correction (EMSC).

EMSC is a preprocessing technique used to remove multiplicative scatter effects and additive baseline variations, including curved (polynomial) baselines, from spectral data. For each sample it fits, by linear regression, a model consisting of a polynomial baseline, a scaled reference spectrum, and optional interference spectra.
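
The model terms listed above can be stacked as columns of a design matrix. The NumPy sketch below illustrates this; it is not the chemotools implementation. The helper name `emsc_design_matrix` is hypothetical, and mapping the spectral axis to [-1, 1] before raising it to powers is an assumption (a common choice that keeps the polynomial columns well scaled).

```python
import numpy as np


def emsc_design_matrix(reference, order=2, interferences=None):
    """Hypothetical helper: stack polynomial, reference and interference columns."""
    reference = np.asarray(reference, dtype=float)
    n_features = reference.shape[0]
    # Assumption: the spectral axis is mapped to [-1, 1] for numerical stability.
    axis = np.linspace(-1.0, 1.0, n_features)
    # Columns lambda^0, lambda^1, ..., lambda^order (constant, linear, quadratic, ...).
    poly = np.vander(axis, N=order + 1, increasing=True)
    columns = [poly, reference[:, None]]
    if interferences is not None:
        # Each interference spectrum z_j becomes one extra column.
        columns.append(np.atleast_2d(np.asarray(interferences, dtype=float)).T)
    return np.hstack(columns)


ref = np.exp(-np.linspace(-3, 3, 100) ** 2)      # synthetic reference spectrum
water = np.linspace(0.0, 1.0, 100)[None, :]      # synthetic interference, shape (1, 100)
A = emsc_design_matrix(ref, order=2, interferences=water)
print(A.shape)  # (100, 5): 3 polynomial + 1 reference + 1 interference column
```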

Parameters:
  • method ({"mean", "median"}, default="mean") – The statistic, computed across the training samples, used to obtain the reference spectrum when reference is None.

  • order (int, default=2) – The order of the polynomial baseline. 0 is a constant offset, 1 is linear, 2 is quadratic, etc.

  • reference (array-like of shape (n_features,), default=None) – A custom reference spectrum. If provided, method is ignored.

  • interferences (array-like of shape (n_interferences, n_features), default=None) – Known spectra of chemical interferents (e.g., water vapour, CO₂) whose contributions are estimated and subtracted from each spectrum.

  • weights (array-like of shape (n_features,), default=None) – Per-wavelength weights for weighted EMSC. Lower weights reduce the influence of noisy or uninformative spectral regions on the fitted coefficients.
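
The sketch below shows one plausible way method and weights enter the fit. It assumes the reference is the column-wise mean or median of the training spectra, and that weights act as per-wavelength weights in a standard weighted least-squares problem, solved by scaling each row by the square root of its weight. Both helper names are hypothetical and the data are synthetic.

```python
import numpy as np


def compute_reference(X, method="mean"):
    # Hypothetical helper: average spectrum across samples (axis 0).
    if method == "mean":
        return X.mean(axis=0)
    if method == "median":
        return np.median(X, axis=0)
    raise ValueError(f"unknown method: {method!r}")


def weighted_coefficients(x, A, weights=None):
    # Minimise sum_k w_k * (x_k - (A @ c)_k)^2 by rescaling rows with sqrt(w_k).
    if weights is None:
        weights = np.ones(A.shape[0])
    sqrt_w = np.sqrt(np.asarray(weights, dtype=float))
    coef, *_ = np.linalg.lstsq(A * sqrt_w[:, None], x * sqrt_w, rcond=None)
    return coef


rng = np.random.default_rng(0)
axis = np.linspace(-1.0, 1.0, 200)
# Five synthetic spectra: one Gaussian band with random intensity scaling.
X = np.exp(-(axis[None, :] * 4) ** 2) * rng.uniform(0.8, 1.2, size=(5, 1))
ref = compute_reference(X, method="median")

# Design matrix: constant + linear baseline, then the reference spectrum.
A = np.column_stack([np.ones_like(axis), axis, ref])
x = 0.3 + 0.1 * axis + 1.5 * ref
x[:20] += 5.0  # a corrupted region at the start of the spectrum

w = np.ones_like(axis)
w[:20] = 0.0  # zero weight: the corrupted region is ignored in the fit
print(np.round(weighted_coefficients(x, A, w), 3))  # [0.3 0.1 1.5]
```

Without the weights, the corrupted region pulls the fitted offset and slope away from their true values.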

Variables:
  • reference (ndarray of shape (n_features,)) – The reference spectrum used for the correction.

  • weights (ndarray of shape (n_features,)) – The actual weights applied during fitting.

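  • A (ndarray of shape (n_features, n_components)) – The design matrix used for regression. Its columns hold the polynomial baseline terms, the reference spectrum, and any interference spectra.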

  • n_features_in (int) – Number of features seen during fit.

Notes

The model for each spectrum $x$ is:

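\[x = \sum_{i=0}^{\text{order}} c_i \lambda^i + m \cdot x_{\text{ref}} + \sum_{j} g_j \cdot z_j + \epsilon\]

where $\lambda$ is the spectral axis variable, $c_i$ are the polynomial baseline coefficients, $m$ is the multiplicative scaling factor applied to the reference spectrum $x_{\text{ref}}$, $z_j$ are the interference spectra with coefficients $g_j$, and $\epsilon$ is the residual. The coefficients are estimated separately for each spectrum by (optionally weighted) least-squares regression.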
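
As a worked instance of this model, the sketch below builds a noise-free spectrum from known values of $c_0, c_1, c_2$, $m$ and $g_1$ (with order = 2) and recovers them with ordinary least squares on the stacked columns. The reference, interference and coefficient values are made up for illustration, and mapping $\lambda$ to [-1, 1] is an assumption.

```python
import numpy as np

lam = np.linspace(-1.0, 1.0, 300)             # assumed normalised spectral axis
x_ref = np.exp(-((lam - 0.2) / 0.1) ** 2)     # synthetic reference spectrum
z1 = np.exp(-((lam + 0.5) / 0.2) ** 2)        # synthetic interference spectrum

# Known "true" parameters, in column order: c_0, c_1, c_2, m, g_1.
true = np.array([0.5, -0.2, 0.1, 1.3, 0.7])

# Columns: lambda^0, lambda^1, lambda^2, x_ref, z_1.
A = np.column_stack([lam**0, lam**1, lam**2, x_ref, z1])
x = A @ true  # spectrum that follows the EMSC model exactly (epsilon = 0)

coef, *_ = np.linalg.lstsq(A, x, rcond=None)
print(np.allclose(coef, true))  # True: all five coefficients are recovered
```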

The corrected spectrum is calculated by removing the polynomial baseline and the interferences, then normalizing by the scaling factor $m$:

\[x_{\text{corr}} = \frac{x - \left(\sum_{i=0}^{\text{order}} c_i \lambda^i + \sum_{j} g_j \cdot z_j\right)}{m}\]
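
Given fitted coefficients, the correction formula can be applied directly. In this sketch the coefficient vector is assumed to be ordered as $[c_0, \dots, c_{\text{order}}, m, g_1, g_2, \dots]$, matching the column order of the design matrix; that ordering is an illustrative assumption, and `emsc_correct` is a hypothetical helper.

```python
import numpy as np


def emsc_correct(x, A, coef, order):
    """Hypothetical helper: x_corr = (x - baseline - interferences) / m."""
    n_poly = order + 1
    baseline = A[:, :n_poly] @ coef[:n_poly]           # sum_i c_i * lambda^i
    m = coef[n_poly]                                   # multiplicative factor
    interf = A[:, n_poly + 1:] @ coef[n_poly + 1:]     # sum_j g_j * z_j (zeros if none)
    return (x - baseline - interf) / m


lam = np.linspace(-1.0, 1.0, 300)
x_ref = np.exp(-((lam - 0.2) / 0.1) ** 2)
z1 = np.exp(-((lam + 0.5) / 0.2) ** 2)
A = np.column_stack([lam**0, lam**1, lam**2, x_ref, z1])

# Synthetic measured spectrum: scaled reference plus baseline and interference.
x = A @ np.array([0.5, -0.2, 0.1, 1.3, 0.7])
coef, *_ = np.linalg.lstsq(A, x, rcond=None)

x_corr = emsc_correct(x, A, coef, order=2)
print(np.allclose(x_corr, x_ref))  # True: baseline and interference removed, scale undone
```

Note that any residual $\epsilon$ left after the fit stays in $x_{\text{corr}}$, divided by $m$; only the modelled baseline and interference terms are removed.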

fit(X, y=None)

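Fit the EMSC model: determine the reference spectrum (from X using method, unless reference is given) and build the design matrix.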

Parameters:
  • X (array-like of shape (n_samples, n_features)) – The training data.

  • y (None) – Ignored.

Returns:

self – Fitted transformer.

Return type:

object

transform(X)

Apply EMSC correction to X.

Parameters:

X (array-like of shape (n_samples, n_features)) – Data to transform.

Returns:

X_corr – Corrected spectra.

Return type:

ndarray of shape (n_samples, n_features)
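
Putting fit and transform together, the class below is a minimal NumPy sketch of the same workflow: fit stores a reference spectrum and a design matrix, and transform solves a weighted least-squares problem for every spectrum and applies the correction. It is not the chemotools source. The class name `MinimalEMSC`, the trailing-underscore attribute names, the [-1, 1] spectral axis, and the square-root weighting are assumptions. It also skips input validation and does not inherit from the scikit-learn mixins listed above.

```python
import numpy as np


class MinimalEMSC:
    """Hypothetical, simplified EMSC transformer (NumPy only)."""

    def __init__(self, method="mean", order=2, reference=None,
                 interferences=None, weights=None):
        self.method = method
        self.order = order
        self.reference = reference
        self.interferences = interferences
        self.weights = weights

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        n_features = X.shape[1]
        # Reference: user-supplied, else the median or mean training spectrum.
        if self.reference is not None:
            ref = np.asarray(self.reference, dtype=float)
        elif self.method == "median":
            ref = np.median(X, axis=0)
        else:
            ref = X.mean(axis=0)
        axis = np.linspace(-1.0, 1.0, n_features)  # assumed normalised axis
        cols = [np.vander(axis, N=self.order + 1, increasing=True), ref[:, None]]
        if self.interferences is not None:
            cols.append(np.atleast_2d(np.asarray(self.interferences, dtype=float)).T)
        self.reference_ = ref
        self.A_ = np.hstack(cols)
        self.weights_ = (np.ones(n_features) if self.weights is None
                         else np.asarray(self.weights, dtype=float))
        self.n_features_in_ = n_features
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        sqrt_w = np.sqrt(self.weights_)
        # Solve all spectra at once: each column of the right-hand side is one spectrum.
        coef, *_ = np.linalg.lstsq(self.A_ * sqrt_w[:, None],
                                   (X * sqrt_w).T, rcond=None)
        n_poly = self.order + 1
        baseline = (self.A_[:, :n_poly] @ coef[:n_poly]).T
        m = coef[n_poly][:, None]                      # one scaling factor per sample
        interf = (self.A_[:, n_poly + 1:] @ coef[n_poly + 1:]).T
        return (X - baseline - interf) / m


# Usage: four copies of one band with random scaling, offset and a common slope.
rng = np.random.default_rng(1)
lam = np.linspace(-1.0, 1.0, 150)
base = np.exp(-((lam - 0.1) / 0.15) ** 2)
scales = rng.uniform(0.7, 1.3, size=(4, 1))
offsets = rng.uniform(-0.2, 0.2, size=(4, 1))
X = scales * base + offsets + 0.05 * lam

emsc = MinimalEMSC(order=1, reference=base).fit(X)
X_corr = emsc.transform(X)
print(np.allclose(X_corr, base))  # True: every row maps back to the reference
```

Solving all samples in a single lstsq call relies on the design matrix and weights being shared across samples, which holds because both are fixed at fit time.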