ExtendedMultiplicativeScatterCorrection
- class chemotools.scatter.ExtendedMultiplicativeScatterCorrection(method: Literal['mean', 'median'] = 'mean', order: int = 2, reference: ndarray | None = None, interferences: ndarray | None = None, weights: ndarray | None = None)
Bases: OneToOneFeatureMixin, TransformerMixin, BaseEstimator
Extended Multiplicative Scatter Correction (EMSC).
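EMSC is a preprocessing technique that removes multiplicative scatter effects and smooth, possibly non-linear baseline variations from spectral data. For each sample, it fits a linear regression model consisting of a polynomial baseline, a scaled reference spectrum, and optional interference spectra, then uses the fitted coefficients to correct the spectrum.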
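To make the model concrete, here is a minimal NumPy sketch of unweighted EMSC without interferents. It assumes the polynomial is evaluated on a wavelength axis rescaled to [-1, 1] and solves every sample by ordinary least squares. The function name `emsc_sketch` is hypothetical, and this is not the chemotools implementation.

```python
import numpy as np


def emsc_sketch(X, reference, order=2):
    """Hypothetical minimal EMSC: polynomial baseline + reference, no weights."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    # Assumed wavelength axis, rescaled to [-1, 1] for numerical stability
    lam = np.linspace(-1.0, 1.0, n_features)
    # Design matrix columns: lam**0 ... lam**order, then the reference spectrum
    A = np.column_stack([lam**i for i in range(order + 1)] + [reference])
    # Least-squares coefficients for every sample at once: (n_components, n_samples)
    coefs, *_ = np.linalg.lstsq(A, X.T, rcond=None)
    baseline = A[:, : order + 1] @ coefs[: order + 1]  # (n_features, n_samples)
    m = coefs[order + 1]                                # scaling factor per sample
    return ((X.T - baseline) / m).T


# Synthetic example: one "true" spectrum distorted by offset, slope, curvature and scaling
lam = np.linspace(-1.0, 1.0, 200)
ref = np.exp(-((lam - 0.2) ** 2) / 0.01)  # a single Gaussian band
X = np.vstack([
    0.5 + 0.3 * lam + 1.8 * ref,
    -0.2 + 0.1 * lam**2 + 0.7 * ref,
])
X_corr = emsc_sketch(X, ref, order=2)
print(np.allclose(X_corr, ref))  # True: baseline and scaling are removed
```

Because both synthetic spectra lie exactly in the span of the polynomial terms and the reference, the correction recovers the reference up to floating-point error; real spectra leave a residual $\epsilon$.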
- Parameters:
method ({"mean", "median"}, default="mean") – The statistic, computed feature-wise over the training spectra, used to build the reference spectrum when reference is None.
order (int, default=2) – The order of the polynomial baseline. 0 is a constant offset, 1 is linear, 2 is quadratic, etc.
reference (array-like of shape (n_features,), default=None) – A custom reference spectrum. If provided, method is ignored.
interferences (array-like of shape (n_interferences, n_features), default=None) – Known spectra of chemical interferents (e.g., water, CO2) to be mathematically removed from the signal.
weights (array-like of shape (n_features,), default=None) – Wavelength weights for Weighted EMSC. Useful for de-emphasizing noisy regions of the spectrum.
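The `method` and `order` parameters map onto simple array operations. The sketch below assumes the reference is the column-wise (per-feature) mean or median of the training spectra, and that the polynomial is evaluated on a wavelength axis rescaled to [-1, 1]. The helper names `make_reference` and `polynomial_basis` are hypothetical.

```python
import numpy as np


def make_reference(X, method="mean"):
    """Per-feature statistic over the samples (rows) of X."""
    if method == "mean":
        return np.mean(X, axis=0)
    if method == "median":
        return np.median(X, axis=0)
    raise ValueError("method must be 'mean' or 'median'")


def polynomial_basis(n_features, order=2):
    """Columns lam**0 ... lam**order on an assumed [-1, 1] wavelength axis."""
    lam = np.linspace(-1.0, 1.0, n_features)
    return np.vander(lam, order + 1, increasing=True)


X = np.array([[1.0, 2.0, 3.0],
              [3.0, 2.0, 1.0],
              [2.0, 8.0, 2.0]])
print(make_reference(X, "mean"))    # [2. 4. 2.]
print(make_reference(X, "median"))  # [2. 2. 2.]
print(polynomial_basis(3, order=2))
```

The outlying value 8.0 in the second column pulls the mean reference up but leaves the median unchanged, which is one reason to prefer `method="median"` when the training set contains atypical spectra.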
- Variables:
reference (ndarray of shape (n_features,)) – The reference spectrum used for the correction.
weights (ndarray of shape (n_features,)) – The actual weights applied during fitting.
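A (ndarray of shape (n_features, n_components)) – The design matrix used for regression, with one column per polynomial term, one for the reference spectrum, and one per interference spectrum (so n_components = order + 2 + n_interferences).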
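The layout of the design matrix follows directly from the model terms. This sketch assumes the column order "polynomial terms, then reference, then interferences"; the ordering inside chemotools may differ. `build_design_matrix` is a hypothetical helper, and the interferent spectra below are random placeholders.

```python
import numpy as np


def build_design_matrix(reference, order=2, interferences=None):
    """Return A of shape (n_features, n_components)."""
    reference = np.asarray(reference, dtype=float)
    n_features = reference.shape[0]
    lam = np.linspace(-1.0, 1.0, n_features)       # assumed wavelength axis
    columns = [lam**i for i in range(order + 1)]    # polynomial baseline terms
    columns.append(reference)                       # multiplicative term (m)
    if interferences is not None:
        # interferences has shape (n_interferences, n_features): one column per row
        columns.extend(np.atleast_2d(interferences))
    return np.column_stack(columns)


ref = np.random.default_rng(0).random(50)
interf = np.random.default_rng(1).random((2, 50))  # two placeholder interferent spectra
A = build_design_matrix(ref, order=2, interferences=interf)
print(A.shape)  # (50, 6): 3 polynomial + 1 reference + 2 interferences
```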
Notes
The model for each spectrum $x$ is:
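\[x = \sum_{i=0}^{\text{order}} c_i \lambda^i + m \cdot x_{ref} + \sum_{j} g_j \cdot z_j + \epsilon\]
where $\lambda$ is the wavelength axis, $c_i$ are the polynomial baseline coefficients, $m$ is the multiplicative scaling factor, $g_j$ and $z_j$ are the coefficient and spectrum of the $j$-th interferent, and $\epsilon$ is the residual. The corrected spectrum is calculated by removing the polynomial baseline and the interferences, then normalizing by the scaling factor $m$:
\[x_{corr} = \frac{x - \left(\sum_{i=0}^{\text{order}} c_i \lambda^i + \sum_{j} g_j \cdot z_j\right)}{m}\]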
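The correction formula can be checked numerically, including the weighted variant. This sketch assumes `A` is laid out as polynomial columns, then the reference, then the interferences, and implements Weighted EMSC by scaling the rows of `A` and of each spectrum by the square root of the weights (the standard reduction of weighted to ordinary least squares; chemotools may solve it differently). `emsc_correct` is a hypothetical name, and the "interferent" band is synthetic.

```python
import numpy as np


def emsc_correct(X, A, order, weights=None):
    """x_corr = (x - sum c_i lam^i - sum g_j z_j) / m, solved by weighted least squares.

    A is assumed to be [polynomial columns | reference | interferences].
    """
    X = np.asarray(X, dtype=float)
    n_features = A.shape[0]
    w = np.ones(n_features) if weights is None else np.asarray(weights, dtype=float)
    sw = np.sqrt(w)
    # Weighted least squares: scale rows of A and of every spectrum by sqrt(w)
    coefs, *_ = np.linalg.lstsq(A * sw[:, None], (X * sw).T, rcond=None)
    m = coefs[order + 1]                    # reference coefficient per sample
    keep = np.ones(A.shape[1], dtype=bool)
    keep[order + 1] = False                 # all terms except the reference
    unwanted = A[:, keep] @ coefs[keep]     # baseline + interferences (unweighted A)
    return ((X.T - unwanted) / m).T


lam = np.linspace(-1.0, 1.0, 100)
ref = np.exp(-((lam + 0.3) ** 2) / 0.02)
interferent = np.exp(-((lam - 0.5) ** 2) / 0.005)  # synthetic interferent band
A = np.column_stack([np.ones_like(lam), lam, ref, interferent])  # order = 1
X = np.vstack([
    0.2 + 0.5 * lam + 1.5 * ref + 0.8 * interferent,
    -0.1 + 0.9 * ref + 0.3 * interferent,
])
weights = np.ones(100)
weights[:10] = 0.1  # de-emphasise the first 10 points, e.g. a noisy edge
X_corr = emsc_correct(X, A, order=1, weights=weights)
print(np.allclose(X_corr, ref))  # True
```

Note that the fitted coefficients are applied to the unweighted `A` when building the correction: the weights only change how the coefficients are estimated, not what is subtracted.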
- fit(X, y=None)
Fit the EMSC model.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The training data.
y (None) – Ignored.
- Returns:
self – Fitted transformer.
- Return type:
ExtendedMultiplicativeScatterCorrection
- transform(X)
Apply EMSC correction to X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Data to transform.
- Returns:
X_corr – Corrected spectra.
- Return type:
ndarray of shape (n_samples, n_features)
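The fit/transform split follows the usual scikit-learn contract, where `fit` returns `self` and `transform` returns an array of the same shape as its input. The class below is a hypothetical, dependency-free sketch of that split (no weights or interferences): in this sketch, `fit` builds the reference and design matrix from training spectra, and `transform` solves the regression for each new spectrum. The attribute names `reference_` and `A_` follow scikit-learn's trailing-underscore convention and are not claimed to match chemotools.

```python
import numpy as np


class EMSCSketch:
    """Hypothetical fit/transform wrapper; not the chemotools class."""

    def __init__(self, method="mean", order=2, reference=None):
        self.method = method
        self.order = order
        self.reference = reference

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        if self.reference is not None:
            ref = np.asarray(self.reference, dtype=float)
        elif self.method == "mean":
            ref = np.mean(X, axis=0)
        elif self.method == "median":
            ref = np.median(X, axis=0)
        else:
            raise ValueError("method must be 'mean' or 'median'")
        lam = np.linspace(-1.0, 1.0, X.shape[1])  # assumed wavelength axis
        # Learned state, stored with trailing underscores (scikit-learn style)
        self.reference_ = ref
        self.A_ = np.column_stack([lam**i for i in range(self.order + 1)] + [ref])
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        coefs, *_ = np.linalg.lstsq(self.A_, X.T, rcond=None)
        baseline = self.A_[:, : self.order + 1] @ coefs[: self.order + 1]
        m = coefs[self.order + 1]
        return ((X.T - baseline) / m).T


lam = np.linspace(-1.0, 1.0, 120)
band = np.exp(-(lam**2) / 0.02)
X_train = np.vstack([
    1.0 * band + 0.1,
    1.2 * band - 0.05 * lam,
    0.8 * band + 0.02 * lam**2,
])
model = EMSCSketch(order=2).fit(X_train)
X_new = np.atleast_2d(0.4 - 0.2 * lam + 1.5 * band)
print(np.allclose(model.transform(X_new), model.reference_))  # True
```

Any spectrum that differs from the reference only by a quadratic baseline and a scaling factor is mapped back onto `reference_`, which is the behaviour EMSC is designed to produce.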