ExtendedMultiplicativeScatterCorrection

class chemotools.scatter.ExtendedMultiplicativeScatterCorrection(method: Literal['mean', 'median'] = 'mean', order: int = 2, reference: ndarray | None = None, weights: ndarray | None = None)

Bases: TransformerMixin, OneToOneFeatureMixin, BaseEstimator

Extended multiplicative scatter correction (EMSC) is a preprocessing technique for removing nonlinear scatter effects from spectra. It fits each spectrum with a regression model built from a reference spectrum and a polynomial baseline, then uses the fitted coefficients to correct the spectrum. The reference spectrum can be the mean or median of a set of spectra, or a user-supplied reference.

Note that this implementation does not include further extensions of the model using orthogonal subspace models.
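
For orientation, the basic EMSC model as it is commonly written in the EMSC literature is sketched below. The symbols are chosen here for illustration and are not taken from the chemotools source: x is a measured spectrum, r the reference spectrum, ν the spectral axis, p the polynomial order (the order parameter), and e the unmodelled residual.

```latex
x(\nu) = a + b\, r(\nu) + \sum_{k=1}^{p} d_k\, \nu^{k} + e(\nu),
\qquad
x_{\mathrm{corr}}(\nu) = \frac{x(\nu) - a - \sum_{k=1}^{p} d_k\, \nu^{k}}{b}
```

The coefficients a, b and d_k are estimated per spectrum by least squares. The corrected spectrum removes the fitted polynomial baseline and rescales by the multiplicative factor b.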

Parameters:
  • reference (np.ndarray, optional, default=None) – The reference spectrum to use for the correction. If None, the mean or median spectrum of the training data is used, as selected by method.

  • method ({'mean', 'median'}, optional, default='mean') – How the reference spectrum is computed from the training data when reference is None.

  • order (int, optional, default=2) – The order of the polynomial to fit to the spectrum.

  • weights (np.ndarray, optional, default=None) – The weights to use for the weighted EMSC. If None, the standard (unweighted) EMSC is used.
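
To illustrate what a weighted fit can look like, here is a minimal NumPy sketch of a weighted least-squares EMSC fit for a single spectrum. It is an assumption-laden illustration, not the chemotools implementation: the function name weighted_emsc_coefficients, the rescaled axis, and the choice to scale rows by the square root of the weights (textbook weighted least squares) are all hypothetical, and the library may apply its weights differently.

```python
import numpy as np


def weighted_emsc_coefficients(spectrum, reference, weights, order=2):
    """Weighted least-squares EMSC fit of one spectrum (illustrative sketch)."""
    n_features = spectrum.shape[0]
    # Hypothetical abscissa: feature index rescaled to [-1, 1] for conditioning.
    axis = np.linspace(-1.0, 1.0, n_features)
    # Columns: 1, axis, ..., axis**order, followed by the reference spectrum.
    poly = np.vander(axis, order + 1, increasing=True)
    design = np.column_stack([poly, reference])
    # Assumed convention: scale rows by sqrt(w) so that the minimised
    # quantity is sum(w * residual**2).
    root_w = np.sqrt(weights)
    coefs, *_ = np.linalg.lstsq(
        design * root_w[:, None], spectrum * root_w, rcond=None
    )
    return coefs  # polynomial coefficients first, reference scale last


# Synthetic data: a scaled, offset reference plus an interferent band.
axis = np.linspace(-1.0, 1.0, 50)
reference = np.exp(-((axis / 0.2) ** 2))
spectrum = 0.1 + 1.5 * reference
spectrum[40:45] += 3.0  # band that is absent from the reference
weights = np.ones(50)
weights[40:45] = 0.0  # exclude that band from the fit
coefs = weighted_emsc_coefficients(spectrum, reference, weights)
```

With zero weight on the interferent band, the fit in this sketch is driven only by the remaining channels, which is the usual motivation for weighting.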

Variables:
  • n_features_in_ (int) – The number of features in the training data.

  • reference_ (np.ndarray) – The reference spectrum used for the correction.

References

[1] Nils Kristian Afseth, Achim Kohler. Extended multiplicative signal correction in vibrational spectroscopy, a tutorial. doi:10.1016/j.chemolab.2012.03.004

[2] Valeria Tafintseva et al. Correcting replicate variation in spectroscopic data by machine learning and model-based pre-processing. doi:10.1016/j.chemolab.2021.104350

Examples

>>> from chemotools.datasets import load_fermentation_train
>>> from chemotools.scatter import ExtendedMultiplicativeScatterCorrection
>>> # Load sample data
>>> X, _ = load_fermentation_train()
>>> # Initialize ExtendedMultiplicativeScatterCorrection
>>> emsc = ExtendedMultiplicativeScatterCorrection()
>>> # Fit and transform the data
>>> X_scaled = emsc.fit_transform(X)

Attributes

ALLOWED_METHODS = ['mean', 'median']

fit(X: ndarray, y=None) → ExtendedMultiplicativeScatterCorrection

Fit the transformer to the input data. If no reference is provided, the mean or median spectrum (as selected by method) is calculated from the input data and used as the reference.

Parameters:
  • X (np.ndarray of shape (n_samples, n_features)) – The input data to fit the transformer to.

  • y (None) – Ignored. Present only for consistency with the scikit-learn API.

Returns:

self – The fitted transformer.

Return type:

ExtendedMultiplicativeScatterCorrection
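
The reference selection described for fit can be sketched in a few lines of NumPy. This is a hypothetical helper written for illustration (choose_reference is not part of chemotools), and raising ValueError for an unknown method is an assumption suggested by ALLOWED_METHODS rather than confirmed behaviour.

```python
import numpy as np


def choose_reference(X, method="mean", reference=None):
    """Return the reference spectrum: explicit if given, else mean or median."""
    if reference is not None:
        # A user-supplied reference takes precedence over the method.
        return np.asarray(reference, dtype=float)
    if method == "mean":
        return np.mean(X, axis=0)
    if method == "median":
        return np.median(X, axis=0)
    # Assumed validation step, mirroring ALLOWED_METHODS.
    raise ValueError(f"method must be 'mean' or 'median', got {method!r}")


# Three toy "spectra" with two features each; the last one is an outlier.
X = np.array([[1.0, 2.0], [3.0, 4.0], [11.0, 12.0]])
mean_ref = choose_reference(X, method="mean")
median_ref = choose_reference(X, method="median")
```

In this toy data the median reference is unaffected by the outlying third spectrum, which is one reason to prefer method='median' when a few spectra are atypical.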

transform(X: ndarray, y=None) → ndarray

Transform the input data by applying the extended multiplicative scatter correction.

Parameters:
  • X (np.ndarray of shape (n_samples, n_features)) – The input data to transform.

  • y (None) – Ignored. Present only for consistency with the scikit-learn API.

Returns:

X_transformed – The transformed data.

Return type:

np.ndarray of shape (n_samples, n_features)
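
As a rough picture of what such a transform computes, the following NumPy sketch applies an unweighted EMSC to a whole matrix of spectra. It is a minimal illustration under stated assumptions, not the chemotools source: the function name emsc_correct and the rescaled feature axis are hypothetical, and the library's internal design matrix may be built differently.

```python
import numpy as np


def emsc_correct(X, reference, order=2):
    """Unweighted EMSC applied row-wise to X (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    # Hypothetical abscissa: feature index rescaled to [-1, 1] for conditioning.
    axis = np.linspace(-1.0, 1.0, n_features)
    # Columns: 1, axis, ..., axis**order, followed by the reference spectrum.
    poly = np.vander(axis, order + 1, increasing=True)
    design = np.column_stack([poly, reference])
    # A single least-squares solve fits every spectrum (columns of X.T).
    coefs, *_ = np.linalg.lstsq(design, X.T, rcond=None)
    baseline = poly @ coefs[:-1]  # shape (n_features, n_samples)
    scale = coefs[-1]  # multiplicative factor per spectrum
    # Remove the polynomial baseline, then undo the multiplicative effect.
    return ((X.T - baseline) / scale).T


# Synthetic spectra: the same peak with different offsets, slopes and scales.
axis = np.linspace(-1.0, 1.0, 50)
reference = np.exp(-((axis / 0.2) ** 2))
X = np.vstack(
    [
        0.5 + 2.0 * reference + 0.3 * axis - 0.1 * axis**2,
        -0.2 + 0.7 * reference - 0.4 * axis + 0.25 * axis**2,
    ]
)
corrected = emsc_correct(X, reference)
```

Because the synthetic spectra differ from the reference only by a polynomial baseline and a scale factor, the corrected rows in this sketch collapse back onto the reference, and the output keeps the (n_samples, n_features) shape of the input.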