DModX
- class chemotools.outliers.DModX(model: _BasePCA | _PLS | Pipeline, confidence: float = 0.95, mean_centered: bool = True)
Bases: _ModelResidualsBase
Calculate Distance to Model (DModX) statistics.
DModX measures the distance between an observation and the model plane in X-space, that is, the part of the observation the model does not explain. This makes it useful for detecting outliers.
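As an illustration of the idea, the following minimal sketch computes the unnormalized residual distance to a PCA model plane using NumPy only. It does not use chemotools; the data are synthetic, and the helper `residual_distance` is a hypothetical name introduced here, not part of the library.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data (assumption): 50 samples, 6 features, close to a 2-D plane
scores = rng.normal(size=(50, 2))
loadings = rng.normal(size=(2, 6))
X = scores @ loadings + 0.01 * rng.normal(size=(50, 6))

# Fit a 2-component PCA by SVD on mean-centered data
mean = X.mean(axis=0)
Xc = X - mean
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:2].T  # loadings, shape (n_features, n_components)


def residual_distance(X_new):
    """Euclidean distance from each row to the PCA model plane (hypothetical helper)."""
    Xc_new = X_new - mean
    E = Xc_new - Xc_new @ P @ P.T  # part of X not explained by the model
    return np.sqrt((E ** 2).sum(axis=1))


# An observation pushed off the plane along a discarded direction
outlier = X[:1] + 5.0 * Vt[-1]
d_train = residual_distance(X)
d_outlier = residual_distance(outlier)
```

The shifted observation ends up far from the plane, while the training samples stay close to it. That gap is what a distance-to-model statistic exploits.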
- Parameters:
- Variables:
estimator (ModelType) – The fitted model of type _BasePCA or _PLS
transformer (Optional[Pipeline]) – Preprocessing steps before the model
n_features_in (int) – Number of features in the input data
n_components (int) – Number of components in the model
n_samples (int) – Number of samples used to train the model
critical_value (float) – The calculated critical value for outlier detection
train_sse (float) – The training sum of squared errors (SSE) for the model normalized by degrees of freedom
A0 (int) – Adjustment factor for degrees of freedom based on mean centering
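The sketch below shows how quantities like those listed above could fit together. It follows one common textbook formulation of normalized DModX with an F-distribution critical value, and it is an assumption that chemotools uses the same degrees of freedom. The variable names mirror the attribute names for readability only, and the data are synthetic.

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(1)
N, K, A = 60, 8, 2  # samples, features, components (illustrative values)
X = rng.normal(size=(N, A)) @ rng.normal(size=(A, K)) + 0.05 * rng.normal(size=(N, K))

mean_centered = True
A0 = 1 if mean_centered else 0  # assumed: one degree of freedom spent on the mean

# Fit PCA by SVD and compute the training residuals
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:A].T
E = Xc - Xc @ P @ P.T

# Degrees of freedom (assumed formulation)
dof_sample = K - A
dof_model = (N - A - A0) * (K - A)

# Training SSE normalized by degrees of freedom
train_sse = (E ** 2).sum() / dof_model

# Normalized DModX: per-sample residual variance relative to the pooled one
dmodx = np.sqrt((E ** 2).sum(axis=1) / dof_sample / train_sse)

# Critical value from the F-distribution at the chosen confidence level
confidence = 0.95
critical_value = np.sqrt(f.ppf(confidence, dof_sample, dof_model))
flagged = dmodx > critical_value
```

Under this formulation, `dmodx` is close to 1 for a typical training sample, and observations above `critical_value` are candidates for outliers at the chosen confidence level.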
References
- [1] Max Bylesjö, Mattias Rantalainen, Olivier Cloarec, Jeremy K. Nicholson, Elaine Holmes, Johan Trygg. “OPLS discriminant analysis: combining the strengths of PLS-DA and SIMCA classification.” Journal of Chemometrics 20 (8-10), 341-351 (2006).
Examples
>>> from chemotools.datasets import load_fermentation_train
>>> from chemotools.outliers import DModX
>>> from sklearn.decomposition import PCA
>>> # Load sample data
>>> X, _ = load_fermentation_train()
>>> # Fit the PCA model
>>> pca = PCA(n_components=3).fit(X)
>>> # Initialize DModX with the fitted PCA model
>>> dmodx = DModX(model=pca, confidence=0.95, mean_centered=True)
>>> dmodx.fit(X)
DModX(model=PCA(n_components=3), confidence=0.95, mean_centered=True)
>>> # Predict outliers in the dataset
>>> outliers = dmodx.predict(X)
>>> # Calculate DModX residuals
>>> residuals = dmodx.predict_residuals(X)
Attributes
critical_value_