DModX
- class chemotools.outliers.DModX(model: _BasePCA | _PLS | Pipeline, confidence: float = 0.95, mean_centered: bool = True)
Bases: _ModelResidualsBase
Calculate Distance to Model (DModX) statistics.
DModX measures the distance between an observation and the model plane in X-space, that is, the part of the observation that the model does not explain. Observations with a large distance are candidate outliers.
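The distance to the model plane can be illustrated with plain NumPy. The following is a minimal sketch of one common formulation (residual standard deviation after a PCA projection), not the chemotools implementation; the helper name `residual_distance` and the toy data are assumptions made for illustration.

```python
import numpy as np


def residual_distance(X, n_components):
    """Per-sample residual standard deviation after a PCA projection.

    Hypothetical helper: illustrates the idea of a distance to the
    model plane, not the exact statistic computed by chemotools.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # mean-center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                      # loadings, (n_features, n_components)
    E = Xc - (Xc @ P) @ P.T                      # part of X not explained by the model
    dof = X.shape[1] - n_components              # residual degrees of freedom per sample
    return np.sqrt((E ** 2).sum(axis=1) / dof)


rng = np.random.default_rng(0)
t = rng.normal(size=(50, 1))
X = t @ np.array([[1.0, 2.0, 3.0]])              # samples on a line in 3-D
X[0] += np.array([3.0, -3.0, 1.0])               # push sample 0 off the line

distances = residual_distance(X, n_components=1)
most_distant = int(np.argmax(distances))         # index of the sample farthest from the model
```

In this toy data set, sample 0 was moved orthogonally to the line, so it has the largest residual distance.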
- Parameters:
model (_BasePCA | _PLS | Pipeline) – A fitted PCA or PLS model, or a Pipeline whose final step is such a model.
confidence (float, default=0.95) – Confidence level used to calculate the critical value.
mean_centered (bool, default=True) – Whether the data were mean centered; determines the degrees-of-freedom adjustment A0.
- Variables:
estimator (ModelType) – The fitted model of type _BasePCA or _PLS
transformer (Optional[Pipeline]) – Preprocessing steps before the model
n_features_in (int) – Number of features in the input data
n_components (int) – Number of components in the model
n_samples (int) – Number of samples used to train the model
critical_value (float) – The calculated critical value for outlier detection
train_sse (float) – The training sum of squared errors (SSE) of the model, normalized by the degrees of freedom
A0 (int) – Adjustment factor for degrees of freedom based on mean centering
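The role of the degrees-of-freedom adjustment can be sketched as follows. This assumes the SIMCA-style convention in which A0 is 1 for mean-centered data and 0 otherwise, and the pooled training residual is divided by (N - A - A0)(K - A); the function name `pooled_residual_std` is hypothetical and the exact normalization used by chemotools may differ.

```python
import numpy as np


def pooled_residual_std(E, n_components, mean_centered=True):
    """Pooled residual standard deviation of a training residual matrix E.

    Assumed convention: one extra degree of freedom is spent when the
    data were mean centered (A0 = 1), none otherwise (A0 = 0).
    """
    E = np.asarray(E, dtype=float)
    n_samples, n_features = E.shape
    a0 = 1 if mean_centered else 0
    dof = (n_samples - n_components - a0) * (n_features - n_components)
    return np.sqrt((E ** 2).sum() / dof)


E = np.ones((10, 5))                             # toy residuals: 10 samples, 5 features
s0_centered = pooled_residual_std(E, n_components=2, mean_centered=True)
s0_raw = pooled_residual_std(E, n_components=2, mean_centered=False)
```

With mean centering the denominator is smaller ((10 - 2 - 1) * 3 = 21 instead of 8 * 3 = 24), so the pooled estimate is slightly larger.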
References
- [1] Max Bylesjö, Mattias Rantalainen, Olivier Cloarec, Jeremy K. Nicholson, Elaine Holmes, Johan Trygg. “OPLS discriminant analysis: combining the strengths of PLS-DA and SIMCA classification.” Journal of Chemometrics 20 (8-10), 341-351 (2006).
Examples
>>> from chemotools.datasets import load_fermentation_train
>>> from chemotools.outliers import DModX
>>> from sklearn.decomposition import PCA
>>> # Load sample data
>>> X, _ = load_fermentation_train()
>>> # Fit the PCA model
>>> pca = PCA(n_components=3).fit(X)
>>> # Initialize DModX with the fitted PCA model
>>> dmodx = DModX(model=pca, confidence=0.95, mean_centered=True)
>>> dmodx.fit(X)
DModX(model=PCA(n_components=3), confidence=0.95, mean_centered=True)
>>> # Predict outliers in the dataset
>>> outliers = dmodx.predict(X)
>>> # Calculate DModX residuals
>>> residuals = dmodx.predict_residuals(X)
- fit(X: ndarray, y: ndarray | None = None) → DModX
Fit the model and compute training residual variance.
- Parameters:
X (np.ndarray of shape (n_samples, n_features)) – The input data used to fit the model.
y (None) – Ignored; present only for API consistency.
- Returns:
self – Fitted estimator with computed training residuals and critical value.
- Return type:
DModX
- predict(X: ndarray, y: ndarray | None = None) → ndarray
Identify outliers in the input data.
- Parameters:
X (np.ndarray of shape (n_samples, n_features)) – The input data to predict outliers for.
y (None) – Ignored; present only for API consistency.
- Returns:
outliers – Array indicating outliers (-1) and inliers (1).
- Return type:
np.ndarray of shape (n_samples,)
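The -1/1 labelling can be understood as a threshold on the per-sample statistic. The sketch below assumes a strict "greater than the critical value" rule; the helper `label_outliers` and the numbers are hypothetical and only illustrate how such labels are typically produced and used as a mask.

```python
import numpy as np


def label_outliers(values, critical_value):
    """Return -1 for values above the critical value and 1 otherwise.

    Assumed rule for illustration; the comparison used by the library
    (strict or non-strict) is not stated in this reference.
    """
    values = np.asarray(values, dtype=float)
    return np.where(values > critical_value, -1, 1)


values = np.array([0.4, 1.1, 2.7, 0.9])          # toy statistics for four samples
labels = label_outliers(values, critical_value=1.5)
inlier_values = values[labels == 1]              # keep only the samples labelled as inliers
```

Here `labels` is `[1, 1, -1, 1]`, so a boolean mask such as `labels == 1` selects the three inliers.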
- predict_residuals(X: ndarray, y: ndarray | None = None, validate: bool = True) → ndarray
Calculate normalized DModX statistics for input data.
- Parameters:
X (np.ndarray of shape (n_samples, n_features)) – The input data to calculate DModX statistics for.
y (None) – Ignored.
validate (bool, default=True) – If True, validate the input data.
- Returns:
dmodx_values – The normalized DModX statistics for each sample.
- Return type:
np.ndarray of shape (n_samples,)
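A normalized statistic for new samples can be sketched by dividing each sample's residual standard deviation by the pooled training value. This is one textbook formulation under stated assumptions (mean-centered PCA via SVD, A0 = 1), not necessarily what `predict_residuals` computes; `fit_reference` and `normalized_distance` are hypothetical names.

```python
import numpy as np


def fit_reference(X_train, n_components):
    """Estimate mean, loadings and pooled residual std from training data."""
    mean = X_train.mean(axis=0)
    Xc = X_train - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T
    E = Xc - (Xc @ P) @ P.T
    n, k = X_train.shape
    dof = (n - n_components - 1) * (k - n_components)   # assumes A0 = 1 (mean centered)
    return mean, P, np.sqrt((E ** 2).sum() / dof)


def normalized_distance(X_new, mean, P, s0):
    """Residual std of each new sample, relative to the training value s0."""
    Xc = X_new - mean                                # reuse the training mean
    E = Xc - (Xc @ P) @ P.T
    k, a = P.shape
    return np.sqrt((E ** 2).sum(axis=1) / (k - a)) / s0


rng = np.random.default_rng(1)
basis = np.array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]])
X_train = rng.normal(size=(40, 2)) @ basis + 0.01 * rng.normal(size=(40, 4))
X_new = np.array([[0.5, -0.2, 0.0, 0.0],             # lies close to the model plane
                  [0.5, -0.2, 1.0, 1.0]])            # far from the model plane

mean, P, s0 = fit_reference(X_train, n_components=2)
values = normalized_distance(X_new, mean, P, s0)
```

The second new sample sits well outside the plane spanned by the training data, so its normalized value is much larger than that of the first.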