QResiduals
- class chemotools.outliers.QResiduals(model: _BasePCA | _PLS | Pipeline, confidence: float = 0.95, method: Literal['chi-square', 'jackson-mudholkar', 'percentile'] = 'jackson-mudholkar')
Bases: _ModelResidualsBase
Calculate Q residuals (Squared Prediction Error - SPE) for PCA or PLS models.
- Parameters:
model (Union[ModelType, Pipeline]) – A fitted PCA/PLS model or Pipeline ending with such a model.
confidence (float, default=0.95) – Confidence level for statistical calculations (between 0 and 1).
method (str, default="jackson-mudholkar") – The method used to compute the confidence threshold for Q residuals. Options:
- "chi-square": Uses the first two moments of the residual eigenvalues (mean and variance) to compute a moment-matched chi-square threshold for Q residuals [1, 3].
- "jackson-mudholkar": Uses the first three moments of the residual eigenvalues to calculate an analytical threshold based on Jackson & Mudholkar's approximation [2, 3].
- "percentile": Uses the empirical percentile of the observed Q residuals to set a non-parametric threshold.
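The "jackson-mudholkar" option can be illustrated with a minimal sketch of the approximation published in [2]. This is not taken from the chemotools source: the function name `jackson_mudholkar_threshold` and its arguments are hypothetical, and the residual eigenvalues are assumed to be supplied directly (for PCA, the variances of the discarded components).

```python
from statistics import NormalDist

import numpy as np


def jackson_mudholkar_threshold(residual_eigenvalues, confidence=0.95):
    """Hypothetical helper: approximate upper limit for Q residuals, following [2]."""
    lam = np.asarray(residual_eigenvalues, dtype=float)

    # Power sums of the residual eigenvalues (the "first three moments").
    theta1 = lam.sum()
    theta2 = (lam**2).sum()
    theta3 = (lam**3).sum()

    h0 = 1.0 - (2.0 * theta1 * theta3) / (3.0 * theta2**2)

    # Standard normal quantile at the requested confidence level.
    c_alpha = NormalDist().inv_cdf(confidence)

    base = (
        c_alpha * np.sqrt(2.0 * theta2 * h0**2) / theta1
        + 1.0
        + theta2 * h0 * (h0 - 1.0) / theta1**2
    )
    return float(theta1 * base ** (1.0 / h0))


# Illustrative eigenvalues only; not derived from any real dataset.
limit_95 = jackson_mudholkar_threshold([0.5, 0.3, 0.2], confidence=0.95)
limit_99 = jackson_mudholkar_threshold([0.5, 0.3, 0.2], confidence=0.99)
```

A higher confidence level gives a larger limit, so fewer samples are flagged. The "percentile" option needs no eigenvalues at all: it only requires the Q residuals observed on the training set.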
- Variables:
estimator (ModelType) – The fitted model of type _BasePCA or _PLS.
transformer (Optional[Pipeline]) – Preprocessing steps before the model.
n_features_in (int) – Number of features in the input data.
n_components (int) – Number of components in the model.
n_samples (int) – Number of samples used to train the model.
critical_value (float) – The calculated critical value for outlier detection.
- fit(X, y=None)
Fit the Q Residuals model by computing residuals from the training set. Calculates the critical threshold based on the chosen method.
- predict(X)
Identify outliers in the input data by comparing their Q residuals against the critical value computed during fit.
- predict_residuals(X, y=None, validate=True)
Calculate Q residuals (Squared Prediction Error - SPE) for input data.
- _calculate_critical_value(X)
Calculate the critical value for outlier detection using the specified method.
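The methods above can be sketched with plain NumPy for the PCA case. This is a minimal illustration of the idea, not the chemotools implementation: the data are synthetic, the helper `q_residuals` is hypothetical, the threshold uses the non-parametric percentile rule, and outliers are reported as a boolean mask (the encoding returned by `predict` is not specified here).

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 6))  # synthetic training data (assumption)

# Fit a 2-component PCA by SVD of the mean-centered data.
mean = X_train.mean(axis=0)
_, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
loadings = vt[:2].T  # shape (n_features, n_components)


def q_residuals(X):
    """Hypothetical helper: squared prediction error of each row of X."""
    Xc = X - mean
    scores = Xc @ loadings
    residuals = Xc - scores @ loadings.T  # part of X the model cannot rebuild
    return np.sum(residuals**2, axis=1)


# "fit": compute training residuals and derive the critical value from them.
confidence = 0.95
q_train = q_residuals(X_train)
critical_value = np.percentile(q_train, confidence * 100)

# "predict": the first sample lies on the model plane, the second is
# displaced along a direction orthogonal to both retained loadings.
X_new = np.vstack([mean, mean + 10.0 * vt[-1]])
is_outlier = q_residuals(X_new) > critical_value
```

The second test sample has no projection onto the retained components, so the whole displacement ends up in its residual and it exceeds the training threshold.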
References
- [1] Box, G. E. P. (1954).
Some theorems on quadratic forms applied in the study of analysis of variance problems, I. Effect of inequality of variance in the one-way classification. Annals of Mathematical Statistics, 25(2), 290–302.
- [2] Jackson, J. E., & Mudholkar, G. S. (1979).
Control procedures for residuals associated with principal component analysis. Technometrics, 21(3), 341–349.
- [3] Westerhuis, J. A., Gurden, S. P., & Smilde, A. K. (2000).
Generalized contribution plots in multivariate statistical process monitoring. Chemometrics and Intelligent Laboratory Systems, 51, 95–114.
Examples
>>> from chemotools.datasets import load_fermentation_train
>>> from chemotools.outliers import QResiduals
>>> from sklearn.decomposition import PCA
>>> X, _ = load_fermentation_train()
>>> pca = PCA(n_components=3).fit(X)
>>> # Initialize QResiduals with the fitted PCA model
>>> q_residuals = QResiduals(model=pca, confidence=0.95)
>>> q_residuals.fit(X)
>>> # Predict outliers in the dataset
>>> outliers = q_residuals.predict(X)
>>> # Calculate Q-residuals
>>> q_residuals_stats = q_residuals.predict_residuals(X)
Attributes
critical_value_