QResiduals

class chemotools.outliers.QResiduals(model: _BasePCA | _PLS | Pipeline, confidence: float = 0.95, method: Literal['chi-square', 'jackson-mudholkar', 'percentile'] = 'jackson-mudholkar')

Bases: _ModelResidualsBase

Calculate Q residuals (Squared Prediction Error - SPE) for PCA or PLS models.

Parameters:
  • model (Union[ModelType, Pipeline]) – A fitted PCA/PLS model or Pipeline ending with such a model.

  • confidence (float, default=0.95) – Confidence level for statistical calculations (between 0 and 1).

  • method (str, default="jackson-mudholkar") – The method used to compute the confidence threshold for Q residuals. Options:
      - "chi-square": uses the mean and standard deviation to approximate the Q residuals threshold.
      - "jackson-mudholkar": uses an eigenvalue-based analytical approximation.
      - "percentile": uses an empirical percentile threshold.
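Two of the threshold options above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the chemotools implementation: the helper names `jackson_mudholkar_threshold` and `percentile_threshold` are hypothetical, `residual_eigenvalues` is assumed to hold the eigenvalues of the components left out of the model, and the formula is the textbook Jackson-Mudholkar approximation. Edge cases (for example a non-positive exponent term) are not handled.

```python
import math
from statistics import NormalDist


def jackson_mudholkar_threshold(residual_eigenvalues, confidence=0.95):
    """Textbook Jackson-Mudholkar limit for Q residuals (hypothetical helper).

    residual_eigenvalues: eigenvalues of the components NOT retained by the model.
    """
    theta1 = sum(ev for ev in residual_eigenvalues)
    theta2 = sum(ev**2 for ev in residual_eigenvalues)
    theta3 = sum(ev**3 for ev in residual_eigenvalues)
    h0 = 1.0 - (2.0 * theta1 * theta3) / (3.0 * theta2**2)
    z = NormalDist().inv_cdf(confidence)  # standard normal quantile
    base = (
        z * math.sqrt(2.0 * theta2 * h0**2) / theta1
        + 1.0
        + theta2 * h0 * (h0 - 1.0) / theta1**2
    )
    return theta1 * base ** (1.0 / h0)


def percentile_threshold(q_residuals, confidence=0.95):
    """Empirical percentile with linear interpolation between order statistics."""
    ordered = sorted(q_residuals)
    rank = confidence * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


# Illustrative inputs only; these numbers are not from any real dataset.
q_limit = jackson_mudholkar_threshold([0.5, 0.3, 0.2], confidence=0.95)
p_limit = percentile_threshold([1.0, 2.0, 3.0, 4.0, 5.0], confidence=0.95)
```

The analytical limit depends only on the discarded eigenvalues, whereas the percentile limit depends only on the observed training residuals, so the two can differ noticeably on small or non-Gaussian datasets.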

Variables:
  • estimator (ModelType) – The fitted model of type _BasePCA or _PLS.

  • transformer (Optional[Pipeline]) – Preprocessing steps before the model.

  • n_features_in (int) – Number of features in the input data.

  • n_components (int) – Number of components in the model.

  • n_samples (int) – Number of samples used to train the model.

  • critical_value (float) – The calculated critical value for outlier detection.

fit(X, y=None)

Fit the Q Residuals model by computing residuals from the training set. Calculates the critical threshold based on the chosen method.

predict(X)

Identify outliers in the input data based on Q residuals threshold.

predict_residuals(X, y=None, validate=True)

Calculate Q residuals (Squared Prediction Error - SPE) for input data.

_calculate_critical_value(X)

Calculate the critical value for outlier detection using the specified method.
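For the "chi-square" option, one common way to turn the mean and variance of the training residuals into a threshold is a moment-matched scaled chi-square distribution. Whether chemotools uses exactly this form is an assumption; the symbols m, v, g and h are introduced here for illustration only, with m and v the sample mean and variance of the training Q residuals and α the confidence level.

```latex
Q_\alpha = g \, \chi^2_{h,\alpha},
\qquad g = \frac{v}{2m},
\qquad h = \frac{2m^2}{v}
```

Here \chi^2_{h,\alpha} is the α-quantile of a chi-square distribution with h degrees of freedom. The choice of g and h makes the scaled distribution match the observed moments: its mean is gh = m and its variance is 2g^2 h = v.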

References

[1] Johan A. Westerhuis, Stephen P. Gurden, Age K. Smilde (2000). Generalized contribution plots in multivariate statistical process monitoring. Chemometrics and Intelligent Laboratory Systems 51, 95–114.

Examples

>>> from chemotools.datasets import load_fermentation_train
>>> from chemotools.outliers import QResiduals
>>> from sklearn.decomposition import PCA
>>> X, _ = load_fermentation_train()
>>> pca = PCA(n_components=3).fit(X)
>>> # Initialize QResiduals with the fitted PCA model
>>> q_residuals = QResiduals(model=pca, confidence=0.95)
>>> q_residuals.fit(X)
>>> # Predict outliers in the dataset
>>> outliers = q_residuals.predict(X)
>>> # Calculate Q-residuals
>>> q_residuals_stats = q_residuals.predict_residuals(X)

fit(X: ndarray, y: ndarray | None = None) → QResiduals

Fit the Q Residuals model by computing residuals from the training set.

Parameters:

X (array-like of shape (n_samples, n_features)) – Training data.

Returns:

self – Fitted instance of QResiduals.

Return type:

object

predict(X: ndarray, y: ndarray | None = None) → ndarray

Identify outliers in the input data based on Q residuals threshold.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input data.

Returns:

Array of labels indicating outliers (-1 for outliers, 1 for normal data).

Return type:

ndarray of shape (n_samples,)
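The labelling convention can be illustrated with a few lines of plain Python. This is a sketch only: `label_outliers` is a hypothetical helper, and the strict `>` comparison against the critical value is an assumption about how ties are treated.

```python
def label_outliers(q_residuals, critical_value):
    """Return -1 for samples above the threshold and 1 otherwise."""
    # Assumption: a residual exactly equal to the threshold counts as normal.
    return [-1 if q > critical_value else 1 for q in q_residuals]


# Illustrative residuals and threshold, not taken from a real model.
labels = label_outliers([0.2, 1.7, 0.9, 3.4], critical_value=1.5)
print(labels)  # [1, -1, 1, -1]
```

The -1/1 encoding mirrors the convention used by scikit-learn outlier detectors, so the output can be filtered in the same way, for example by keeping samples where the label equals 1.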

predict_residuals(X: ndarray, y: ndarray | None = None, validate: bool = True) → ndarray

Calculate Q residuals (Squared Prediction Error - SPE) for input data.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input data.

  • validate (bool, default=True) – Whether to validate the input data.

Returns:

Q residuals for each sample.

Return type:

ndarray of shape (n_samples,)
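To make the quantity concrete, the sketch below computes a PCA-style squared prediction error in plain Python: each sample is centered, projected onto the retained components, reconstructed, and the squared reconstruction error is summed over the features. The helper `spe_per_sample` and its arguments are hypothetical, the loadings are assumed to be orthonormal, and this is not the library's own code.

```python
def spe_per_sample(X, mean, loadings):
    """Squared reconstruction error per sample (hypothetical helper).

    X: list of samples; mean: feature means; loadings: orthonormal component vectors.
    """
    residuals = []
    for row in X:
        centered = [x - m for x, m in zip(row, mean)]
        # Scores: projection of the centered sample onto each component.
        scores = [sum(c * p for c, p in zip(centered, comp)) for comp in loadings]
        # Reconstruction from the retained components only.
        recon = [
            sum(t * comp[j] for t, comp in zip(scores, loadings))
            for j in range(len(centered))
        ]
        residuals.append(sum((c - r) ** 2 for c, r in zip(centered, recon)))
    return residuals


# One retained component along the first axis: the residual is the squared
# distance in the remaining directions.
X = [[1.0, 0.0, 0.0], [2.0, 3.0, 4.0]]
q = spe_per_sample(X, mean=[0.0, 0.0, 0.0], loadings=[[1.0, 0.0, 0.0]])
print(q)  # [0.0, 25.0]
```

A sample that lies entirely in the model subspace gets a residual of zero, while variation the components cannot explain shows up directly in the Q value.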