StudentizedResiduals

class chemotools.outliers.StudentizedResiduals(model: _PLS | Pipeline, confidence=0.95)

Bases: _ModelResidualsBase

Calculate the Studentized residuals of a _PLS model's predictions.

Parameters:
  • model (Union[ModelType, Pipeline]) – A fitted _PLS model or Pipeline ending with such a model

  • confidence (float, default=0.95) – Confidence level for statistical calculations (between 0 and 1)

Variables:
  • estimator (ModelType) – The fitted model of type _BasePCA or _PLS

  • transformer (Optional[Pipeline]) – Preprocessing steps before the model

  • n_features_in (int) – Number of features in the input data

  • n_components (int) – Number of components in the model

  • n_samples (int) – Number of samples used to train the model

  • critical_value (float) – The calculated critical value for outlier detection

fit(X, y=None)

Fit the Studentized Residuals model by computing residuals from the training set, then calculate the critical threshold for the chosen confidence level.

predict(X, y=None)

Identify outliers in the input data by comparing their Studentized residuals with the critical threshold.

predict_residuals(X, y=None, validate=True)

Calculate Studentized residuals for the input data.

_calculate_critical_value(X)

Calculate the critical value used for outlier detection.
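
The page does not state which distribution the critical value comes from. A minimal sketch of one common choice, a one-sided Student's t quantile at the requested confidence level, is shown here. The helper name `t_critical_value` and the degrees of freedom (`n_samples - n_components - 1`) are assumptions made for illustration, not the library's documented behaviour.

```python
from scipy import stats


def t_critical_value(confidence, n_samples, n_components):
    """Hypothetical helper: one-sided Student's t quantile for a residual threshold."""
    # Assumption: residual degrees of freedom are n_samples - n_components - 1.
    dof = n_samples - n_components - 1
    if dof <= 0:
        raise ValueError("Not enough samples for the requested number of components")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return float(stats.t.ppf(confidence, dof))


# 14 training samples and 3 components leave 10 degrees of freedom.
critical_value = t_critical_value(0.95, n_samples=14, n_components=3)
print(round(critical_value, 3))  # 1.812
```

A higher confidence level gives a larger threshold, so fewer samples are flagged.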

Examples

>>> from chemotools.datasets import load_fermentation_train
>>> from chemotools.outliers import StudentizedResiduals
>>> from sklearn.cross_decomposition import PLSRegression
>>> # Load sample data
>>> X, y = load_fermentation_train()
>>> y = y.values
>>> # Instantiate the PLS model
>>> pls = PLSRegression(n_components=3).fit(X, y)
>>> # Initialize StudentizedResiduals with the fitted PLS model
>>> studentized_residuals = StudentizedResiduals(model=pls, confidence=0.95)
>>> studentized_residuals.fit(X, y)
StudentizedResiduals(model=PLSRegression(n_components=3), confidence=0.95)
>>> # Predict outliers in the dataset
>>> outliers = studentized_residuals.predict(X, y)
>>> # Calculate Studentized residuals
>>> studentized_residuals_stats = studentized_residuals.predict_residuals(X, y)

References

[1] Kim H. Esbensen, “Multivariate Data Analysis - In Practice”, 5th Edition, 2002.

fit(X: ndarray, y: ndarray | None) → StudentizedResiduals

Fit the model to the input data.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input data

  • y (array-like of shape (n_samples,)) – Target data

Returns:

self – Fitted estimator with the critical threshold computed

Return type:

StudentizedResiduals

predict(X: ndarray, y: ndarray | None = None) → ndarray

Calculate the Studentized residuals of the model predictions and return a boolean array indicating outliers.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input data

  • y (array-like of shape (n_samples,)) – Target data

Returns:

Outlier flag for each sample, obtained by comparing its Studentized residual with the critical value

Return type:

ndarray of shape (n_samples,)
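
The thresholding step behind this method can be sketched with NumPy alone. The function name `flag_outliers`, the example numbers, and the two-sided comparison on the absolute residual are assumptions for illustration; the library may compare the residuals differently or encode the flags in another way.

```python
import numpy as np


def flag_outliers(residuals, critical_value):
    """Hypothetical helper: mark samples whose residual magnitude exceeds the threshold."""
    residuals = np.asarray(residuals, dtype=float)
    # Assumption: a sample is an outlier when |residual| is above the critical value.
    return np.abs(residuals) > critical_value


# Made-up Studentized residuals for five samples.
residuals = np.array([0.3, -0.8, 2.4, 1.1, -3.0])
mask = flag_outliers(residuals, critical_value=1.81)
print(mask)  # [False False  True False  True]
```

The mask can then index the original data, for example `X[~mask]` to keep only the samples that were not flagged.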

predict_residuals(X: ndarray, y: ndarray | None, validate: bool = True) → ndarray

Calculate the studentized residuals of the model predictions.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input data

  • y (array-like of shape (n_samples,)) – Target values

  • validate (bool, default=True) – Whether to validate the input data

Returns:

Studentized residuals of the model predictions

Return type:

ndarray of shape (n_samples,)
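
For orientation, the textbook form of an internally Studentized residual divides each raw residual by the residual standard deviation and by `sqrt(1 - h_i)`, where `h_i` is the leverage of sample `i`. The sketch below computes this from a score matrix with NumPy. The function name, the leverage definition `h_i = t_i (T'T)^-1 t_i'`, and the degrees of freedom are assumptions for illustration and may differ from what the library does internally.

```python
import numpy as np


def studentized_residuals(scores, y_true, y_pred):
    """Hypothetical helper: internally Studentized residuals from latent scores."""
    scores = np.asarray(scores, dtype=float)
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    residuals = y_true - y_pred
    n_samples, n_components = scores.shape

    # Leverage of each sample in the score space: h_i = t_i (T'T)^-1 t_i'.
    gram_inv = np.linalg.pinv(scores.T @ scores)
    leverage = np.einsum("ij,jk,ik->i", scores, gram_inv, scores)

    # Assumption: residual degrees of freedom are n_samples - n_components.
    residual_std = np.sqrt(np.sum(residuals**2) / (n_samples - n_components))
    return residuals / (residual_std * np.sqrt(1.0 - leverage))


# Synthetic data: 20 samples, 2 components, one planted outlier at index 7.
rng = np.random.default_rng(0)
scores = rng.normal(size=(20, 2))
coefficients = np.array([1.5, -0.5])
y_pred = scores @ coefficients
y_true = y_pred + rng.normal(scale=0.1, size=20)
y_true[7] += 2.0

r = studentized_residuals(scores, y_true, y_pred)
print(int(np.argmax(np.abs(r))))  # 7
```

Dividing by `sqrt(1 - h_i)` inflates the residual of high-leverage samples, which pull the fit towards themselves and would otherwise look better predicted than they are.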