StudentizedResiduals
- class chemotools.outliers.StudentizedResiduals(model: _PLS | Pipeline, confidence=0.95)
Bases: _ModelResidualsBase
Calculate the Studentized Residuals of a _PLS model's predictions.
- Parameters:
model (Union[ModelType, Pipeline]) – A fitted _PLS model or Pipeline ending with such a model
confidence (float, default=0.95) – Confidence level for statistical calculations (between 0 and 1)
- Variables:
estimator (ModelType) – The fitted model of type _BasePCA or _PLS
transformer (Optional[Pipeline]) – Preprocessing steps before the model
n_features_in (int) – Number of features in the input data
n_components (int) – Number of components in the model
n_samples (int) – Number of samples used to train the model
critical_value (float) – The calculated critical value for outlier detection
- fit(X, y=None)
Fit the Studentized Residuals model by computing residuals from the training set. Calculates the critical threshold based on the chosen method.
- predict(X, y=None)
Identify outliers in the input data based on the Studentized Residuals threshold.
- predict_residuals(X, y=None, validate=True)
Calculate Studentized Residuals for input data.
- _calculate_critical_value(X)
Calculate the critical value for outlier detection using the specified method.
Examples
>>> from chemotools.datasets import load_fermentation_train
>>> from chemotools.outliers import StudentizedResiduals
>>> from sklearn.cross_decomposition import PLSRegression
>>> # Load sample data
>>> X, y = load_fermentation_train()
>>> y = y.values
>>> # Fit the PLS model
>>> pls = PLSRegression(n_components=3).fit(X, y)
>>> # Initialize StudentizedResiduals with the fitted PLS model
>>> studentized_residuals = StudentizedResiduals(model=pls, confidence=0.95)
>>> # Fit on the training data; fit returns the estimator itself
>>> studentized_residuals.fit(X, y)
StudentizedResiduals(model=PLSRegression(n_components=3), confidence=0.95)
>>> # Predict outliers in the dataset
>>> outliers = studentized_residuals.predict(X, y)
>>> # Calculate Studentized residuals
>>> studentized_residuals_stats = studentized_residuals.predict_residuals(X, y)
References
- [1] Kim H. Esbensen, “Multivariate Data Analysis - In Practice”, 5th Edition, 2002.
- fit(X: ndarray, y: ndarray | None) → StudentizedResiduals
Fit the model to the input data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input data
y (array-like of shape (n_samples,)) – Target data
- Returns:
self – Fitted estimator with the critical threshold computed
- Return type:
StudentizedResiduals
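This page does not say how the critical threshold is derived from confidence. As a rough illustration only, one common choice is a quantile of the standard normal distribution. The helper name normal_critical_value below is hypothetical and is not part of chemotools; the library may use a different distribution or a two-sided rule.

```python
from statistics import NormalDist


def normal_critical_value(confidence: float = 0.95) -> float:
    """Hypothetical helper: one-sided standard-normal quantile.

    Assumption: residuals are compared against a normal quantile.
    This is NOT taken from the chemotools source.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    # Inverse CDF of N(0, 1) evaluated at the confidence level
    return NormalDist().inv_cdf(confidence)


critical_value = normal_critical_value(0.95)
```

A higher confidence level pushes the threshold outwards, so fewer samples would be flagged.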
- predict(X: ndarray, y: ndarray | None = None) → ndarray
Calculate the studentized residuals of the model predictions and return a boolean array indicating outliers.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input data
y (array-like of shape (n_samples,)) – Target data
- Returns:
Boolean array indicating which samples are outliers
- Return type:
ndarray of shape (n_samples,)
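The thresholding step can be pictured as comparing each studentized residual with the critical value. The sketch below is a minimal illustration, assuming the comparison is made on absolute values; the function name flag_outliers is hypothetical, and the exact rule and label encoding used by chemotools are not stated on this page.

```python
def flag_outliers(residuals, critical_value):
    """Hypothetical helper: True where |residual| exceeds the threshold.

    Assumption: the comparison uses absolute values.
    """
    return [abs(r) > critical_value for r in residuals]


# Made-up residuals, purely for illustration
flags = flag_outliers([0.3, -2.4, 1.1, 3.0], critical_value=1.96)
```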
- predict_residuals(X: ndarray, y: ndarray | None, validate: bool = True) → ndarray
Calculate the studentized residuals of the model predictions.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input data
y (array-like of shape (n_samples,)) – Target values
validate (bool, default=True) – Whether to validate the input data
- Returns:
Studentized residuals of the model predictions
- Return type:
ndarray of shape (n_samples,)
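The formula behind the studentized residuals is not given on this page. For orientation, a common textbook definition (internal studentization) divides each raw residual by its estimated standard deviation, s * sqrt(1 - h_i), where h_i is the leverage of sample i. The sketch below assumes that definition; the function studentize and its arguments are hypothetical and may differ from what chemotools computes.

```python
import math


def studentize(residuals, leverages, n_params):
    """Hypothetical helper: internally studentized residuals.

    Assumptions: r_i = e_i / (s * sqrt(1 - h_i)), with
    s^2 = sum(e_i^2) / (n - n_params).
    """
    n = len(residuals)
    if len(leverages) != n:
        raise ValueError("residuals and leverages must have the same length")
    dof = n - n_params
    if dof <= 0:
        raise ValueError("need more samples than model parameters")
    if any(not 0.0 <= h < 1.0 for h in leverages):
        raise ValueError("each leverage must lie in [0, 1)")
    # Residual standard deviation estimated from the raw residuals
    s = math.sqrt(sum(e * e for e in residuals) / dof)
    return [e / (s * math.sqrt(1.0 - h)) for e, h in zip(residuals, leverages)]


# Made-up residuals and leverages, purely for illustration
scaled = studentize([1.0, -1.0, 2.0, -2.0], [0.5, 0.5, 0.5, 0.5], n_params=2)
```

Scaling by the leverage term means a high-leverage sample with a modest raw residual can still end up with a large studentized residual.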