Leverage

class chemotools.outliers.Leverage(model: _BasePCA | _PLS | Pipeline, confidence: float = 0.95)

Bases: _ModelResidualsBase

Calculate the leverage of the training samples on the latent space of a PLS model. This makes it possible to detect data points with high leverage in the model.

Parameters:
  • model (Union[ModelType, Pipeline]) – A fitted PLSRegression model or Pipeline ending with such a model

  • confidence (float, default=0.95) – Confidence level for statistical calculations (between 0 and 1)

Variables:
  • estimator (ModelType) – The fitted model of type _PLS

  • transformer (Optional[Pipeline]) – Preprocessing steps before the model

  • n_features_in (int) – Number of features in the input data

  • n_components (int) – Number of components in the model

  • n_samples (int) – Number of samples used to train the model

  • critical_value (float) – The calculated critical value for outlier detection

References

[1] Kim H. Esbensen, “Multivariate Data Analysis - In Practice”, 5th Edition, 2002.

Examples

>>> import numpy as np
>>> from sklearn.cross_decomposition import PLSRegression
>>> from chemotools.outliers import Leverage
>>> X = np.random.rand(100, 10)
>>> y = np.random.rand(100)
>>> pls = PLSRegression(n_components=3).fit(X, y)
>>> # Initialize Leverage with the fitted PLS model
>>> leverage = Leverage(pls, confidence=0.95)
>>> leverage.fit(X, y)
Leverage(model=PLSRegression(n_components=3), confidence=0.95)
>>> # Predict outliers in the dataset
>>> outliers = leverage.predict(X)
>>> # Get the leverage of the samples
>>> leverages = leverage.predict_residuals(X)

fit(X: ndarray, y: ndarray | None = None) → Leverage

Fit the model to the input data.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input data

  • y (array-like of shape (n_samples,), default=None) – Target data

Returns:

self – Fitted estimator with the critical threshold computed

Return type:

Leverage

predict(X: ndarray, y: ndarray | None = None) → ndarray

Identify the samples whose leverage on the model is above the critical value.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input data

Returns:

Indicator, for each sample, of whether its leverage is above the critical value

Return type:

ndarray of shape (n_samples,)
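
The comparison against the critical value can be sketched with plain NumPy. This is only an illustration: the helper `flag_high_leverage` is hypothetical, and the critical value is taken here as an empirical quantile of the training leverages at the given confidence level, which is an assumption and not necessarily the rule chemotools applies.

```python
import numpy as np


def flag_high_leverage(train_leverage, new_leverage, confidence=0.95):
    """Flag samples whose leverage is above a critical value.

    Assumption: the critical value is the `confidence` quantile of the
    training leverages. The rule used by the library may differ.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    critical_value = np.quantile(
        np.asarray(train_leverage, dtype=float), confidence
    )
    # True where a sample's leverage exceeds the critical value.
    flags = np.asarray(new_leverage, dtype=float) > critical_value
    return flags, critical_value


train_leverage = np.linspace(0.01, 0.20, 20)  # made-up training leverages
new_leverage = np.array([0.05, 0.19, 0.50])   # made-up leverages to check

flags, critical_value = flag_high_leverage(train_leverage, new_leverage)
print(flags, critical_value)
```

Raising `confidence` moves the critical value up, so fewer samples are flagged.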

predict_residuals(X: ndarray, y: ndarray | None = None, validate: bool = True) → ndarray

Calculate the leverage of the samples.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input data

Returns:

Leverage of the samples

Return type:

np.ndarray
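
As a rough illustration of the quantity returned here, the sketch below computes leverage from a matrix of latent-space scores using only NumPy. It assumes the common definition h_i = t_i^T (T^T T)^{-1} t_i, where T holds the scores. The function name `leverage_from_scores` is hypothetical, and the exact formula used by chemotools (for example, whether a 1/n term is added) is not stated on this page.

```python
import numpy as np


def leverage_from_scores(scores: np.ndarray) -> np.ndarray:
    """Leverage of each sample given its latent-space scores.

    Assumed definition (not confirmed as the chemotools formula):
    h_i = t_i^T (T^T T)^{-1} t_i, with T of shape (n_samples, n_components).
    """
    scores = np.asarray(scores, dtype=float)
    # (Pseudo-)inverse of the (n_components, n_components) cross-product matrix.
    inv_cross = np.linalg.pinv(scores.T @ scores)
    # Row-wise quadratic form t_i^T inv_cross t_i.
    return np.einsum("ij,jk,ik->i", scores, inv_cross, scores)


# Toy scores: 50 ordinary samples plus one far from the rest.
rng = np.random.default_rng(0)
scores = rng.normal(size=(50, 3))
scores = np.vstack([scores, [10.0, 10.0, 10.0]])

h = leverage_from_scores(scores)
print(h.argmax())  # index of the sample with the highest leverage
```

Under this definition each value lies between 0 and 1 and the values sum to the number of components, so a sample that takes up a large share of that total stands out as having high leverage.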