HotellingT2
- class chemotools.outliers.HotellingT2(model: _BasePCA | _PLS | Pipeline, confidence: float = 0.95)
Bases: _ModelResidualsBase
Calculate Hotelling’s T-squared statistics for PCA- or PLS-like models.
- Parameters:
model (Union[ModelType, Pipeline]) – A fitted PCA/PLS model or Pipeline ending with such a model
confidence (float, default=0.95) – Confidence level for statistical calculations (between 0 and 1)
- Variables:
estimator (ModelType) – The fitted model of type _BasePCA or _PLS
transformer (Optional[Pipeline]) – Preprocessing steps before the model
n_features_in (int) – Number of features in the input data
n_components (int) – Number of components in the model
n_samples (int) – Number of samples used to train the model
critical_value (float) – The calculated critical value for outlier detection
References
- [1] Johan A. Westerhuis, Stephen P. Gurden, Age K. Smilde. Generalized contribution plots in multivariate statistical process monitoring. Chemometrics and Intelligent Laboratory Systems 51 (2000) 95–114.
Examples
>>> from chemotools.datasets import load_fermentation_train
>>> from chemotools.outliers import HotellingT2
>>> from sklearn.decomposition import PCA
>>> # Load sample data
>>> X, _ = load_fermentation_train()
>>> # Instantiate and fit the PCA model
>>> pca = PCA(n_components=3).fit(X)
>>> # Initialize HotellingT2 with the fitted PCA model
>>> hotelling_t2 = HotellingT2(model=pca, confidence=0.95)
>>> # Fit to compute the critical value; fit returns the estimator itself
>>> hotelling_t2.fit(X)
HotellingT2(model=PCA(n_components=3), confidence=0.95)
>>> # Predict outliers in the dataset
>>> outliers = hotelling_t2.predict(X)
>>> # Calculate Hotelling's T-squared statistics
>>> t2_stats = hotelling_t2.predict_residuals(X)
- fit(X: ndarray, y: ndarray | None = None) → HotellingT2
Fit the model to the input data.
This step calculates the critical value for the outlier detection.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input data
y (Ignored) – Not used, present for API consistency by convention.
- Returns:
self – Fitted estimator with the critical threshold computed
- Return type:
HotellingT2
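The text above does not state how the critical value is obtained. One widely used form of the Hotelling's T-squared limit is based on the F-distribution and is shown here as a sketch. Whether this class uses exactly this expression is an assumption. The symbols are introduced for illustration: A stands for n_components, n for n_samples, and α for confidence.

```latex
T^2_{\mathrm{crit}} = \frac{A\,(n - 1)}{n - A}\; F_{\alpha}(A,\; n - A)
```

Here F_α(A, n − A) denotes the α-quantile of the F-distribution with A and n − A degrees of freedom. Under this form, a higher confidence yields a larger limit, so fewer samples are flagged.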
- predict(X: ndarray, y: ndarray | None = None) → ndarray
Identify outliers in the input data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input data
y (None) – Ignored; present for API consistency.
- Returns:
Array of labels marking each sample as an outlier (-1) or an inlier (1)
- Return type:
ndarray of shape (n_samples,)
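The -1/1 labelling can be illustrated with a minimal pure-Python sketch. It assumes a sample is flagged when its statistic is strictly greater than the critical value; the exact comparison used by the library is not stated above. The helper name `label_outliers` and the numbers are hypothetical.

```python
def label_outliers(statistics, critical_value):
    """Return -1 for values above the critical value and 1 otherwise.

    Assumption: a strict "greater than" comparison marks an outlier.
    """
    return [-1 if s > critical_value else 1 for s in statistics]


# Illustrative statistics only, not taken from any dataset.
labels = label_outliers([0.5, 12.3, 3.1], critical_value=9.0)

# Labels of this kind are easily turned into a boolean mask for indexing.
outlier_mask = [label == -1 for label in labels]

print(labels)        # [1, -1, 1]
print(outlier_mask)  # [False, True, False]
```

Because the labels are -1 and 1 rather than booleans, they need the explicit `== -1` comparison before being used as a mask.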
- predict_residuals(X: ndarray, y: ndarray | None = None, validate: bool = True) → ndarray
Calculate Hotelling’s T-squared statistics for input data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input data
y (None) – Ignored.
- Returns:
Hotelling’s T-squared statistics for each sample
- Return type:
ndarray of shape (n_samples,)
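For orientation, the textbook definition of Hotelling's T-squared for a latent-variable model sums the squared component scores of a sample, each divided by the variance of that component's scores on the training data. The sketch below implements that definition in plain Python. It is not the library's code, and it is an assumption that this class follows the same definition. The function name `hotelling_t2` and the input values are hypothetical.

```python
def hotelling_t2(scores, score_variances):
    """Return one T-squared value per sample.

    scores: one row of component scores per sample.
    score_variances: training-set variance of each component's scores.
    """
    return [
        sum(t * t / var for t, var in zip(row, score_variances))
        for row in scores
    ]


# Illustrative values only: three samples, two components.
scores = [[1.0, 0.5], [-2.0, 1.0], [0.0, 0.0]]
score_variances = [4.0, 1.0]

t2 = hotelling_t2(scores, score_variances)
print(t2)  # [0.5, 2.0, 0.0]
```

A sample at the centre of the score space gets a value of zero, and the value grows as the sample moves away along any component, scaled by how much that component normally varies.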