VIPSelector

class chemotools.feature_selection.VIPSelector(model, threshold: float = 1.0)

Bases: _PLSFeatureSelectorBase

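Selects the features that contribute most to a PLS regression model, using the Variable Importance in Projection (VIP) method. Each feature receives a VIP score that summarizes its weights in the latent variables, with each latent variable weighted by how much of the response variance it explains.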
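For reference, a commonly used form of the VIP score is given below. The notation is introduced here rather than taken from chemotools: p is the number of features, A the number of latent variables, and t_a, w_a and q_a are the X-scores, X-weights and y-loadings of latent variable a. The library's implementation may differ in normalization details.

```latex
\mathrm{VIP}_j = \sqrt{\frac{p \sum_{a=1}^{A} \mathrm{SS}_a \left( w_{ja} / \lVert \mathbf{w}_a \rVert \right)^2}{\sum_{a=1}^{A} \mathrm{SS}_a}},
\qquad
\mathrm{SS}_a = \lVert \mathbf{q}_a \rVert^2 \, \mathbf{t}_a^{\top} \mathbf{t}_a
```

Because the squared normalized weights of each latent variable sum to 1 over the features, the squared VIP scores sum to p, so their mean is exactly 1. This is why 1.0 is the customary threshold.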

Parameters:
  • model (Union[_PLS, Pipeline]) – The PLS regression model, or a pipeline with a PLS regression model as its last step.

  • threshold (float, default=1.0) – The threshold for feature selection. Features whose VIP score is above this threshold are selected.
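A minimal NumPy sketch of how the VIP scores and the threshold interact, assuming the standard VIP definition and scikit-learn's array shapes for the PLS attributes. The helpers vip_scores and vip_support_mask are hypothetical and not part of chemotools, and the random arrays only stand in for a fitted model's attributes.

```python
import numpy as np


def vip_scores(x_scores, x_weights, y_loadings):
    """VIP score per feature (hypothetical helper, standard VIP definition).

    Shapes follow scikit-learn's PLS attributes:
    x_scores (n_samples, A), x_weights (n_features, A), y_loadings (n_targets, A).
    """
    n_features = x_weights.shape[0]
    # SS_a: response variance explained by component a, ||q_a||^2 * t_a^T t_a
    ss = np.sum(y_loadings**2, axis=0) * np.sum(x_scores**2, axis=0)
    # Normalise each weight column to unit length
    w_unit = x_weights / np.linalg.norm(x_weights, axis=0)
    # Weighted sum over components, rescaled so squared scores average to 1
    return np.sqrt(n_features * (w_unit**2 @ ss) / ss.sum())


def vip_support_mask(scores, threshold=1.0):
    # Keep features whose VIP score is strictly above the threshold
    return scores > threshold


# Random arrays standing in for a fitted model's x_scores_, x_weights_, y_loadings_
rng = np.random.default_rng(0)
t = rng.normal(size=(20, 2))
w = rng.normal(size=(6, 2))
q = rng.normal(size=(1, 2))

scores = vip_scores(t, w, q)
mask = vip_support_mask(scores, threshold=1.0)
print(np.mean(scores**2))  # approximately 1.0, by construction
print(mask.sum(), "of", mask.size, "features selected")
```

With a fitted sklearn.cross_decomposition.PLSRegression named pls, the equivalent call would be vip_scores(pls.x_scores_, pls.x_weights_, pls.y_loadings_). Raising the threshold keeps fewer features.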

Variables:
  • estimator (ModelTypes) – The fitted PLS model (of type _PLS) used to compute the VIP scores.

  • feature_scores (np.ndarray) – The VIP score calculated for each feature.

  • support_mask (np.ndarray) – The boolean mask indicating which features are selected.
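The link between the two attributes and the selected data can be sketched with NumPy, assuming transform keeps exactly the columns where support_mask is True (the usual scikit-learn selector behaviour). The score values below are made up for illustration.

```python
import numpy as np

# Made-up fitted attributes: one VIP score per column and the resulting mask
feature_scores = np.array([0.4, 1.3, 0.9, 1.7, 1.1, 0.2])
support_mask = feature_scores > 1.0

X = np.arange(18, dtype=float).reshape(3, 6)  # toy data: 3 samples, 6 features

# Selecting features is column indexing with the boolean mask
X_selected = X[:, support_mask]
# Integer positions of the retained columns (e.g. to look up wavenumbers)
selected_idx = np.flatnonzero(support_mask)

print(selected_idx)      # [1 3 4]
print(X_selected.shape)  # (3, 3)
```

Mapping selected_idx onto the spectral axis (for example, a list of wavenumbers) shows which spectral regions were kept.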

References

[1] Kim H. Esbensen, “Multivariate Data Analysis - In Practice”, 5th Edition, 2002.

Examples

>>> from chemotools.datasets import load_fermentation_train
>>> from chemotools.feature_selection import VIPSelector
>>> from sklearn.cross_decomposition import PLSRegression
>>> # Load sample data
>>> X, y = load_fermentation_train()
>>> # Instantiate and fit the PLS regression model
>>> pls_model = PLSRegression(n_components=2).fit(X, y)
>>> # Instantiate the VIP selector with the PLS model
>>> selector = VIPSelector(model=pls_model, threshold=1.0)
>>> selector.fit(X)
VIPSelector(model=PLSRegression(n_components=2), threshold=1.0)
>>> # Get the selected features
>>> X_selected = selector.transform(X)
>>> X_selected.shape
(21, 527)

fit(X: ndarray, y=None) → VIPSelector

Fit the transformer to calculate the feature scores and the support mask.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – The input data to fit the transformer to.

  • y (None) – Ignored; present for compatibility with the scikit-learn API.

Returns:

self – The fitted transformer.

Return type:

VIPSelector