PLSRegressionInspector
- class chemotools.inspector.PLSRegressionInspector(model: _PLS | Pipeline, X_train: ndarray, y_train: ndarray, X_test: ndarray | None = None, y_test: ndarray | None = None, X_val: ndarray | None = None, y_val: ndarray | None = None, x_axis: Sequence | None = None, confidence: float = 0.95)
Bases: SpectraMixin, RegressionMixin, LatentVariableMixin, _BaseInspector
Inspector for PLS Regression model diagnostics and visualization.
This class provides a unified interface for inspecting PLS regression models by creating multiple independent diagnostic plots. Instead of complex dashboards with many subplots, each method produces several separate figure windows that are easier to customize, save, and interact with individually.
The inspector provides convenience methods that create multiple independent plots:
inspect(): Creates all diagnostic plots (scores, loadings, explained variance, regression diagnostics, and distance plots)
inspect_spectra(): Creates raw and preprocessed spectra plots (if preprocessing exists)
- Parameters:
model (_PLS or Pipeline) – Fitted PLS model or pipeline ending with PLS
X_train (array-like of shape (n_samples, n_features)) – Training data
y_train (array-like of shape (n_samples,)) – Training targets (required for supervised PLS)
X_test (array-like of shape (n_samples, n_features), optional) – Test data
y_test (array-like of shape (n_samples,), optional) – Test targets
X_val (array-like of shape (n_samples, n_features), optional) – Validation data
y_val (array-like of shape (n_samples,), optional) – Validation targets
x_axis (array-like of shape (n_features,), optional) – Feature names (e.g., wavenumbers for spectroscopy). If None, feature indices are used.
confidence (float, default=0.95) – Confidence level for outlier detection limits (Hotelling’s T², Q residuals, leverage, and studentized residuals). Must be between 0 and 1.
- Variables:
model (_PLS or Pipeline) – The original model passed to the inspector
estimator (_PLS) – The PLS estimator
transformer (Pipeline or None) – Preprocessing pipeline before PLS (if model was a Pipeline)
n_components (int) – Number of latent variables
n_features (int) – Number of features in original data
n_samples (dict) – Number of samples in each dataset
x_axis (ndarray) – Feature names/indices
confidence (float) – Confidence level for outlier detection
RMSE_train (float) – Root mean squared error on training data
RMSE_test (float or None) – Root mean squared error on test data (if available)
RMSE_val (float or None) – Root mean squared error on validation data (if available)
R2_train (float) – R² score on training data
R2_test (float or None) – R² score on test data (if available)
R2_val (float or None) – R² score on validation data (if available)
hotelling_t2_limit (float) – Critical value for Hotelling’s T² statistic (computed on training data)
q_residuals_limit (float) – Critical value for Q residuals statistic (computed on training data)
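The hotelling_t2_limit value depends on the confidence level, the number of training samples, and the number of latent variables. As a rough illustration, the sketch below computes one widely used F-distribution form of the limit. The helper name hotelling_t2_limit_sketch is hypothetical, and nothing above confirms that chemotools uses exactly this expression.

```python
from scipy.stats import f


def hotelling_t2_limit_sketch(n_samples, n_components, confidence=0.95):
    """Illustrative Hotelling's T² critical value (assumed formula, not the library's)."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    if n_samples <= n_components:
        raise ValueError("n_samples must exceed n_components")

    dfn = n_components              # numerator degrees of freedom
    dfd = n_samples - n_components  # denominator degrees of freedom
    f_crit = f.ppf(confidence, dfn, dfd)

    # Common textbook form: A * (n - 1) / (n - A) * F(confidence; A, n - A)
    return dfn * (n_samples - 1) / dfd * f_crit


limit_95 = hotelling_t2_limit_sketch(n_samples=100, n_components=5, confidence=0.95)
limit_99 = hotelling_t2_limit_sketch(n_samples=100, n_components=5, confidence=0.99)
```

Under this formula a higher confidence level gives a larger limit, so fewer training samples are flagged as outliers.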
Examples
>>> from sklearn.cross_decomposition import PLSRegression
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> from chemotools.datasets import load_fermentation_train
>>> from chemotools.inspector import PLSRegressionInspector
>>>
>>> # Load data
>>> X, y = load_fermentation_train()
>>>
>>> # Create and fit pipeline
>>> pipeline = make_pipeline(
...     StandardScaler(),
...     PLSRegression(n_components=5)
... )
>>> pipeline.fit(X, y)
>>>
>>> # Create inspector
>>> inspector = PLSRegressionInspector(pipeline, X, y, x_axis=X.columns)
>>>
>>> # Print summary
>>> inspector.summary()
>>>
>>> # Create all diagnostic plots
>>> inspector.inspect()  # Creates scores, loadings, variance, regression plots
>>>
>>> # Compare preprocessing
>>> inspector.inspect_spectra()
>>>
>>> # Access underlying data for custom analysis
>>> x_scores = inspector.get_x_scores('train')
>>> y_scores = inspector.get_y_scores('train')
>>> x_loadings = inspector.get_x_loadings([0, 1, 2])
>>> coeffs = inspector.get_regression_coefficients()
Notes
Memory usage scales linearly with dataset size. For very large datasets (>100,000 samples), consider subsampling for initial exploration.
Attributes
R2_test – Return R² score on test data, or None when unavailable.
R2_train – Return R² score on training data.
R2_val – Return R² score on validation data, or None when unavailable.
RMSE_test – Return RMSE on test data, or None when unavailable.
RMSE_train – Return RMSE on training data.
RMSE_val – Return RMSE on validation data, or None when unavailable.
component_label
confidence – Return the confidence level for outlier detection.
estimator – Return the underlying estimator (PCA or PLS).
hotelling_t2_limit – Return the Hotelling's T² critical value at the specified confidence level.
leverage_detector – Return a fitted leverage detector cached for reuse.
model – Return the original model.
n_components – Return the number of latent variables/components.
n_features – Return the number of features in original data.
n_samples – Return the number of samples in each dataset.
q_residuals_limit – Return the Q residuals critical value at the specified confidence level.
studentized_detector – Return a fitted studentized residuals detector cached for reuse.
transformer – Return the preprocessing transformer (if any).
x_axis – Return the feature names/indices.
- component_label: str = 'LV'
- property leverage_detector: Leverage
Return a fitted leverage detector cached for reuse.
- property studentized_detector: StudentizedResiduals
Return a fitted studentized residuals detector cached for reuse.
- get_latent_explained_variance() → ndarray | None
Hook for LatentVariableMixin - returns explained X variance ratio.
- get_x_scores(dataset: str = 'train') → ndarray
Get PLS X-scores for specified dataset.
- Parameters:
dataset ({'train', 'test', 'val'}, default='train') – Which dataset to get scores for
- Returns:
x_scores – PLS X-scores (latent variables from X)
- Return type:
ndarray of shape (n_samples, n_components)
- get_y_scores(dataset: str = 'train') → ndarray
Get PLS Y-scores for specified dataset.
- Parameters:
dataset ({'train', 'test', 'val'}, default='train') – Which dataset to get scores for
- Returns:
y_scores – PLS Y-scores (latent variables from Y)
- Return type:
ndarray of shape (n_samples, n_components)
- get_regression_coefficients() → ndarray
Get PLS regression coefficients (regression vector).
- Returns:
coef – PLS regression coefficients
- Return type:
ndarray of shape (n_features,) or (n_features, n_targets)
- get_explained_x_variance_ratio() → ndarray | None
Get explained variance ratio in X-space for all components.
- Returns:
explained_x_variance_ratio – Explained variance ratio in X-space, or None if not available
- Return type:
ndarray of shape (n_components,) or None
- get_explained_y_variance_ratio() → ndarray | None
Get explained variance ratio in Y-space for all components.
- Returns:
explained_y_variance_ratio – Explained variance ratio in Y-space, or None if not available
- Return type:
ndarray of shape (n_components,) or None
- summary() → PLSRegressionSummary
Get a summary of the PLS regression model.
- Returns:
summary – Object containing model information
- Return type:
PLSRegressionSummary
- inspect(dataset: str | Sequence[str] = 'train', components_scores: int | Tuple[int, int] | Sequence[int | Tuple[int, int]] | None = None, loadings_components: int | Sequence[int] | None = None, variance_threshold: float = 0.95, color_by: str | Dict[str, np.ndarray] | Sequence | np.ndarray | None = None, annotate_by: str | Dict[str, np.ndarray] | Sequence | np.ndarray | None = None, plot_config: InspectorPlotConfig | None = None, color_mode: Literal['continuous', 'categorical'] = 'continuous', target_index: int = 0, **kwargs) → Dict[str, matplotlib.figure.Figure]
Create all diagnostic plots for the PLS model.
- Parameters:
dataset (str or sequence of str, default='train') – Dataset(s) to visualize. Can be ‘train’, ‘test’, ‘val’, or a list.
components_scores (int, tuple, or sequence, optional) –
Components to plot for scores.
If int: plots first N components against sample index
If tuple (i, j): plots component i vs j
If sequence: plots multiple specifications
If None: defaults to (0, 1) and (1, 2) if enough components exist
loadings_components (int or sequence of int, optional) –
Components to plot for loadings.
If int: plots first N components
If sequence: plots specified components
If None: defaults to first 3 components
variance_threshold (float, default=0.95) – Cumulative variance threshold for variance plots
color_by (str or dict, optional) –
Coloring specification.
“y”: Color by target values (default for single dataset)
“sample_index”: Color by sample index
dict: Dictionary mapping dataset names to color arrays
None: Color by dataset (for multi-dataset plots) or ‘y’ (for single dataset)
annotate_by (str or dict, optional) –
Annotations for plot points.
“sample_index”: Annotate with sample indices
dict: Dictionary mapping dataset names to annotation arrays
plot_config (InspectorPlotConfig, optional) – Configuration for plot sizes and styles
color_mode ({‘continuous’, ‘categorical’}, default=’continuous’) – Coloring mode.
target_index (int, default=0) – Index of the target variable to inspect (for multi-output PLS).
**kwargs – Additional arguments passed to InspectorPlotConfig
- Returns:
Dictionary of matplotlib Figures with keys:
‘scores_1’, ‘scores_2’, …: Scores plots
‘x_vs_y_scores_1’, ‘x_vs_y_scores_2’, …: X-scores vs Y-scores plots (training set only)
‘loadings_x’, ‘loadings_weights’, ‘loadings_rotations’: X-related loadings plots
‘regression_coefficients’: Regression coefficient traces (one per target when multi-output)
‘variance_x’, ‘variance_y’: Explained variance plots (when available)
‘distances_hotelling_q’, ‘distances_q_y_residuals’, ‘distances_leverage_studentized’: Distance diagnostics
‘predicted_vs_actual’, ‘residuals’, ‘qq_plot’, ‘residual_distribution’: Regression diagnostics
‘raw_spectra’, ‘preprocessed_spectra’: Spectra plots (when preprocessing exists)
- Return type: