PLSRegressionInspector

class chemotools.inspector.PLSRegressionInspector(model: _PLS | Pipeline, X_train: ndarray, y_train: ndarray, X_test: ndarray | None = None, y_test: ndarray | None = None, X_val: ndarray | None = None, y_val: ndarray | None = None, x_axis: Sequence | None = None, confidence: float = 0.95)

Bases: SpectraMixin, RegressionMixin, LatentVariableMixin, _BaseInspector

Inspector for PLS Regression model diagnostics and visualization.

This class provides a unified interface for inspecting PLS regression models by creating multiple independent diagnostic plots. Instead of complex dashboards with many subplots, each method produces several separate figure windows that are easier to customize, save, and interact with individually.

The inspector provides convenience methods that create multiple independent plots:

  • inspect(): Creates all diagnostic plots (scores, loadings, explained variance, regression diagnostics, and distance plots)

  • inspect_spectra(): Creates raw and preprocessed spectra plots (if preprocessing exists)

Parameters:
  • model (_PLS or Pipeline) – Fitted PLS model or pipeline ending with PLS

  • X_train (array-like of shape (n_samples, n_features)) – Training data

  • y_train (array-like of shape (n_samples,)) – Training targets (required for supervised PLS)

  • X_test (array-like of shape (n_samples, n_features), optional) – Test data

  • y_test (array-like of shape (n_samples,), optional) – Test targets

  • X_val (array-like of shape (n_samples, n_features), optional) – Validation data

  • y_val (array-like of shape (n_samples,), optional) – Validation targets

  • x_axis (array-like of shape (n_features,), optional) – Feature names (e.g., wavenumbers for spectroscopy). If None, feature indices are used.

  • confidence (float, default=0.95) – Confidence level for outlier detection limits (Hotelling’s T², Q residuals, leverage, and studentized residuals). Must be between 0 and 1.

Variables:
  • model (_PLS or Pipeline) – The original model passed to the inspector

  • estimator (_PLS) – The PLS estimator

  • transformer (Pipeline or None) – Preprocessing pipeline before PLS (if model was a Pipeline)

  • n_components (int) – Number of latent variables

  • n_features (int) – Number of features in original data

  • n_samples (dict) – Number of samples in each dataset

  • x_axis (ndarray) – Feature names/indices

  • confidence (float) – Confidence level for outlier detection

  • RMSE_train (float) – Root mean squared error on training data

  • RMSE_test (float or None) – Root mean squared error on test data (if available)

  • RMSE_val (float or None) – Root mean squared error on validation data (if available)

  • R2_train (float) – R² score on training data

  • R2_test (float or None) – R² score on test data (if available)

  • R2_val (float or None) – R² score on validation data (if available)

  • hotelling_t2_limit (float) – Critical value for Hotelling’s T² statistic (computed on training data)

  • q_residuals_limit (float) – Critical value for Q residuals statistic (computed on training data)
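
The RMSE_* and R2_* variables above follow the usual definitions of root mean squared error and the coefficient of determination. As a minimal sketch of those textbook formulas in plain Python (not taken from the chemotools source; the helper names and the toy numbers are made up for illustration):

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy reference values and predictions (illustrative only).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

train_rmse = rmse(y_true, y_pred)
train_r2 = r2_score(y_true, y_pred)
print(f"RMSE = {train_rmse:.4f}, R2 = {train_r2:.4f}")
```

With a fitted inspector the corresponding values are read directly from RMSE_train, R2_train, and, when test or validation data were supplied, from their _test and _val counterparts.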

Examples

>>> from sklearn.cross_decomposition import PLSRegression
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> from chemotools.datasets import load_fermentation_train
>>> from chemotools.inspector import PLSRegressionInspector
>>>
>>> # Load data
>>> X, y = load_fermentation_train()
>>>
>>> # Create and fit pipeline
>>> pipeline = make_pipeline(
...     StandardScaler(),
...     PLSRegression(n_components=5)
... )
>>> pipeline.fit(X, y)
>>>
>>> # Create inspector
>>> inspector = PLSRegressionInspector(pipeline, X, y, x_axis=X.columns)
>>>
>>> # Print summary
>>> inspector.summary()
>>>
>>> # Create all diagnostic plots
>>> inspector.inspect()  # Creates scores, loadings, variance, regression plots
>>>
>>> # Compare preprocessing
>>> inspector.inspect_spectra()
>>>
>>> # Access underlying data for custom analysis
>>> x_scores = inspector.get_x_scores('train')
>>> y_scores = inspector.get_y_scores('train')
>>> x_loadings = inspector.get_x_loadings([0, 1, 2])
>>> coeffs = inspector.get_regression_coefficients()

Notes

Memory usage scales linearly with dataset size. For very large datasets (>100,000 samples), consider subsampling for initial exploration.

Attributes

R2_test

Return R² score on test data, or None when unavailable.

R2_train

Return R² score on training data.

R2_val

Return R² score on validation data, or None when unavailable.

RMSE_test

Return RMSE on test data, or None when unavailable.

RMSE_train

Return RMSE on training data.

RMSE_val

Return RMSE on validation data, or None when unavailable.

component_label

Return the label used for components ('LV', short for latent variable).

confidence

Return the confidence level for outlier detection.

estimator

Return the underlying PLS estimator.

hotelling_t2_limit

Return the Hotelling's T² critical value at the specified confidence level.

leverage_detector

Return a fitted leverage detector cached for reuse.

model

Return the original model.

n_components

Return the number of latent variables/components.

n_features

Return the number of features in original data.

n_samples

Return the number of samples in each dataset.

q_residuals_limit

Return the Q residuals critical value at the specified confidence level.

studentized_detector

Return a fitted studentized residuals detector cached for reuse.

transformer

Return the preprocessing transformer (if any).

x_axis

Return the feature names/indices.
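
The hotelling_t2_limit and q_residuals_limit attributes are critical values; the per-sample statistics they are compared against are conventionally defined as in the sketch below. It uses the textbook definitions in plain Python and is not taken from the chemotools source: the function names, the toy numbers, and the limit value are all hypothetical.

```python
def hotelling_t2(scores, train_scores):
    """T² per sample: sum over components of score² / training-score variance.

    Assumes centred, mutually uncorrelated scores (textbook definition).
    """
    n = len(train_scores)
    n_comp = len(train_scores[0])
    variances = []
    for a in range(n_comp):
        column = [row[a] for row in train_scores]
        mean = sum(column) / n
        variances.append(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [
        sum(row[a] ** 2 / variances[a] for a in range(n_comp))
        for row in scores
    ]


def q_residuals(X, X_reconstructed):
    """Q per sample: squared reconstruction error summed over features."""
    return [
        sum((x - xr) ** 2 for x, xr in zip(row, rec))
        for row, rec in zip(X, X_reconstructed)
    ]


# Toy training scores (3 samples, 2 components); values are made up.
train_scores = [[-1.0, 0.5], [0.0, -1.0], [1.0, 0.5]]
new_scores = [[2.0, 0.0]]

t2_new = hotelling_t2(new_scores, train_scores)
q_new = q_residuals([[1.0, 2.0, 3.0]], [[1.0, 2.5, 2.5]])

# Hypothetical critical value; with the inspector it would come from
# inspector.hotelling_t2_limit.
t2_limit = 3.0
flagged = [value > t2_limit for value in t2_new]
print(t2_new, q_new, flagged)
```

A sample whose statistic exceeds the limit is a candidate outlier at the chosen confidence level; how the limits themselves are derived is not shown here.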

component_label: str = 'LV'

property leverage_detector: Leverage

Return a fitted leverage detector cached for reuse.

property studentized_detector: StudentizedResiduals

Return a fitted studentized residuals detector cached for reuse.

get_latent_scores(dataset: str) → ndarray

Hook for LatentVariableMixin - returns X-scores.

get_latent_explained_variance() → ndarray | None

Hook for LatentVariableMixin - returns explained X variance ratio.

get_latent_loadings() → ndarray

Hook for LatentVariableMixin - returns X-loadings.

get_x_scores(dataset: str = 'train') → ndarray

Get PLS X-scores for specified dataset.

Parameters:

dataset ({'train', 'test', 'val'}, default='train') – Which dataset to get scores for

Returns:

x_scores – PLS X-scores (latent variables from X)

Return type:

ndarray of shape (n_samples, n_components)

get_y_scores(dataset: str = 'train') → ndarray

Get PLS Y-scores for specified dataset.

Parameters:

dataset ({'train', 'test', 'val'}, default='train') – Which dataset to get scores for

Returns:

y_scores – PLS Y-scores (latent variables from Y)

Return type:

ndarray of shape (n_samples, n_components)

get_x_loadings(components: int | Sequence[int] | None = None) → ndarray

Get PLS X-loadings.

Parameters:

components (int, list of int, or None, default=None) – Which components to return. If None, returns all components.

Returns:

x_loadings – PLS X-loadings

Return type:

ndarray of shape (n_features, n_components_selected)

get_x_weights(components: int | Sequence[int] | None = None) → ndarray

Get PLS X-weights.

Parameters:

components (int, list of int, or None, default=None) – Which components to return. If None, returns all components.

Returns:

x_weights – PLS X-weights

Return type:

ndarray of shape (n_features, n_components_selected)

get_x_rotations(components: int | Sequence[int] | None = None) → ndarray

Get PLS X-rotations.

Parameters:

components (int, list of int, or None, default=None) – Which components to return. If None, returns all components.

Returns:

x_rotations – PLS X-rotations

Return type:

ndarray of shape (n_features, n_components_selected)
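
Weights, loadings, and rotations are easy to confuse. Assuming the estimator follows scikit-learn's PLS conventions (an assumption; this page does not state it), the rotations R are built from the X-weights W and X-loadings P so that the X-scores T can be obtained directly from the centred (and, if enabled, scaled) data:

```latex
\[
R = W \left(P^{\top} W\right)^{-1},
\qquad
T = \tilde{X} R
\]
```

Here the matrix with the tilde denotes X after the estimator's own centring and scaling. Under that assumption the weights describe each deflation step, while the rotations map the original preprocessed features to scores in one step.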

get_regression_coefficients() → ndarray

Get PLS regression coefficients (regression vector).

Returns:

coef – PLS regression coefficients

Return type:

ndarray of shape (n_features,) or (n_features, n_targets)
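
For a single target, a regression vector is applied as a dot product with each sample plus an offset. The sketch below illustrates that arithmetic in plain Python. It assumes the rows of X are already in the feature space the coefficients refer to (for a pipeline, that would typically mean after preprocessing) and that an intercept is available separately; both are assumptions, and the helper name and numbers are hypothetical.

```python
def predict_single_target(X, coef, intercept=0.0):
    """Apply a regression vector: y_hat[i] = sum_j X[i][j] * coef[j] + intercept."""
    return [
        sum(x * b for x, b in zip(row, coef)) + intercept
        for row in X
    ]


# Toy data: 2 samples, 2 features (illustrative values only).
X = [[1.0, 2.0], [0.0, 1.0]]
coef = [0.5, -1.0]
intercept = 2.0

y_hat = predict_single_target(X, coef, intercept)
print(y_hat)
```

Plotting the coefficients against x_axis shows which features push the prediction up or down, which is what the 'regression_coefficients' figure from inspect() displays.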

get_explained_x_variance_ratio() → ndarray | None

Get explained variance ratio in X-space for all components.

Returns:

explained_x_variance_ratio – Explained variance ratio in X-space, or None if not available

Return type:

ndarray of shape (n_components,) or None

get_explained_y_variance_ratio() → ndarray | None

Get explained variance ratio in Y-space for all components.

Returns:

explained_y_variance_ratio – Explained variance ratio in Y-space, or None if not available

Return type:

ndarray of shape (n_components,) or None
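
One common use of per-component explained variance ratios is to accumulate them and find how many components are needed to reach a cumulative threshold, such as the 0.95 default of variance_threshold in inspect(). The sketch below shows that calculation in plain Python. The helper name and the ratios are hypothetical, and it is not a description of how the inspector draws its variance plots.

```python
from itertools import accumulate


def components_for_threshold(explained_ratio, threshold=0.95):
    """Smallest number of components whose cumulative ratio reaches threshold.

    Returns None if the ratios are unavailable or never reach the threshold.
    """
    if explained_ratio is None:
        return None
    for k, total in enumerate(accumulate(explained_ratio), start=1):
        if total >= threshold:
            return k
    return None


# Made-up explained variance ratios for four latent variables.
ratios = [0.70, 0.20, 0.06, 0.03]

n_needed = components_for_threshold(ratios, threshold=0.95)
print(n_needed)
```

Because both getters may return None, the helper treats a missing ratio array as "no answer" rather than raising.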

summary() → PLSRegressionSummary

Get a summary of the PLS regression model.

Returns:

summary – Object containing model information

Return type:

PLSRegressionSummary

inspect(dataset: str | Sequence[str] = 'train', components_scores: int | Tuple[int, int] | Sequence[int | Tuple[int, int]] | None = None, loadings_components: int | Sequence[int] | None = None, variance_threshold: float = 0.95, color_by: str | Dict[str, np.ndarray] | Sequence | np.ndarray | None = None, annotate_by: str | Dict[str, np.ndarray] | Sequence | np.ndarray | None = None, plot_config: InspectorPlotConfig | None = None, color_mode: Literal['continuous', 'categorical'] = 'continuous', target_index: int = 0, **kwargs) → Dict[str, matplotlib.figure.Figure]

Create all diagnostic plots for the PLS model.

Parameters:
  • dataset (str or sequence of str, default='train') – Dataset(s) to visualize. Can be 'train', 'test', 'val', or a list of these.

  • components_scores (int, tuple, or sequence, optional) –

    Components to plot for scores.

    • If int: plots first N components against sample index

    • If tuple (i, j): plots component i vs j

    • If sequence: plots multiple specifications

    • If None: defaults to (0, 1) and (1, 2) if enough components exist

  • loadings_components (int or sequence of int, optional) –

    Components to plot for loadings.

    • If int: plots first N components

    • If sequence: plots specified components

    • If None: defaults to first 3 components

  • variance_threshold (float, default=0.95) – Cumulative variance threshold for variance plots

  • color_by (str or dict, optional) –

    Coloring specification.

    • 'y': Color by target values (default for single dataset)

    • 'sample_index': Color by sample index

    • dict: Dictionary mapping dataset names to color arrays

    • None: Color by dataset (for multi-dataset plots) or 'y' (for single dataset)

  • annotate_by (str or dict, optional) –

    Annotations for plot points.

    • 'sample_index': Annotate with sample indices

    • dict: Dictionary mapping dataset names to annotation arrays

  • plot_config (InspectorPlotConfig, optional) – Configuration for plot sizes and styles

  • color_mode ({'continuous', 'categorical'}, default='continuous') – Coloring mode.

  • target_index (int, default=0) – Index of the target variable to inspect (for multi-output PLS).

  • **kwargs – Additional arguments passed to InspectorPlotConfig

Returns:

Dictionary of matplotlib Figures with keys:

  • 'scores_1', 'scores_2', …: Scores plots

  • 'x_vs_y_scores_1', 'x_vs_y_scores_2', …: X-scores vs Y-scores plots (training set only)

  • 'loadings_x', 'loadings_weights', 'loadings_rotations': X-related loadings plots

  • 'regression_coefficients': Regression coefficient traces (one per target when multi-output)

  • 'variance_x', 'variance_y': Explained variance plots (when available)

  • 'distances_hotelling_q', 'distances_q_y_residuals', 'distances_leverage_studentized': Distance diagnostics

  • 'predicted_vs_actual', 'residuals', 'qq_plot', 'residual_distribution': Regression diagnostics

  • 'raw_spectra', 'preprocessed_spectra': Spectra plots (when preprocessing exists)

Return type:

dict
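
Because inspect() returns a plain dictionary of matplotlib Figures keyed by name, the figures can be post-processed individually. The sketch below is one possible helper that writes each figure to disk under its dictionary key. The helper name, its parameters, and the output directory are hypothetical; the only thing it relies on is that each value has a matplotlib-style savefig method.

```python
from pathlib import Path


def save_figures(figures, out_dir, fmt="png", dpi=150):
    """Save each figure in a {name: figure} dict to out_dir/<name>.<fmt>.

    Returns the list of written paths in dictionary order.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for name, fig in figures.items():
        target = out_path / f"{name}.{fmt}"
        fig.savefig(target, dpi=dpi)  # matplotlib Figure.savefig accepts a path
        saved.append(target)
    return saved
```

With a fitted inspector this could be called as save_figures(inspector.inspect(dataset=['train', 'test']), 'diagnostics'), after which a file such as diagnostics/predicted_vs_actual.png would exist for every key present in the returned dictionary.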