PLSRegressionInspector

class chemotools.inspector.PLSRegressionInspector(model: _PLS | Pipeline, X_train: ndarray, y_train: ndarray, X_test: ndarray | None = None, y_test: ndarray | None = None, X_val: ndarray | None = None, y_val: ndarray | None = None, x_axis: Sequence | None = None, confidence: float = 0.95)

Bases: SpectraMixin, RegressionMixin, LatentVariableMixin, _BaseInspector

Inspector for PLS Regression model diagnostics and visualization.

This class provides a unified interface for inspecting PLS regression models by creating multiple independent diagnostic plots. Instead of complex dashboards with many subplots, each method produces several separate figure windows that are easier to customize, save, and interact with individually.

The inspector provides convenience methods that create multiple independent plots:

inspect(): Creates all diagnostic plots (scores, loadings, explained variance, regression diagnostics, and distance plots)
inspect_spectra(): Creates raw and preprocessed spectra plots (if preprocessing exists)

Parameters:

model (_PLS or Pipeline) – Fitted PLS model or pipeline ending with PLS
X_train (array-like of shape (n_samples, n_features)) – Training data
y_train (array-like of shape (n_samples,)) – Training targets (required for supervised PLS)
X_test (array-like of shape (n_samples, n_features), optional) – Test data
y_test (array-like of shape (n_samples,), optional) – Test targets
X_val (array-like of shape (n_samples, n_features), optional) – Validation data
y_val (array-like of shape (n_samples,), optional) – Validation targets
x_axis (array-like of shape (n_features,), optional) – Feature names (e.g., wavenumbers for spectroscopy). If None, feature indices are used.
confidence (float, default=0.95) – Confidence level for outlier detection limits (Hotelling’s T², Q residuals, leverage, and studentized residuals). Must be between 0 and 1.
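
As a rough illustration of two of the statistics named under `confidence`, the sketch below computes per-sample Hotelling's T² and Q residuals with NumPy alone. It is not the inspector's implementation: the helper names `hotelling_t2` and `q_residuals` are hypothetical, and an SVD of the centered data stands in for a fitted latent-variable model. How the critical limits are derived from `confidence` is not covered here.

```python
import numpy as np

def hotelling_t2(scores, train_scores):
    """Hotelling's T² per sample: squared scores scaled by the training score variance."""
    score_var = train_scores.var(axis=0, ddof=1)
    return np.sum(scores**2 / score_var, axis=1)

def q_residuals(X_centered, scores, loadings):
    """Q residuals per sample: squared reconstruction error in X-space."""
    residual = X_centered - scores @ loadings.T
    return np.sum(residual**2, axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
Xc = X - X.mean(axis=0)

# Stand-in decomposition (assumption): two components taken from an SVD
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
n_components = 2
T = U[:, :n_components] * s[:n_components]  # scores, shape (50, 2)
P = Vt[:n_components].T                     # loadings, shape (8, 2)

t2 = hotelling_t2(T, T)    # one value per sample
q = q_residuals(Xc, T, P)  # one value per sample
```

Samples with a large T² sit far from the center inside the latent space, while samples with a large Q are poorly reconstructed by it, which is why the two are usually read together.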

Variables:

model (_PLS or Pipeline) – The original model passed to the inspector
estimator (_PLS) – The PLS estimator
transformer (Pipeline or None) – Preprocessing pipeline before PLS (if model was a Pipeline)
n_components (int) – Number of latent variables
n_features (int) – Number of features in original data
n_samples (dict) – Number of samples in each dataset
x_axis (ndarray) – Feature names/indices
confidence (float) – Confidence level for outlier detection
RMSE_train (float) – Root mean squared error on training data
RMSE_test (float or None) – Root mean squared error on test data (if available)
RMSE_val (float or None) – Root mean squared error on validation data (if available)
R2_train (float) – R² score on training data
R2_test (float or None) – R² score on test data (if available)
R2_val (float or None) – R² score on validation data (if available)
hotelling_t2_limit (float) – Critical value for Hotelling’s T² statistic (computed on training data)
q_residuals_limit (float) – Critical value for Q residuals statistic (computed on training data)
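
For reference, the RMSE and R² values listed above follow conventional definitions that can be written in a few lines of NumPy. This is a sketch of those standard formulas, assuming the inspector uses them; the helper names `rmse` and `r2` are illustrative only.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 6.0])

print(rmse(y_true, y_pred))  # a single error of 2 across four samples
print(r2(y_true, y_pred))
```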

Examples

>>> from sklearn.cross_decomposition import PLSRegression
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> from chemotools.datasets import load_fermentation_train
>>> from chemotools.inspector import PLSRegressionInspector
>>>
>>> # Load data
>>> X, y = load_fermentation_train()
>>>
>>> # Create and fit pipeline
>>> pipeline = make_pipeline(
...     StandardScaler(),
...     PLSRegression(n_components=5)
... )
>>> pipeline.fit(X, y)
>>>
>>> # Create inspector
>>> inspector = PLSRegressionInspector(pipeline, X, y, x_axis=X.columns)
>>>
>>> # Print summary
>>> inspector.summary()
>>>
>>> # Create all diagnostic plots
>>> inspector.inspect()  # Creates scores, loadings, variance, regression plots
>>>
>>> # Compare preprocessing
>>> inspector.inspect_spectra()
>>>
>>> # Access underlying data for custom analysis
>>> x_scores = inspector.get_x_scores('train')
>>> y_scores = inspector.get_y_scores('train')
>>> x_loadings = inspector.get_x_loadings([0, 1, 2])
>>> coeffs = inspector.get_regression_coefficients()

Notes

Memory usage scales linearly with dataset size. For very large datasets (>100,000 samples), consider subsampling for initial exploration.
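
One way to subsample before a first exploratory pass is to draw row indices without replacement and apply them to both arrays. The helper `subsample` below is hypothetical and works on NumPy arrays; it is a sketch, not part of the library.

```python
import numpy as np

def subsample(X, y, max_samples, seed=0):
    """Return at most `max_samples` paired rows of X and y, drawn without replacement."""
    X = np.asarray(X)
    y = np.asarray(y)
    n_samples = X.shape[0]
    if n_samples <= max_samples:
        return X, y
    rng = np.random.default_rng(seed)
    # Sorting keeps the original sample order, which helps with index-based plots
    idx = np.sort(rng.choice(n_samples, size=max_samples, replace=False))
    return X[idx], y[idx]

X = np.arange(2000.0).reshape(1000, 2)
y = np.arange(1000.0)
X_small, y_small = subsample(X, y, max_samples=100)
```

The reduced arrays can then be passed as `X_train` and `y_train` in place of the full data.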

Attributes

`R2_test`	Return R² score on test data, or `None` when unavailable.
`R2_train`	Return R² score on training data.
`R2_val`	Return R² score on validation data, or `None` when unavailable.
`RMSE_test`	Return RMSE on test data, or `None` when unavailable.
`RMSE_train`	Return RMSE on training data.
`RMSE_val`	Return RMSE on validation data, or `None` when unavailable.
`component_label`	Label used for components (`'LV'`).
`confidence`	Return the confidence level for outlier detection.
`estimator`	Return the underlying estimator (PCA or PLS).
`hotelling_t2_limit`	Return the Hotelling's T² critical value at the specified confidence level.
`leverage_detector`	Return a fitted leverage detector cached for reuse.
`model`	Return the original model.
`n_components`	Return the number of latent variables/components.
`n_features`	Return the number of features in original data.
`n_samples`	Return the number of samples in each dataset.
`q_residuals_limit`	Return the Q residuals critical value at the specified confidence level.
`studentized_detector`	Return a fitted studentized residuals detector cached for reuse.
`transformer`	Return the preprocessing transformer (if any).
`x_axis`	Return the feature names/indices.

component_label: str = 'LV'

property leverage_detector: Leverage

Return a fitted leverage detector cached for reuse.

property studentized_detector: StudentizedResiduals

Return a fitted studentized residuals detector cached for reuse.
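
The quantities behind these two detectors can be sketched from their textbook definitions. The code below is an assumption about what such detectors typically compute, not a description of the `Leverage` or `StudentizedResiduals` classes: leverage is taken as the diagonal of the hat matrix built from the training scores (without an intercept term), and residuals are internally studentized. The function names are hypothetical.

```python
import numpy as np

def leverage(train_scores, scores=None):
    """Diagonal of S (T'T)^-1 S', with T the training scores and S the scores to evaluate."""
    T = np.asarray(train_scores, dtype=float)
    S = T if scores is None else np.asarray(scores, dtype=float)
    gram_inv = np.linalg.inv(T.T @ T)
    return np.einsum("ij,jk,ik->i", S, gram_inv, S)

def studentized_residuals(residuals, h, n_params):
    """Internally studentized residuals: e_i / (sigma * sqrt(1 - h_i))."""
    e = np.asarray(residuals, dtype=float)
    dof = e.size - n_params
    sigma = np.sqrt(np.sum(e**2) / dof)
    return e / (sigma * np.sqrt(1.0 - h))

rng = np.random.default_rng(0)
T = rng.normal(size=(30, 3))
T -= T.mean(axis=0)        # centered stand-in scores (assumption)
e = rng.normal(size=30)    # stand-in prediction residuals (assumption)

h = leverage(T)
r = studentized_residuals(e, h, n_params=3)
```

On the training set the leverages sum to the number of components, so a sample whose leverage is several times the average is a candidate influential point.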

get_latent_scores(dataset: str) → ndarray

Hook for LatentVariableMixin: returns X-scores.

get_latent_explained_variance() → ndarray | None

Hook for LatentVariableMixin: returns explained X variance ratio.

get_latent_loadings() → ndarray

Hook for LatentVariableMixin: returns X-loadings.

get_x_scores(dataset: str = 'train') → ndarray

Get PLS X-scores for the specified dataset.

Parameters: dataset ({'train', 'test', 'val'}, default='train') – Which dataset to get scores for
Returns: x_scores – PLS X-scores (latent variables from X)
Return type: ndarray of shape (n_samples, n_components)

get_y_scores(dataset: str = 'train') → ndarray

Get PLS Y-scores for the specified dataset.

Parameters: dataset ({'train', 'test', 'val'}, default='train') – Which dataset to get scores for
Returns: y_scores – PLS Y-scores (latent variables from Y)
Return type: ndarray of shape (n_samples, n_components)

get_x_loadings(components: int | Sequence[int] | None = None) → ndarray

Get PLS X-loadings.

Parameters: components (int, list of int, or None, default=None) – Which components to return. If None, returns all components.
Returns: x_loadings – PLS X-loadings
Return type: ndarray of shape (n_features, n_components_selected)

get_x_weights(components: int | Sequence[int] | None = None) → ndarray

Get PLS X-weights.

Parameters: components (int, list of int, or None, default=None) – Which components to return. If None, returns all components.
Returns: x_weights – PLS X-weights
Return type: ndarray of shape (n_features, n_components_selected)

get_x_rotations(components: int | Sequence[int] | None = None) → ndarray

Get PLS X-rotations.

Parameters: components (int, list of int, or None, default=None) – Which components to return. If None, returns all components.
Returns: x_rotations – PLS X-rotations
Return type: ndarray of shape (n_features, n_components_selected)
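
Weights, loadings, and rotations are easy to confuse, so a small worked example may help. The sketch below runs a minimal single-target NIPALS PLS in NumPy (the function `pls1_nipals` is hypothetical and only centers the data, whereas a fitted model may also scale it) and checks the standard relation R = W (PᵀW)⁻¹, under which the rotations map centered X directly to the X-scores.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Minimal single-target NIPALS PLS on centered data (illustrative only)."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    n_samples, n_features = X.shape
    W = np.zeros((n_features, n_components))  # X-weights
    P = np.zeros((n_features, n_components))  # X-loadings
    T = np.zeros((n_samples, n_components))   # X-scores
    for a in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)                # unit-length weight vector
        t = Xk @ w                            # scores on the deflated X
        p = Xk.T @ t / (t @ t)                # loadings for this component
        Xk = Xk - np.outer(t, p)              # deflate X
        yk = yk - t * (yk @ t) / (t @ t)      # deflate y
        W[:, a], P[:, a], T[:, a] = w, p, t
    return W, P, T

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
y = X @ rng.normal(size=6) + 0.1 * rng.normal(size=40)

W, P, T = pls1_nipals(X, y, n_components=3)
R = W @ np.linalg.inv(P.T @ W)   # X-rotations
Xc = X - X.mean(axis=0)

print(np.allclose(Xc @ R, T))    # rotations reproduce the scores from undeflated X
```

The weights act on the deflated X at each step, while the rotations act on the original centered X, which is why the rotations are the convenient choice for projecting new samples.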

get_regression_coefficients() → ndarray

Get PLS regression coefficients (regression vector).

Returns: coef – PLS regression coefficients
Return type: ndarray of shape (n_features,) or (n_features, n_targets)
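
To show how the two documented shapes behave, the sketch below applies a coefficient vector and a coefficient matrix to centered features. Ordinary least-squares coefficients from `np.linalg.lstsq` stand in for a fitted PLS model (an assumption made to keep the example self-contained), and `predict_from_coefficients` is a hypothetical helper. If the model also scales X or y, the coefficients may be expressed in scaled units, which this sketch ignores.

```python
import numpy as np

def predict_from_coefficients(X, coef, x_mean, y_mean):
    """Apply a regression vector or matrix to mean-centered features."""
    return (X - x_mean) @ coef + y_mean

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
Y = X @ rng.normal(size=(4, 2)) + 5.0   # two noiseless targets

x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)

# Stand-in (assumption): least-squares coefficients instead of PLS coefficients
coef, *_ = np.linalg.lstsq(X - x_mean, Y - y_mean, rcond=None)

Y_hat = predict_from_coefficients(X, coef, x_mean, y_mean)           # shape (20, 2)
y_hat = predict_from_coefficients(X, coef[:, 0], x_mean, y_mean[0])  # shape (20,)
```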

get_explained_x_variance_ratio() → ndarray | None

Get explained variance ratio in X-space for all components.

Returns: explained_x_variance_ratio – Explained variance ratio in X-space, or None if not available
Return type: ndarray of shape (n_components,) or None

get_explained_y_variance_ratio() → ndarray | None

Get explained variance ratio in Y-space for all components.

Returns: explained_y_variance_ratio – Explained variance ratio in Y-space, or None if not available
Return type: ndarray of shape (n_components,) or None
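
A common way to define a per-component explained variance ratio is the share of the total sum of squares captured by each rank-one term t_a p_aᵀ. The sketch below shows that definition for X-space; whether the inspector computes its ratios this way is an assumption, the helper name is hypothetical, and an SVD is used as a stand-in decomposition with mutually orthogonal scores.

```python
import numpy as np

def explained_variance_ratio(data_centered, scores, loadings):
    """Share of the total sum of squares captured by each rank-one term t_a p_a^T."""
    total = np.sum(data_centered**2)
    # ||t p^T||_F^2 equals (t . t) * (p . p)
    per_component = np.sum(scores**2, axis=0) * np.sum(loadings**2, axis=0)
    return per_component / total

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
Xc = X - X.mean(axis=0)

# Stand-in decomposition (assumption): three components from an SVD
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
T = U[:, :3] * s[:3]   # scores
P = Vt[:3].T           # loadings

ratio = explained_variance_ratio(Xc, T, P)
cumulative = np.cumsum(ratio)
```

The per-component shares only add up to a meaningful cumulative curve when the scores are mutually orthogonal, as they are in this stand-in.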

summary() → PLSRegressionSummary

Get a summary of the PLS regression model.

Returns: summary – Object containing model information
Return type: PLSRegressionSummary

inspect(): Create all diagnostic plots for the PLS model.

Parameters:

dataset (str or sequence of str, default='train') – Dataset(s) to visualize. Can be 'train', 'test', 'val', or a list of these.
components_scores (int, tuple, or sequence, optional) – Components to plot for scores.
- If int: plots first N components against sample index
- If tuple (i, j): plots component i vs j
- If sequence: plots multiple specifications
- If None: defaults to (0, 1) and (1, 2) if enough components exist
loadings_components (int or sequence of int, optional) – Components to plot for loadings.
- If int: plots first N components
- If sequence: plots specified components
- If None: defaults to first 3 components
variance_threshold (float, default=0.95) – Cumulative variance threshold for variance plots
color_by (str or dict, optional) – Coloring specification.
- 'y': Color by target values (default for single dataset)
- 'sample_index': Color by sample index
- dict: Dictionary mapping dataset names to color arrays
- None: Color by dataset (for multi-dataset plots) or 'y' (for single dataset)
annotate_by (str or dict, optional) – Annotations for plot points.
- 'sample_index': Annotate with sample indices
- dict: Dictionary mapping dataset names to annotation arrays
plot_config (InspectorPlotConfig, optional) – Configuration for plot sizes and styles
color_mode (str, optional) – Coloring mode ('continuous' or 'categorical').
target_index (int, default=0) – Index of the target variable to inspect (for multi-output PLS).
**kwargs – Additional arguments passed to InspectorPlotConfig

Returns:

Dictionary of matplotlib Figures with keys:

'scores_1', 'scores_2', …: Scores plots
'x_vs_y_scores_1', 'x_vs_y_scores_2', …: X-scores vs Y-scores plots (training set only)
'loadings_x', 'loadings_weights', 'loadings_rotations': X-related loadings plots
'regression_coefficients': Regression coefficient traces (one per target when multi-output)
'variance_x', 'variance_y': Explained variance plots (when available)
'distances_hotelling_q', 'distances_q_y_residuals', 'distances_leverage_studentized': Distance diagnostics
'predicted_vs_actual', 'residuals', 'qq_plot', 'residual_distribution': Regression diagnostics
'raw_spectra', 'preprocessed_spectra': Spectra plots (when preprocessing exists)

Return type:

dict
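
Because the return value is a plain dictionary of matplotlib Figures, saving every plot takes only a short loop. The helper `save_figures` below is a hypothetical sketch; it relies only on each value having a `savefig` method, as matplotlib's `Figure` does.

```python
from pathlib import Path

def save_figures(figures, out_dir, fmt="png", dpi=150):
    """Save every figure in a {name: Figure} dict and return the written paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, fig in figures.items():
        path = out_dir / f"{name}.{fmt}"
        # Figure.savefig accepts a path plus dpi and bbox_inches keyword arguments
        fig.savefig(path, dpi=dpi, bbox_inches="tight")
        paths.append(path)
    return paths
```

With an inspector in hand, something like `save_figures(inspector.inspect(dataset=['train', 'test']), 'diagnostics')` would write one file per dictionary key, for example `scores_1.png` and `predicted_vs_actual.png`.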
