WhittakerSmooth

class chemotools.smooth.WhittakerSmooth(lam: float = 10000.0, weights: ndarray | None = None, solver_type: Literal['banded', 'sparse'] = 'banded', n_jobs: int = 1)

Bases: _BaseWhittaker

Whittaker smoothing for noise reduction and signal trend estimation.

Whittaker smoothing [1] is a penalized least-squares method that estimates smooth trends from noisy data by balancing fidelity to the input signal against a smoothness penalty. The penalty is built from a second-order difference operator: it sums the squared second differences of the fitted signal, so the estimate is smooth while its overall shape is preserved.
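As a sketch of the underlying computation, assuming the standard formulation of Eilers (2003): the fitted signal z minimizes Σ wᵢ(yᵢ − zᵢ)² + λ Σ (Δ²zᵢ)², which leads to the linear system (W + λDᵀD) z = W y, where D is the second-order difference matrix and W = diag(w). The helper `whittaker_dense` below is hypothetical and uses a dense NumPy solve for clarity; it is not how chemotools solves the system.

```python
import numpy as np


def whittaker_dense(y, lam=1e4, weights=None):
    """Hypothetical dense Whittaker smoother (illustration only, O(n^3) solve)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    # Second-order difference matrix D, shape (n - 2, n): (D z)_i = z_i - 2 z_{i+1} + z_{i+2}
    D = np.diff(np.eye(n), n=2, axis=0)
    # Normal equations of the penalized least-squares problem: (W + lam D^T D) z = W y
    A = np.diag(w) + lam * D.T @ D
    return np.linalg.solve(A, w * y)


rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)
z = whittaker_dense(y, lam=100.0)

# Roughness = sum of squared second differences; smoothing should reduce it sharply
print(np.sum(np.diff(y, 2) ** 2), np.sum(np.diff(z, 2) ** 2))
```

Because D annihilates straight lines, a linear input passes through this smoother unchanged, whatever the value of lam.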

The Whittaker smoothing step can be solved using either:

  • a banded solver (fast and memory-efficient, recommended for most spectra), or

  • a sparse LU solver (more stable for ill-conditioned problems).
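A banded solver applies because, with a second-order penalty, W + λDᵀD is symmetric and pentadiagonal: it is nonzero only within two diagonals of the main diagonal, so it fits in a 3 × n array instead of an n × n matrix. The sketch below, assuming the Eilers (2003) formulation, builds the matrix with NumPy, checks its bandwidth, and packs its upper triangle into LAPACK-style upper banded storage. The helpers `whittaker_system` and `to_upper_banded` are hypothetical illustrations, not chemotools internals.

```python
import numpy as np


def whittaker_system(n, lam=1e4, weights=None):
    """Hypothetical helper: dense W + lam * D^T D for a second-order penalty."""
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    D = np.diff(np.eye(n), n=2, axis=0)
    return np.diag(w) + lam * D.T @ D


def to_upper_banded(A, u=2):
    """Pack the upper triangle of a symmetric band matrix: ab[u + i - j, j] = A[i, j]."""
    n = A.shape[0]
    ab = np.zeros((u + 1, n))
    for k in range(u + 1):  # k = offset above the main diagonal
        # Diagonal k holds A[j - k, j] for j = k..n-1
        ab[u - k, k:] = np.diagonal(A, offset=k)
    return ab


n = 8
A = whittaker_system(n, lam=10.0)
i, j = np.indices(A.shape)
# Every entry more than two diagonals away from the main diagonal is exactly zero
print(np.all(A[np.abs(i - j) > 2] == 0))
ab = to_upper_banded(A)
print(ab.shape)  # 3 rows, independent of n
```

With SciPy installed, this layout is what `scipy.linalg.solveh_banded(ab, w * y)` accepts, giving a solve whose cost grows linearly with n for this fixed bandwidth.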

Optional weights can be provided to emphasize or downweight certain observations during smoothing. If no weights are supplied, all points are treated equally.
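For example, a zero weight removes a point from the fidelity term, so the smoother bridges over it using its neighbours. The sketch below, again a hypothetical dense implementation of the Eilers (2003) formulation rather than chemotools' solver, gives a spike zero weight and compares the result with the equally weighted fit.

```python
import numpy as np


def whittaker_dense(y, lam=1e4, weights=None):
    """Hypothetical dense Whittaker smoother: solves (W + lam D^T D) z = W y."""
    y = np.asarray(y, dtype=float)
    n = y.size
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    D = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.solve(np.diag(w) + lam * D.T @ D, w * y)


x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0  # a straight line...
y[25] += 5.0       # ...with one spike at index 25

w = np.ones_like(y)
w[25] = 0.0        # ignore the spiky observation

z_equal = whittaker_dense(y, lam=10.0)
z_weighted = whittaker_dense(y, lam=10.0, weights=w)

# With equal weights the spike pulls the fit upward; with w[25] = 0 it does not
print(z_equal[25] - (2.0 * x[25] + 1.0))
print(z_weighted[25] - (2.0 * x[25] + 1.0))
```

Here the down-weighted fit recovers the underlying line, because the remaining points lie exactly on it and a line carries no second-difference penalty.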

Parameters:
  • lam (float, default=1e4) – Regularization parameter controlling the smoothness of the fitted signal. Larger values yield smoother trends; smaller values follow the input more closely.

  • weights (ndarray of shape (n_features,), optional, default=None) – Non-negative weights applied to each observation. If None, all observations are weighted equally.

  • solver_type (Literal["banded", "sparse"], default="banded") – Backend used to solve the Whittaker linear system. Prefer "banded" (the default): it solves all rows in a single batched LAPACK call and is roughly 27× faster than "sparse" on large datasets. Use "sparse" only as a numerical fallback for ill-conditioned problems.

  • n_jobs (int, default=1) – Number of parallel jobs used during transform(). It only has an effect when solver_type="sparse", in which case rows are split across workers via joblib. When solver_type="banded" (the default), a single vectorised LAPACK batch solve is used regardless of this value, because it is already faster than spawning parallel workers. With solver_type="sparse" and n_jobs=-1 (all available cores), benchmarks show a roughly 4× speedup on 8 cores.
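To see how lam trades fidelity for smoothness, the sketch below fits the same noisy peak with increasing lam (the chemotools default is 1e4) and reports the residual sum of squares and the roughness (sum of squared second differences). It uses a hypothetical dense solver for the Eilers (2003) formulation, not the chemotools implementation. As lam grows, roughness should fall and the residual should rise.

```python
import numpy as np


def whittaker_dense(y, lam, weights=None):
    """Hypothetical dense Whittaker smoother: solves (W + lam D^T D) z = W y."""
    y = np.asarray(y, dtype=float)
    n = y.size
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    D = np.diff(np.eye(n), n=2, axis=0)
    return np.linalg.solve(np.diag(w) + lam * D.T @ D, w * y)


rng = np.random.default_rng(42)
x = np.linspace(0.0, 1.0, 100)
y = np.exp(-((x - 0.5) ** 2) / 0.01) + 0.05 * rng.standard_normal(x.size)

lams = [1.0, 1e2, 1e4, 1e6]
residuals, roughness = [], []
for lam in lams:
    z = whittaker_dense(y, lam)
    residuals.append(np.sum((y - z) ** 2))        # fidelity: distance to the data
    roughness.append(np.sum(np.diff(z, 2) ** 2))  # smoothness: second-difference energy
    print(f"lam={lam:>9.0e}  residual={residuals[-1]:.4f}  roughness={roughness[-1]:.2e}")
```

Because the penalty acts on second differences, the fit approaches a straight-line least-squares fit as lam becomes very large, which is why an oversized lam flattens peaks.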

Variables:

n_features_in_ (int) – The number of features seen during fit().

References

[1] Eilers, P.H. (2003). “A perfect smoother.” Analytical Chemistry 75 (14), 3631–3636.

Examples

>>> from chemotools.datasets import load_fermentation_train
>>> from chemotools.smooth import WhittakerSmooth
>>> # Load sample data
>>> X, _ = load_fermentation_train()
>>> # Initialize WhittakerSmooth with default parameters
>>> ws = WhittakerSmooth()
>>> # Fit and transform the data
>>> X_smoothed = ws.fit_transform(X)
fit(X: ndarray, y=None) → WhittakerSmooth

Fit the Whittaker smoother to input data.

Parameters:
  • X (ndarray of shape (n_samples, n_features)) – The input data matrix, where rows correspond to samples and columns correspond to features (e.g., spectra).

  • y (None) – Ignored, present for API consistency with scikit-learn.

Returns:

self – Fitted estimator.

Return type:

WhittakerSmooth

transform(X: ndarray, y=None) → ndarray

Apply Whittaker smoothing to input data.

Parameters:
  • X (ndarray of shape (n_samples, n_features)) – The input data matrix to smooth.

  • y (None) – Ignored, present for API consistency with scikit-learn.

Returns:

X_transformed – The smoothed version of the input data.

Return type:

ndarray of shape (n_samples, n_features)
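Because the weights have shape (n_features,), every row of X shares the same system matrix W + λDᵀD. The sketch below, a hypothetical `whittaker_transform` using a dense NumPy solve rather than the chemotools code path, exploits that by passing all right-hand sides to a single solve call, and checks that the result matches smoothing each row separately.

```python
import numpy as np


def whittaker_transform(X, lam=1e4, weights=None):
    """Hypothetical batched Whittaker smoothing of each row of X (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    w = np.ones(n_features) if weights is None else np.asarray(weights, dtype=float)
    D = np.diff(np.eye(n_features), n=2, axis=0)
    A = np.diag(w) + lam * D.T @ D  # same matrix for every sample
    # Columns of w[:, None] * X.T are the right-hand sides W y, one per sample;
    # one solve call handles all of them, then transpose back to (n_samples, n_features).
    return np.linalg.solve(A, w[:, None] * X.T).T


rng = np.random.default_rng(1)
X = np.cumsum(rng.standard_normal((5, 60)), axis=1)  # 5 random-walk "spectra"
X_smoothed = whittaker_transform(X, lam=50.0)

# Row-by-row reference using the same formulation
n = X.shape[1]
D = np.diff(np.eye(n), n=2, axis=0)
A = np.eye(n) + 50.0 * D.T @ D
X_rowwise = np.vstack([np.linalg.solve(A, row) for row in X])

print(X_smoothed.shape, np.allclose(X_smoothed, X_rowwise))
```

The dense solve here only illustrates the shared-matrix idea; the banded backend described under solver_type applies the same idea with far less memory.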