BandScaler

class chemotools.scale.BandScaler(start: int = 0, end: int = -1, x_axis: ndarray | None = None, aggregation: str = 'mean', baseline_correction: bool = False, wavenumbers='deprecated')

Bases: DocLinkMixin, XAxisMixin, TransformerMixin, OneToOneFeatureMixin, BaseEstimator

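A transformer that scales each sample by the intensity of a specified band, computed as either the mean or the area of the band (see aggregation). The band can be specified by an index range or, when x_axis is provided, by a range of x-axis values (e.g., wavenumbers).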

Parameters:

start (int, default=0) – Index or x-axis value of the start of the range.
end (int, default=-1) – Index or x-axis value of the end of the range.
x_axis (array-like, optional) – X-axis values corresponding to columns. Must be ascending if provided.
aggregation ({'mean', 'area'}, default='mean') – The aggregation method used to compute the band intensity:
- ‘mean’: the mean intensity of the band.
- ‘area’: the area under the band, computed with the trapezoidal rule. Requires x_axis (see Notes).
baseline_correction (bool, default=False) – If True, a linear baseline connecting the band endpoints is subtracted from the band before computing the scaling factor. This removes the effect of a sloped baseline on the mean or area calculation.
wavenumbers (array-like, optional) – Deprecated alias for x_axis. Use x_axis instead.
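The aggregation and baseline_correction options above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the chemotools implementation: band_intensity and band_scale are hypothetical helpers, the band indices are assumed to be already resolved, non-negative and inclusive, and x is assumed to hold one x-axis value per column. The mean itself does not use x (only the baseline line does), so in the index-only case np.arange(n_features) can stand in for it.

```python
import numpy as np


def band_intensity(spectrum, x, start_index, end_index,
                   aggregation="mean", baseline_correction=False):
    """Hypothetical helper (not part of chemotools): intensity of one band.

    Assumes `start_index` and `end_index` are already-resolved, non-negative,
    inclusive column indices and that `x` holds one x-axis value per column.
    """
    band = np.asarray(spectrum, dtype=float)[start_index:end_index + 1]
    xb = np.asarray(x, dtype=float)[start_index:end_index + 1]

    if baseline_correction:
        # Subtract the straight line through the band's first and last points,
        # so a sloped background does not inflate the mean or area.
        slope = (band[-1] - band[0]) / (xb[-1] - xb[0])
        band = band - (band[0] + slope * (xb - xb[0]))

    if aggregation == "mean":
        return band.mean()
    if aggregation == "area":
        # Trapezoidal rule with the actual spacing between x-axis points
        return np.sum(np.diff(xb) * (band[1:] + band[:-1]) / 2.0)
    raise ValueError("aggregation must be 'mean' or 'area'")


def band_scale(X, x, start_index, end_index, **kwargs):
    """Hypothetical helper: divide each row (spectrum) of X by its own band intensity."""
    X = np.asarray(X, dtype=float)
    factors = np.array([band_intensity(row, x, start_index, end_index, **kwargs)
                        for row in X])
    return X / factors[:, np.newaxis]
```

Because each spectrum is divided by its own factor, two spectra that differ only by a multiplicative constant map to the same scaled profile.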

Variables:

start_index (int) – The index of the start of the band.
end_index (int) – The index of the end of the band.
n_features_in (int) – The number of features in the input data.
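start_index and end_index are the column positions that start and end resolve to. A hypothetical sketch of that resolution follows; resolve_band_indices is not a chemotools function, and the nearest-value lookup on the x-axis is an assumption (the library may use a different rule, such as np.searchsorted).

```python
import numpy as np


def resolve_band_indices(start, end, x_axis=None, n_features=None):
    """Hypothetical sketch: map `start`/`end` to column indices.

    Without an x-axis, `start`/`end` are treated as Python-style indices
    (negative values count from the end). With an x-axis, each value is
    matched to the nearest x-axis position (an assumed rule).
    """
    if x_axis is None:
        if n_features is None:
            raise ValueError("n_features is required when x_axis is None")
        # range(...)[i] normalizes negative indices and raises IndexError
        # for out-of-range ones instead of silently wrapping.
        columns = range(n_features)
        return columns[start], columns[end]

    x = np.asarray(x_axis, dtype=float)
    if np.any(np.diff(x) <= 0):
        raise ValueError("x_axis must be strictly ascending")
    start_index = int(np.argmin(np.abs(x - start)))
    end_index = int(np.argmin(np.abs(x - end)))
    return start_index, end_index
```

With the defaults start=0 and end=-1 and no x-axis, this sketch selects the whole spectrum.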

Examples

>>> from chemotools.datasets import load_fermentation_train
>>> from chemotools.scale import BandScaler
>>> # Load sample data
>>> X, _ = load_fermentation_train()
>>> # Initialize BandScaler with band indices
>>> scaler = BandScaler(start=10, end=20)
>>> # Fit and transform the data
>>> X_scaled = scaler.fit_transform(X)

Notes

The choice between ‘mean’ and ‘area’ aggregation depends on whether the normalization should be based on average signal intensity or total integrated signal:

- Mean Scaling (‘mean’): Normalizes by the average intensity across the band. This is the usual choice for correcting global intensity fluctuations (e.g., source power drift or pathlength changes) while preserving the relative magnitude of the spectral profile.
- Area Scaling (‘area’): Normalizes by the numerical integral (trapezoidal rule) of the band. In many spectroscopic applications, the area under a curve is more representative of the total concentration or molar abundance than a single peak height or average intensity.
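Both strategies remove a purely multiplicative intensity change, which is what makes them useful against drift or pathlength effects. A small NumPy illustration on synthetic data (the Gaussian band, its x-axis, and the factor of 3 are invented for this example, and the "band" here is the whole synthetic range):

```python
import numpy as np


def trapezoid_area(y, x):
    """Trapezoidal-rule integral of y over x."""
    return np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0)


# Synthetic, uniformly spaced x-axis with one Gaussian band
x = np.linspace(1000.0, 1100.0, 51)
peak = np.exp(-0.5 * ((x - 1050.0) / 5.0) ** 2)

# Same profile recorded at 3x intensity (e.g., a longer pathlength)
X = np.vstack([peak, 3.0 * peak])

# Mean scaling: divide each row by its mean over the band
X_mean = X / X.mean(axis=1, keepdims=True)

# Area scaling: divide each row by its trapezoidal area over the band
area_factor = np.array([[trapezoid_area(row, x)] for row in X])
X_area = X / area_factor
```

After either scaling, the two rows are identical; the methods differ only in the reference value (mean of 1 versus unit area).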

Importance of Coordinate-Aware Scaling: In some spectrometers, the sampling interval (the distance between adjacent points on the x-axis) is not constant across the entire detector.

- If the sampling is non-uniform, a simple summation (equivalent to assuming \(\Delta x = 1\)) over-weights regions where data points are more densely packed.
- By providing an x_axis, the ‘area’ method uses the actual distances between points (\(\Delta x\)) to calculate a physically accurate integral.

When using aggregation='area', an x_axis must be provided. If it is omitted, the transformer raises a ValueError rather than implicitly assuming uniform sampling density across the selected band.
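The over-weighting effect can be seen with a constant signal on a synthetic x-axis (invented for illustration) that is ten times denser over 0 to 5 than over 5 to 10. The trapezoidal rule, \(A \approx \sum_i \tfrac{1}{2}(x_{i+1} - x_i)(y_i + y_{i+1})\), recovers the true integral, whereas a plain sum counts points rather than distance.

```python
import numpy as np

# Dense sampling on [0, 5] (step 0.1), sparse sampling on (5, 10] (step 1)
x = np.concatenate([np.linspace(0.0, 5.0, 51), np.linspace(6.0, 10.0, 5)])
y = np.ones_like(x)  # constant signal: the true integral over [0, 10] is 10

# Plain summation implicitly assumes delta x = 1 between every pair of points
naive = y.sum()                  # 56.0
naive_left = y[x <= 5.0].sum()   # 51.0 for a width of 5
naive_right = y[x > 5.0].sum()   # 5.0 for the same width

# Trapezoidal rule weights each interval by its actual width
trap = np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0)  # ≈ 10.0
```

In NumPy 2.0 and later, np.trapezoid computes the same trapezoidal integral (older releases provide np.trapz).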