BandScaler

class chemotools.scale.BandScaler(start: int = 0, end: int = -1, x_axis: ndarray | None = None, aggregation: str = 'mean', wavenumbers='deprecated')

Bases: XAxisMixin, TransformerMixin, OneToOneFeatureMixin, BaseEstimator

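A transformer that scales each spectrum (row) of the input data by the intensity of a specified band. The band intensity is computed either as the mean intensity or as the area under the band (see aggregation). The band can be specified by an index range or, when x_axis is provided, by a range of x-axis values (e.g., wavenumbers).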

Parameters:
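  • start (int, default=0) – Index or x-axis value of the start of the range. Interpreted as an x-axis value when x_axis is provided, otherwise as a column index.

  • end (int, default=-1) – Index or x-axis value of the end of the range. Interpreted in the same way as start.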

  • x_axis (array-like, optional) – X-axis values corresponding to columns. Must be ascending if provided.

  • aggregation ({'mean', 'area'}, default='mean') – The aggregation method used to compute the band intensity:

      - ‘mean’: the mean intensity of the band.

      - ‘area’: the area under the band, computed with the trapezoidal rule. Requires x_axis (see Notes).

  • wavenumbers (array-like, optional) – Deprecated alias for x_axis. Use x_axis instead.

Variables:
  • start_index (int) – The index of the start of the band.

  • end_index (int) – The index of the end of the band.

  • n_features_in_ (int) – The number of features seen during fit.
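The parameters above do not state how an x-axis value for start or end becomes start_index or end_index. The sketch below assumes a nearest-value lookup. resolve_index is a hypothetical helper, and chemotools may use a different rule internally.

```python
import numpy as np


def resolve_index(target, x_axis=None):
    """Hypothetical helper: map a start/end value to a column index.

    Assumption: without an x-axis the value is already a column index;
    with an x-axis, the column whose x value is closest to target is used.
    """
    if x_axis is None:
        return int(target)
    x_axis = np.asarray(x_axis, dtype=float)
    # argmin of the absolute distance picks the nearest x-axis point
    return int(np.argmin(np.abs(x_axis - target)))


# Illustrative ascending x-axis: 400, 500, ..., 1800 (e.g. wavenumbers)
wavenumbers = np.linspace(400.0, 1800.0, 15)

start_index = resolve_index(960.0, wavenumbers)   # nearest point is 1000
end_index = resolve_index(1420.0, wavenumbers)    # nearest point is 1400
print(start_index, end_index)  # 6 10
```

Under this assumption, a value that does not fall exactly on the grid snaps to the nearest sampled point rather than raising an error.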

Examples

>>> from chemotools.datasets import load_fermentation_train
>>> from chemotools.scale import BandScaler
>>> # Load sample data
>>> X, _ = load_fermentation_train()
>>> # Initialize BandScaler with band indices
>>> scaler = BandScaler(start=10, end=20)
>>> # Fit and transform the data
>>> X_scaled = scaler.fit_transform(X)

Notes

The choice between ‘mean’ and ‘area’ aggregation depends on whether the normalization should be based on average signal intensity or total integrated signal:

  • Mean Scaling (‘mean’): Normalizes by the average intensity across the band. This is a common choice for correcting global intensity fluctuations (e.g., source power drift or pathlength changes) while preserving the relative magnitude of the spectral profile.

  • Area Scaling (‘area’): Normalizes by the numerical integral (trapezoidal rule) of the band. In many spectroscopic applications, the area under a band is more representative of the total concentration or molar abundance than a single peak height or average intensity.
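A minimal NumPy sketch of both aggregation modes follows. scale_by_band is a hypothetical function, not the BandScaler source. It makes two assumptions:

  • scaling means dividing each spectrum (row) by its band intensity;

  • the band is the inclusive column range start..end.

It also mirrors the ValueError for aggregation='area' without an x_axis described below. The usage shows that a global multiplicative factor, such as doubled source power, cancels out under either mode.

```python
import numpy as np


def scale_by_band(X, start, end, aggregation="mean", x_axis=None):
    """Hypothetical sketch: divide each row of X by the intensity of one band.

    Assumption: the band is the inclusive column range start..end.
    """
    X = np.asarray(X, dtype=float)
    band = X[:, start : end + 1]
    if aggregation == "mean":
        intensity = band.mean(axis=1)
    elif aggregation == "area":
        if x_axis is None:
            # Refuse to silently assume dx = 1 between points
            raise ValueError("aggregation='area' requires x_axis")
        x = np.asarray(x_axis, dtype=float)[start : end + 1]
        # Trapezoidal rule: sum of 0.5 * (y_i + y_{i+1}) * (x_{i+1} - x_i)
        intensity = np.sum(0.5 * (band[:, 1:] + band[:, :-1]) * np.diff(x), axis=1)
    else:
        raise ValueError(f"Unknown aggregation: {aggregation!r}")
    # One intensity per spectrum, broadcast across all columns of that row
    return X / intensity[:, np.newaxis]


# Illustrative data: one Gaussian band, recorded once at 1x and once at 2x intensity
x_axis = np.linspace(1000.0, 1100.0, 11)
spectrum = np.exp(-((x_axis - 1050.0) / 15.0) ** 2)
X = np.vstack([spectrum, 2.0 * spectrum])

for method in ("mean", "area"):
    X_scaled = scale_by_band(X, 3, 7, aggregation=method, x_axis=x_axis)
    # The factor of 2 cancels, so both rows become identical
    print(method, np.allclose(X_scaled[0], X_scaled[1]))
```

Both modes remove a purely multiplicative drift. They differ only in whether the normalizing value is an average height or an integrated area.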

Importance of Coordinate-Aware Scaling: In some spectrometers, the sampling interval (the distance between adjacent points on the x-axis) is not constant across the entire detector.

  • If the sampling is non-uniform, a simple summation (equivalent to assuming \(\Delta x = 1\)) over-weights regions where data points are more densely packed.

  • By providing an x_axis, the ‘area’ method uses the actual distances between points (\(\Delta x\)) to compute a physically accurate integral.
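The over-weighting effect can be checked numerically. The sketch below uses an illustrative, made-up grid rather than real instrument data:

  • the grid is dense (steps of 0.1) on [0, 1] and sparse (steps of 1) on [1, 5];

  • the signal is linear, so the trapezoidal rule is exact.

It compares the share of the band total that each method attributes to the densely sampled region.

```python
import numpy as np


def trapezoid_area(y, x):
    """Trapezoidal rule using the actual spacing between x-axis points."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


# Non-uniform grid: 11 points on [0, 1], then 2, 3, 4, 5
x = np.concatenate([np.linspace(0.0, 1.0, 11), np.array([2.0, 3.0, 4.0, 5.0])])
y = x.copy()  # linear signal y = x; the exact integral over [0, 5] is 12.5

dense = x <= 1.0  # the densely sampled region [0, 1]

# Coordinate-aware integral (exact here because y is linear)
area_total = trapezoid_area(y, x)
area_dense = trapezoid_area(y[dense], x[dense])

# Plain summation, i.e. implicitly assuming dx = 1 for every step
sum_total = float(np.sum(y))
sum_dense = float(np.sum(y[dense]))

# Share of the band total attributed to [0, 1]
print(f"trapezoid: {area_dense / area_total:.2f}")  # trapezoid: 0.04
print(f"summation: {sum_dense / sum_total:.2f}")    # summation: 0.28
```

The region [0, 1] contributes only 4% of the true integral. Summation inflates that share to about 28% simply because more points were recorded there.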

When using aggregation='area', an x_axis must be provided. If it is omitted, the transformer raises a ValueError rather than implicitly assuming uniform sampling density across the selected band.

See also

chemotools.scale.MinMaxScaler

Scales features to the Min-Max range.

chemotools.scale.NormScaler

Scales features to unit norm.

chemotools.scale.PointScaler

Scales features by the intensity at a specific point.

fit(X: ndarray, y=None) → BandScaler

Fit the transformer to the input data.

Parameters:
  • X (np.ndarray of shape (n_samples, n_features)) – The input data to fit the transformer to.

  • y (None) – Ignored; present for scikit-learn API compatibility.

Returns:

self – The fitted transformer.

Return type:

BandScaler

transform(X: ndarray, y=None) → ndarray

Scale the input data by the intensity of the specified band, computed with the selected aggregation method.

Parameters:
  • X (np.ndarray of shape (n_samples, n_features)) – The input data to transform.

  • y (None) – Ignored; present for scikit-learn API compatibility.

Returns:

X_transformed – The transformed data.

Return type:

np.ndarray of shape (n_samples, n_features)