BandScaler
- class chemotools.scale.BandScaler(start: int = 0, end: int = -1, x_axis: ndarray | None = None, aggregation: str = 'mean', wavenumbers='deprecated')
Bases: XAxisMixin, TransformerMixin, OneToOneFeatureMixin, BaseEstimator
A transformer that scales the input data by the intensity of a specified band, aggregated as either the mean or the area of that band. The band can be specified by an index range or by a range of x-axis values (for example, wavenumbers).
- Parameters:
start (int, default=0) – Index or x-axis value of the start of the range.
end (int, default=-1) – Index or x-axis value of the end of the range.
x_axis (array-like, optional) – X-axis values corresponding to columns. Must be ascending if provided.
aggregation ({'mean', 'area'}, default='mean') – The aggregation method to use for calculating the band intensity.
- ‘mean’: Calculate the mean intensity of the band.
- ‘area’: Calculate the area under the band using the trapezoidal rule.
wavenumbers (array-like, optional) – Deprecated alias for x_axis. Use x_axis instead.
- Variables:
Examples
>>> from chemotools.datasets import load_fermentation_train
>>> from chemotools.scale import BandScaler
>>> # Load sample data
>>> X, _ = load_fermentation_train()
>>> # Initialize BandScaler with band indices
>>> scaler = BandScaler(start=10, end=20)
>>> # Fit and transform the data
>>> X_scaled = scaler.fit_transform(X)
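When x_axis is supplied, start and end are given as x-axis values rather than column indices. The source does not state how BandScaler resolves those values internally, or whether end is inclusive, so the following is only an illustrative sketch of one way such a lookup could work on an ascending axis. The helper name band_indices and the inclusive start <= x <= end convention are assumptions made for this example.

```python
import numpy as np


def band_indices(x_axis, start, end):
    """Hypothetical helper: map x-axis values to a half-open column slice.

    Assumes x_axis is ascending, which is what makes np.searchsorted valid.
    Selects every point with start <= x <= end.
    """
    start_idx = int(np.searchsorted(x_axis, start, side="left"))
    end_idx = int(np.searchsorted(x_axis, end, side="right"))
    return start_idx, end_idx


# Ascending x-axis, e.g. wavenumbers
x_axis = np.array([900.0, 950.0, 1000.0, 1050.0, 1100.0, 1150.0])

start_idx, end_idx = band_indices(x_axis, 960, 1090)
band = x_axis[start_idx:end_idx]  # points lying inside [960, 1090]
```

A binary search like this only gives meaningful results on a sorted axis, which is one plausible reason for the requirement that x_axis be ascending.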
Notes
The choice between ‘mean’ and ‘area’ aggregation depends on whether the normalization should be based on average signal intensity or total integrated signal:
- Mean Scaling (‘mean’): Normalizes by the average intensity across the band. This is standard for correcting global intensity fluctuations (e.g., source power drift or pathlength changes) while preserving the relative magnitude of the spectral profile.
- Area Scaling (‘area’): Normalizes by the numerical integral (trapezoidal rule) of the band. In many spectroscopic applications, the area under a curve is more representative of the total concentration or molar abundance than a single peak height or average intensity.
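The two aggregation modes can be sketched in plain NumPy. This is not the chemotools implementation: the function name band_scale, the half-open column slice start:end, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def band_scale(X, x_axis, start, end, aggregation="mean"):
    """Illustrative sketch: divide each row of X by its band intensity.

    The band is taken as columns start..end-1 (half-open slice, an assumption).
    """
    band = X[:, start:end]
    if aggregation == "mean":
        factor = band.mean(axis=1)
    elif aggregation == "area":
        # Trapezoidal rule using the actual spacing between x-axis points
        dx = np.diff(x_axis[start:end])
        factor = np.sum(0.5 * (band[:, 1:] + band[:, :-1]) * dx, axis=1)
    else:
        raise ValueError(f"Unknown aggregation: {aggregation!r}")
    return X / factor[:, None]


x_axis = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
# The second spectrum is the first one multiplied by 2 (a global intensity change)
X = np.array([[1.0, 2.0, 4.0, 2.0, 1.0],
              [2.0, 4.0, 8.0, 4.0, 2.0]])

X_mean = band_scale(X, x_axis, 1, 4, aggregation="mean")
X_area = band_scale(X, x_axis, 1, 4, aggregation="area")
```

In this toy case both modes remove the factor of 2 between the two spectra. They differ only in the reference quantity: after mean scaling the band has a mean of 1, while after area scaling the band integrates to 1.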
Importance of Coordinate-Aware Scaling: In some spectrometers, the sampling interval (distance between points on the x-axis) is not perfectly constant across the entire detector.
- If the sampling is non-uniform, a simple summation (equivalent to assuming \(\Delta x=1\)) over-weights regions where data points are more densely packed.
- By providing an x_axis, the ‘area’ method uses the actual distances between points (\(\Delta x\)) to calculate a physically accurate integral.
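The over-weighting effect can be seen on a small synthetic example. This is a sketch with made-up data, not library code: a flat signal of height 1 is sampled densely on [0, 1] and sparsely on [1, 5].

```python
import numpy as np

# Non-uniform grid: 5 points on [0, 1], then only 2 more points out to x = 5
x_axis = np.array([0.0, 0.25, 0.5, 0.75, 1.0, 3.0, 5.0])
y = np.ones_like(x_axis)  # flat signal, so the true area is simply the width

# Naive summation implicitly assumes a spacing of 1 between all points
naive_sum = y.sum()

# Trapezoidal rule with the actual spacing between points
dx = np.diff(x_axis)
segments = 0.5 * (y[1:] + y[:-1]) * dx
trapezoid_area = segments.sum()

# Fraction of each estimate contributed by the dense region x <= 1,
# which covers only one fifth of the total width
sum_share = y[x_axis <= 1.0].sum() / naive_sum
trapezoid_share = segments[x_axis[1:] <= 1.0].sum() / trapezoid_area
```

Here the densely sampled region contributes 5 of the 7 points in the naive sum, although it spans only 20% of the x-range. The trapezoidal estimate assigns it exactly that 20%.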
When using aggregation='area', an x_axis must be provided. If it is omitted, the transformer raises a ValueError rather than implicitly assuming uniform sampling density across the selected band.
See also
chemotools.scale.MinMaxScaler – Scales features to the Min-Max range.
chemotools.scale.NormScaler – Scales features to unit norm.
chemotools.scale.PointScaler – Scales features by the intensity at a specific point.
- fit(X: ndarray, y=None) -> BandScaler
Fit the transformer to the input data.
- Parameters:
X (np.ndarray of shape (n_samples, n_features)) – The input data to fit the transformer to.
y (None) – Ignored; present only for API consistency.
- Returns:
self – The fitted transformer.
- Return type:
BandScaler
- transform(X: ndarray, y=None) -> ndarray
Transform the input data by scaling each sample by the aggregated intensity (mean or area) of the specified band.
- Parameters:
X (np.ndarray of shape (n_samples, n_features)) – The input data to transform.
y (None) – Ignored; present only for API consistency.
- Returns:
X_transformed – The transformed data.
- Return type:
np.ndarray of shape (n_samples, n_features)