BandScaler

class chemotools.scale.BandScaler(start: int = 0, end: int = -1, x_axis: ndarray | None = None, aggregation: str = 'mean', wavenumbers='deprecated')[source]

Bases: XAxisMixin, TransformerMixin, OneToOneFeatureMixin, BaseEstimator

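A transformer that scales each sample (spectrum) by the intensity of a specified band, aggregated either as the mean intensity or as the area under the band (see aggregation). The band can be specified by an index range or, when x_axis is provided, by a range of x-axis values (e.g. wavenumbers).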
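As a rough illustration of the 'mean' case, the NumPy sketch below divides each row by its own mean over an index band. It is not the chemotools implementation: the helper name band_scale_mean is hypothetical, and the sketch assumes the band includes both the start and end columns, with negative indices counting from the end as in Python indexing.

```python
import numpy as np


def band_scale_mean(X: np.ndarray, start: int, end: int) -> np.ndarray:
    """Hypothetical helper: divide each row by its mean over columns start..end.

    Assumption: both endpoints are included; negative indices count from the end.
    """
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    lo = start % n_features            # map negative indices to positive ones
    hi = end % n_features + 1          # +1 so the end column is included
    band_mean = X[:, lo:hi].mean(axis=1, keepdims=True)  # one factor per row
    return X / band_mean


# Two spectra with the same shape but a 2x intensity difference
# (e.g. a change in source power).
base = np.array([1.0, 2.0, 4.0, 2.0, 1.0])
X = np.vstack([base, 2.0 * base])

X_scaled = band_scale_mean(X, start=1, end=3)
print(X_scaled)  # both rows equal [0.375, 0.75, 1.5, 0.75, 0.375]
```

Because the second spectrum is exactly twice the first, both rows are identical after scaling: the global intensity factor cancels while the spectral shape is preserved.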

Parameters:
  • start (int, default=0) -- Index or x-axis value of the start of the range.

  • end (int, default=-1) -- Index or x-axis value of the end of the range.

  • x_axis (array-like, optional) -- X-axis values corresponding to columns. Must be ascending if provided.

  • aggregation ({'mean', 'area'}, default='mean') -- The aggregation method used to calculate the band intensity:

      - 'mean': the mean intensity of the band.

      - 'area': the area under the band, computed with the trapezoidal rule.

  • wavenumbers (array-like, optional) -- Deprecated alias for x_axis. Use x_axis instead.

Variables:
  • start_index (int) -- The index of the start of the band.

  • end_index (int) -- The index of the end of the band.

  • n_features_in_ (int) -- The number of features in the input data.
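When x_axis is provided, start and end are x-axis values that must be converted to column indices (stored as start_index and end_index) before the band can be sliced. The sketch below shows one plausible conversion, a nearest-point lookup. The helper find_index is hypothetical, and the library's own rule (for example, tie-breaking or handling of out-of-range values) may differ.

```python
import numpy as np


def find_index(x_axis: np.ndarray, value: float) -> int:
    """Hypothetical helper: index of the x-axis point closest to `value`."""
    x_axis = np.asarray(x_axis, dtype=float)
    return int(np.argmin(np.abs(x_axis - value)))


x_axis = np.array([1000.0, 1002.0, 1004.0, 1006.0, 1008.0])

start_index = find_index(x_axis, 1001.5)  # nearest point is 1002.0 -> index 1
end_index = find_index(x_axis, 1007.2)    # nearest point is 1008.0 -> index 4
print(start_index, end_index)             # 1 4
```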

Examples

>>> from chemotools.datasets import load_fermentation_train
>>> from chemotools.scale import BandScaler
>>> # Load sample data
>>> X, _ = load_fermentation_train()
>>> # Initialize BandScaler with band indices
>>> scaler = BandScaler(start=10, end=20)
>>> # Fit and transform the data
>>> X_scaled = scaler.fit_transform(X)

Notes

The choice between 'mean' and 'area' aggregation depends on whether the normalization should be based on average signal intensity or total integrated signal:

  • Mean scaling ('mean'): Normalizes by the average intensity across the band. This is standard for correcting global intensity fluctuations (e.g. source power drift or pathlength changes) while preserving the relative magnitude of the spectral profile.

  • Area scaling ('area'): Normalizes by the numerical integral of the band (trapezoidal rule). In many spectroscopic applications, the area under a band is more representative of the total concentration or molar abundance than a single peak height or the average intensity.
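The two aggregation modes can be compared with a small NumPy sketch. The helper names trapezoid_area and band_scale are hypothetical, the band is assumed to include both endpoint columns, and only non-negative indices are handled; this illustrates the idea rather than reproducing the chemotools code.

```python
import numpy as np


def trapezoid_area(y: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Trapezoidal integral of each row of y over the points x."""
    dx = np.diff(x)                                   # spacing between neighbours
    return np.sum(dx * (y[:, 1:] + y[:, :-1]) / 2.0, axis=1)


def band_scale(X, x_axis, lo, hi, aggregation="mean"):
    """Hypothetical helper: scale rows by the mean or area of columns lo..hi."""
    X = np.asarray(X, dtype=float)
    x_axis = np.asarray(x_axis, dtype=float)
    band = X[:, lo:hi + 1]                            # assumption: inclusive band
    if aggregation == "mean":
        factor = band.mean(axis=1)
    elif aggregation == "area":
        factor = trapezoid_area(band, x_axis[lo:hi + 1])
    else:
        raise ValueError(f"unknown aggregation: {aggregation!r}")
    return X / factor[:, None]                        # one factor per row


x_axis = np.array([1000.0, 1002.0, 1004.0, 1006.0, 1008.0])
X = np.array([[1.0, 2.0, 4.0, 2.0, 1.0],
              [0.5, 3.0, 3.0, 3.0, 0.5]])

X_mean = band_scale(X, x_axis, 1, 3, "mean")  # band mean becomes 1 in each row
X_area = band_scale(X, x_axis, 1, 3, "area")  # band area becomes 1 in each row
```

Both example rows have the same band area (12) but different band means (8/3 and 3), so the two modes apply different scaling factors to the same data.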

Importance of coordinate-aware scaling: In some spectrometers, the sampling interval (the distance between consecutive points on the x-axis) is not perfectly constant across the detector.

  • If the sampling is non-uniform, a simple summation (equivalent to assuming \(\Delta x = 1\)) mathematically over-weights regions where data points are more densely packed.

  • By providing an x_axis, the 'area' method uses the actual distances between points (\(\Delta x\)) to calculate a physically accurate integral.
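The over-weighting effect can be checked numerically. This sketch integrates a constant signal on a non-uniform grid with a hand-written trapezoidal rule (NumPy 2.0 and later also provide np.trapezoid, called np.trapz in older releases), once with implicit unit spacing and once with the true spacing. The grid and signal are made up for illustration.

```python
import numpy as np


def trapezoid(y, x=None) -> float:
    """Trapezoidal integral of y; unit spacing is assumed when x is None."""
    y = np.asarray(y, dtype=float)
    if x is None:
        dx = np.ones(len(y) - 1)                   # implicit delta x = 1
    else:
        dx = np.diff(np.asarray(x, dtype=float))   # actual point spacing
    return float(np.sum(dx * (y[1:] + y[:-1]) / 2.0))


# Dense sampling on [0, 1], sparse sampling on [1, 3].
x = np.array([0.0, 0.25, 0.5, 0.75, 1.0, 2.0, 3.0])
y = np.ones_like(x)  # constant signal; its true integral over [0, 3] is 3

print(trapezoid(y))             # 6.0: unit spacing inflates the result
print(trapezoid(y, x))          # 3.0: coordinate-aware result
print(trapezoid(y[:5]))         # 4.0: dense interval [0, 1] under unit spacing
print(trapezoid(y[:5], x[:5]))  # 1.0: dense interval with true spacing
```

Under unit spacing the dense interval [0, 1] contributes 4.0 instead of 1.0, while the sparse interval [1, 3] still contributes its correct value of 2.0.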

When using aggregation='area', an x_axis must be provided. If it is omitted, the transformer raises a ValueError rather than implicitly assuming uniform sampling density across the selected band.
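The parameter rules stated on this page (aggregation must be 'mean' or 'area', x_axis must be ascending, and 'area' requires an x_axis) can be expressed as a small pre-fit check. The function check_band_params is hypothetical, and its error messages are illustrative, not the library's. The sketch treats "ascending" as strictly increasing.

```python
import numpy as np


def check_band_params(aggregation: str, x_axis=None) -> None:
    """Hypothetical checks mirroring the documented BandScaler rules."""
    if aggregation not in ("mean", "area"):
        raise ValueError(f"aggregation must be 'mean' or 'area', got {aggregation!r}")
    if x_axis is not None:
        x = np.asarray(x_axis, dtype=float)
        if np.any(np.diff(x) <= 0):  # assumption: strictly increasing
            raise ValueError("x_axis must be ascending")
    if aggregation == "area" and x_axis is None:
        # Refuse to assume uniform sampling instead of silently using delta x = 1.
        raise ValueError("aggregation='area' requires x_axis")


check_band_params("mean")                     # fine: index-based band, no x_axis
check_band_params("area", [1.0, 2.0, 3.0])    # fine: area with an ascending x_axis
```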

See also

chemotools.scale.MinMaxScaler

Scales features to the Min-Max range.

chemotools.scale.NormScaler

Scales features to unit norm.

chemotools.scale.PointScaler

Scales features by the intensity at a specific point.

fit(X: ndarray, y=None) → BandScaler[source]

Fit the transformer to the input data.

Parameters:
  • X (np.ndarray of shape (n_samples, n_features)) -- The input data to fit the transformer to.

  • y (None) -- Ignored; present for API consistency.

Returns:

self -- The fitted transformer.

Return type:

BandScaler

transform(X: ndarray, y=None) → ndarray[source]

Transform the input data by scaling each sample by the aggregated intensity (mean or area) of the specified band.

Parameters:
  • X (np.ndarray of shape (n_samples, n_features)) -- The input data to transform.

  • y (None) -- Ignored; present for API consistency.

Returns:

X_transformed -- The transformed data.

Return type:

np.ndarray of shape (n_samples, n_features)