BandScaler
- class chemotools.scale.BandScaler(start: int = 0, end: int = -1, x_axis: ndarray | None = None, aggregation: str = 'mean', wavenumbers='deprecated') [source]
Bases: XAxisMixin, TransformerMixin, OneToOneFeatureMixin, BaseEstimator
A transformer that scales the input data by the aggregated intensity (mean or area) of a specified band. The band can be specified by an index range or by a range of x-axis values (for example, wavenumbers).
- Parameters:
start (int, default=0) -- Index or x-axis value of the start of the range.
end (int, default=-1) -- Index or x-axis value of the end of the range.
x_axis (array-like, optional) -- X-axis values corresponding to columns. Must be ascending if provided.
aggregation ({'mean', 'area'}, default='mean') -- The aggregation method to use for calculating the band intensity.
  - 'mean': Calculate the mean intensity of the band.
  - 'area': Calculate the area under the band using the trapezoidal rule.
wavenumbers (array-like, optional) -- Deprecated alias for x_axis. Use x_axis instead.
- Variables:
Examples
>>> from chemotools.datasets import load_fermentation_train
>>> from chemotools.scale import BandScaler
>>> # Load sample data
>>> X, _ = load_fermentation_train()
>>> # Initialize BandScaler with band indices
>>> scaler = BandScaler(start=10, end=20)
>>> # Fit and transform the data
>>> X_scaled = scaler.fit_transform(X)
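When an x_axis is supplied, start and end are given as x-axis values rather than column indices. How those values are mapped to columns is not described on this page; the sketch below assumes a nearest-value lookup, and `nearest_index` is a hypothetical helper written for illustration, not part of chemotools.

```python
import numpy as np


def nearest_index(x_axis, value):
    """Hypothetical helper: index of the x-axis entry closest to `value`."""
    x_axis = np.asarray(x_axis, dtype=float)
    return int(np.argmin(np.abs(x_axis - value)))


# Ascending x-axis with five columns: 900, 950, 1000, 1050, 1100
x_axis = np.linspace(900.0, 1100.0, 5)

# Band limits expressed as x-axis values (assumed nearest-match lookup)
start_idx = nearest_index(x_axis, 960.0)   # closest entry is 950
end_idx = nearest_index(x_axis, 1040.0)    # closest entry is 1050
```

Under this assumption, a band requested as 960 to 1040 would cover the columns at 950, 1000 and 1050. The actual lookup used by the library may differ, for example in how it treats values outside the axis range.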
Notes
The choice between 'mean' and 'area' aggregation depends on whether the normalization should be based on average signal intensity or total integrated signal:
- Mean Scaling ('mean'): Normalizes by the average intensity across the band. This is standard for correcting global intensity fluctuations (e.g., source power drift or pathlength changes) while preserving the relative magnitude of the spectral profile.
- Area Scaling ('area'): Normalizes by the numerical integral (trapezoidal rule) of the band. In many spectroscopic applications, the area under a curve is more representative of the total concentration or molar abundance than a single peak height or average intensity.
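The two aggregation modes can be sketched in plain NumPy. This is an illustration of the idea under stated assumptions, not the library's implementation: `trapezoid_area` and the `band` slice are hypothetical names, and whether BandScaler treats `end` as inclusive is not stated here.

```python
import numpy as np


def trapezoid_area(y, x):
    """Trapezoidal-rule integral of each row of `y` over coordinates `x`."""
    dx = np.diff(x)                        # spacing between neighbouring points
    mid = (y[:, 1:] + y[:, :-1]) / 2.0     # mean height of each trapezoid
    return (mid * dx).sum(axis=1)


# Two toy spectra (rows) sampled at five x-axis positions.
x_axis = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X = np.array([
    [1.0, 2.0, 4.0, 2.0, 1.0],
    [2.0, 4.0, 8.0, 4.0, 2.0],   # same shape, twice the intensity
])

band = slice(1, 4)  # hypothetical band: columns 1, 2 and 3

band_mean = X[:, band].mean(axis=1)                    # 'mean' aggregation
band_area = trapezoid_area(X[:, band], x_axis[band])   # 'area' aggregation

# Divide every spectrum by its own band value.
X_mean_scaled = X / band_mean[:, None]
X_area_scaled = X / band_area[:, None]
```

Because the second spectrum is an exact multiple of the first, both scalings map the two rows onto the same profile; they differ only in the constant each row is divided by.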
Importance of Coordinate-Aware Scaling: In some spectrometers, the sampling interval (distance between points on the x-axis) is not perfectly constant across the entire detector.
- If the sampling is non-linear, a simple summation (equivalent to assuming \(\Delta x=1\)) will mathematically over-weight regions where data points are more densely packed.
- By providing an x_axis, the 'area' method uses the actual distances between points (\(\Delta x\)) to calculate a physically accurate integral.
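The effect of ignoring the real spacing can be seen on a small made-up example. The sketch below integrates a flat signal sampled densely on the left and sparsely on the right, once with unit spacing and once with the actual \(\Delta x\); the arrays are illustrative and not taken from any instrument.

```python
import numpy as np

# Non-uniform sampling: dense on the left, sparse on the right.
x = np.array([0.0, 0.1, 0.2, 0.3, 1.0, 2.0])
y = np.ones_like(x)   # flat signal of height 1, so the true area is 2.0

dx = np.diff(x)                   # actual interval widths
mid = (y[1:] + y[:-1]) / 2.0      # mean height of each interval

area_unit_spacing = mid.sum()             # pretends every interval has width 1
area_coordinate_aware = (mid * dx).sum()  # uses the real widths

# Share of the total attributed to the dense region (first three intervals).
dense_share_unit = mid[:3].sum() / area_unit_spacing
dense_share_true = (mid[:3] * dx[:3]).sum() / area_coordinate_aware
```

With unit spacing the dense region accounts for 60% of the total, while with the real spacing it accounts for 15%, which matches its actual extent on the x-axis.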
When using aggregation='area', an x_axis must be provided. If it is omitted, the transformer raises a ValueError rather than implicitly assuming uniform sampling density across the selected band.
See also
chemotools.scale.MinMaxScaler -- Scales features to the Min-Max range.
chemotools.scale.NormScaler -- Scales features to unit norm.
chemotools.scale.PointScaler -- Scales features by the intensity at a specific point.
- fit(X: ndarray, y=None) → BandScaler [source]
Fit the transformer to the input data.
- Parameters:
X (np.ndarray of shape (n_samples, n_features)) -- The input data to fit the transformer to.
y (None) -- Ignored; present only for API consistency.
- Returns:
self -- The fitted transformer.
- Return type:
BandScaler
- transform(X: ndarray, y=None) → ndarray [source]
Transform the input data by scaling each sample by the aggregated intensity (mean or area, as set by aggregation) of the specified band.
- Parameters:
X (np.ndarray of shape (n_samples, n_features)) -- The input data to transform.
y (None) -- Ignored; present only for API consistency.
- Returns:
X_transformed -- The transformed data.
- Return type:
np.ndarray of shape (n_samples, n_features)