ParetoScaler

class chemotools.scale.ParetoScaler(p: float = 0.5, with_mean: bool = True, copy: bool = True)

Bases: TransformerMixin, OneToOneFeatureMixin, BaseEstimator

This transformer scales each feature by its standard deviation raised to a power p, as described in [1]. It bridges mean centering (p=0) and autoscaling (p=1).

Parameters:

p (float, default=0.5) – The exponent applied to the standard deviation. Must be a float between 0 and 1, inclusive.
    - p=0.0: No scaling (mean centering only).
    - p=0.5: Standard Pareto scaling.
    - p=1.0: Autoscaling (unit-variance scaling).
with_mean (bool, default=True) – If True, center the data before scaling. If False, no centering is performed.
copy (bool, default=True) – If True, a copy of the input data will be made. If False, the input data will be modified in place.

Variables:

mean_ (np.ndarray of shape (n_features,)) – The mean value for each feature, calculated during fitting.
scale_ (np.ndarray of shape (n_features,)) – The scale factor for each feature, calculated as the standard deviation raised to the power of p.
n_features_in_ (int) – The number of features in the input data.
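
As a rough illustration of how the fitted quantities above relate to the transformed output, the following NumPy sketch reproduces the computation outside the library. The helper `pareto_scale_sketch` is hypothetical and not part of chemotools; the use of the sample standard deviation (`ddof=1`) and the guard for constant features are assumptions of this sketch and may differ from the actual implementation.

```python
import numpy as np


def pareto_scale_sketch(X, p=0.5, with_mean=True, ddof=1):
    """Hypothetical helper: center X and divide each column by std**p."""
    X = np.asarray(X, dtype=float)
    # Per-feature mean, or zeros when centering is disabled
    mean = X.mean(axis=0) if with_mean else np.zeros(X.shape[1])
    # Per-feature standard deviation (ddof=1 is an assumption of this sketch)
    std = X.std(axis=0, ddof=ddof)
    # Scale factor: std raised to the power p; p=0 gives all ones
    scale = std ** p
    # Guard against division by zero for constant features (assumption)
    scale[scale == 0] = 1.0
    return (X - mean) / scale, mean, scale


X = np.array([[1.0, 10.0], [2.0, 30.0], [3.0, 50.0]])
X_scaled, mean, scale = pareto_scale_sketch(X, p=0.5)
```

With p=0 the scale factor is 1 for every feature, so only centering is applied; with p=1 every feature ends up with unit standard deviation.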

References

Examples

>>> from chemotools.datasets import load_fermentation_train
>>> from chemotools.scale import ParetoScaler
>>> # Load sample data
>>> X, _ = load_fermentation_train()
>>> scaler = ParetoScaler(p=0.3)
>>> # Fit and transform the data
>>> X_scaled = scaler.fit_transform(X)

Notes

In spectroscopic applications, standard Pareto scaling (\(p=0.5\)) is often used to reduce the dominance of large peaks (e.g., solvent or high-abundance metabolites) without inflating baseline noise as severely as autoscaling.
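
The trade-off can be made concrete with a small NumPy experiment. After dividing by the standard deviation raised to the exponent, a feature with standard deviation s is left with a spread of s^(1 - exponent), so the ratio between a dominant and a weak feature shrinks as the exponent grows. The data below are synthetic and purely illustrative, and `ddof=1` is an assumption of this sketch rather than a statement about the library.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic two-feature data: a dominant "peak" column and a low-variance "baseline" column
X = np.column_stack([
    rng.normal(loc=1000.0, scale=100.0, size=500),
    rng.normal(loc=1.0, scale=1.0, size=500),
])

std = X.std(axis=0, ddof=1)
spread_ratio = {}
for p in (0.0, 0.5, 1.0):
    # Center, then divide each column by its standard deviation raised to p
    X_scaled = (X - X.mean(axis=0)) / std ** p
    scaled_std = X_scaled.std(axis=0, ddof=1)
    # Spread of the dominant feature relative to the weak one after scaling
    spread_ratio[p] = scaled_std[0] / scaled_std[1]
    print(f"p={p}: spread ratio = {spread_ratio[p]:.2f}")
```

The ratio at 0.5 is the square root of the ratio at 0, and at 1 both features have the same spread, which is where baseline noise is amplified the most.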

According to Varmuza & Filzmoser, \(p\) should be treated as a tunable hyperparameter. For datasets where the relevant information lies in low-intensity signals but the noise floor is high, an adjusted exponent (e.g., 0.3 or 0.7) may give a better balance between signal-to-noise ratio and model interpretability than the fixed Pareto value of 0.5.
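
One way to treat the exponent as a hyperparameter is to compare candidate values by cross-validation. The sketch below is a NumPy-only illustration under several assumptions: the data are synthetic, the helpers `fit_scaler`, `ridge_fit_predict`, and `cv_rmse` are hypothetical names, and ridge regression stands in for whatever scale-sensitive model is actually used (ordinary least squares would be unaffected by column scaling). The scaling statistics are learned on each training fold only, to avoid leaking information from the held-out fold.

```python
import numpy as np


def fit_scaler(X, p):
    """Return (mean, scale) with scale = std**p, learned from training data only."""
    mean = X.mean(axis=0)
    scale = X.std(axis=0, ddof=1) ** p
    scale[scale == 0] = 1.0  # avoid dividing by zero for constant features
    return mean, scale


def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Closed-form ridge regression on centered X; the intercept is the mean of y."""
    y_mean = y_train.mean()
    n_features = X_train.shape[1]
    coef = np.linalg.solve(
        X_train.T @ X_train + alpha * np.eye(n_features),
        X_train.T @ (y_train - y_mean),
    )
    return X_test @ coef + y_mean


def cv_rmse(X, y, p, n_splits=5, seed=0):
    """K-fold cross-validated RMSE for a given exponent p."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, n_splits)
    sq_errors = []
    for k in range(n_splits):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_splits) if j != k])
        mean, scale = fit_scaler(X[train], p)
        y_pred = ridge_fit_predict(
            (X[train] - mean) / scale, y[train], (X[test] - mean) / scale
        )
        sq_errors.append((y[test] - y_pred) ** 2)
    return float(np.sqrt(np.concatenate(sq_errors).mean()))


# Synthetic data (illustrative only): one dominant column and several weak ones
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 6)) * np.array([50.0, 5.0, 1.0, 1.0, 0.5, 0.1])
y = X[:, 2] - 2.0 * X[:, 4] + rng.normal(scale=0.2, size=100)

candidates = [0.0, 0.3, 0.5, 0.7, 1.0]
scores = {p: cv_rmse(X, y, p) for p in candidates}
best_p = min(scores, key=scores.get)
print(scores, best_p)
```

Which exponent wins depends entirely on the data and the downstream model, so the result on this toy example should not be read as a recommendation.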