ParetoScaler
- class chemotools.scale.ParetoScaler(p: float = 0.5, with_mean: bool = True, copy: bool = True)
Bases: TransformerMixin, OneToOneFeatureMixin, BaseEstimator
This transformer scales each feature by its standard deviation raised to a power p, as described by [1]. It acts as a bridge between mean centering (p=0) and autoscaling (p=1).
- Parameters:
p (float, default=0.5) – The exponent applied to the standard deviation. Must be a float between 0 and 1, inclusive.
- p=0.0: No scaling (mean centering only).
- p=0.5: Standard Pareto scaling.
- p=1.0: Autoscaling (unit-variance scaling).
with_mean (bool, default=True) – If True, center the data before scaling. If False, no centering is performed.
copy (bool, default=True) – If True, a copy of the input data will be made. If False, the input data will be modified in place.
- Variables:
mean (np.ndarray of shape (n_features,)) – The mean value for each feature, calculated during fitting.
scale (np.ndarray of shape (n_features,)) – The scale factor for each feature, calculated as the standard deviation raised to the power of p.
n_features_in (int) – The number of features in the input data.
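The computation implied by these attributes can be sketched in plain NumPy. This is only an illustration, not the library's implementation: `pareto_scale` is a hypothetical helper, and the use of the population standard deviation (`ddof=0`) is an assumption, since the page does not say which estimator is used.

```python
import numpy as np

def pareto_scale(X, p=0.5, with_mean=True):
    """Hypothetical helper: divide each feature by std**p (ddof=0 is an assumption)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)           # per-feature mean
    scale = X.std(axis=0) ** p      # per-feature scale factor
    centered = X - mean if with_mean else X
    return centered / scale, mean, scale

X = np.array([[1.0, 10.0], [2.0, 30.0], [3.0, 50.0], [4.0, 70.0]])

# p=0: the scale factor is 1 for every feature, so only mean centering happens
X0, mean, scale0 = pareto_scale(X, p=0.0)

# p=1: every feature ends up with unit standard deviation (autoscaling)
X1, _, _ = pareto_scale(X, p=1.0)
```

The two calls reproduce the limiting cases listed for `p`: mean centering at one end and autoscaling at the other.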
References
Examples
>>> from chemotools.datasets import load_fermentation_train
>>> from chemotools.scale import ParetoScaler
>>> # Load sample data
>>> X, _ = load_fermentation_train()
>>> scaler = ParetoScaler(p=0.3)
>>> # Fit and transform the data
>>> X_scaled = scaler.fit_transform(X)
Notes
In spectroscopic applications, standard Pareto scaling (\(p=0.5\)) is often used to reduce the dominance of large peaks (e.g., solvent or high-abundance metabolites) without inflating baseline noise as severely as autoscaling.
According to Varmuza & Filzmoser, \(p\) should be treated as a tunable hyperparameter. For datasets where relevant information is buried in low-intensity signals but the noise floor is high, an adjusted \(p\) (e.g., 0.3 or 0.7) may balance signal-to-noise ratio and model interpretability better than fixed Pareto scaling.
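To make the trade-off concrete, the sketch below compares how strongly a high-variance feature dominates a low-variance one after scaling with several exponents. The data are synthetic and the `ddof=0` standard deviation is an assumption; nothing here calls chemotools itself.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic features: a dominant peak (std near 100) and a weak signal (std near 1)
X = np.column_stack([rng.normal(0.0, 100.0, 500), rng.normal(0.0, 1.0, 500)])

std = X.std(axis=0)
ratios = {}
for p in (0.0, 0.3, 0.5, 0.7, 1.0):
    scaled_std = std / std**p  # standard deviation of each feature after scaling
    ratios[p] = scaled_std[0] / scaled_std[1]
    print(f"p={p:.1f}: large/small spread ratio = {ratios[p]:.2f}")
```

The ratio shrinks steadily as the exponent grows and reaches 1 at p=1, where both features carry equal weight and the weak signal's noise is amplified the most. Intermediate exponents sit between these extremes, which is why the exponent is worth tuning, for example by cross-validation.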
- fit(X: ndarray, y=None) → ParetoScaler
Fit the transformer to the input data.
- Parameters:
X (np.ndarray of shape (n_samples, n_features)) – The input data to fit the transformer to.
y (None) – Ignored; present only for scikit-learn API compatibility.
- Returns:
self – The fitted transformer.
- Return type:
ParetoScaler
- transform(X: ndarray, y=None) → ndarray
Transform the input data.
- Parameters:
X (np.ndarray of shape (n_samples, n_features)) – The input data to transform.
y (None) – Ignored; present only for scikit-learn API compatibility.
- Returns:
X_transformed – The transformed data.
- Return type:
np.ndarray of shape (n_samples, n_features)
- inverse_transform(X: ndarray) → ndarray
Inverse transform the data back to the original space.
- Parameters:
X (np.ndarray of shape (n_samples, n_features)) – The data to inverse transform.
- Returns:
X_original – The data transformed back to the original space.
- Return type:
np.ndarray of shape (n_samples, n_features)
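The relationship between the forward and inverse transforms can be illustrated with a round trip in plain NumPy. This is a sketch under the assumption that the forward transform is `(X - mean) / scale` with `scale = std**p` and `ddof=0`; it does not call the library.

```python
import numpy as np

X = np.array([[1.0, 10.0], [2.0, 30.0], [3.0, 50.0], [4.0, 70.0]])
p = 0.5

mean = X.mean(axis=0)
scale = X.std(axis=0) ** p             # ddof=0 is an assumption

X_scaled = (X - mean) / scale          # forward transform
X_restored = X_scaled * scale + mean   # inverse: undo scaling, then undo centering
```

Multiplying by the scale factor and adding the mean back recovers the original data up to floating-point error.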