ParetoScaler

class chemotools.scale.ParetoScaler(p: float = 0.5, with_mean: bool = True, copy: bool = True)

Bases: TransformerMixin, OneToOneFeatureMixin, BaseEstimator

This transformer scales each feature by its standard deviation raised to the power p, as described by [1]. Varying p moves continuously between mean centering (p=0) and autoscaling (p=1).

Parameters:
  • p (float, default=0.5) – The exponent applied to the standard deviation. Must be a float between 0 and 1 (inclusive).
      - p=0.0: no scaling (mean centering only).
      - p=0.5: standard Pareto scaling.
      - p=1.0: autoscaling (unit-variance scaling).

  • with_mean (bool, default=True) – If True, center the data before scaling. If False, no centering is performed.

  • copy (bool, default=True) – If True, a copy of the input data will be made. If False, the input data will be modified in place.

Variables:
  • mean (np.ndarray of shape (n_features,)) – The mean value for each feature, calculated during fitting.

  • scale (np.ndarray of shape (n_features,)) – The scale factor for each feature, calculated as the standard deviation raised to the power of p.

  • n_features_in (int) – The number of features in the input data.
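The computation described by the parameters and fitted variables above can be sketched with plain NumPy. This is an illustration of the formula, not the library's implementation: the helper name pareto_scale is hypothetical, and the use of the population standard deviation (ddof=0) is an assumption that may differ from what ParetoScaler does internally. Zero-variance features are not handled here.

```python
import numpy as np

def pareto_scale(X, p=0.5, with_mean=True):
    """Illustrative only: generalized Pareto scaling with plain NumPy."""
    X = np.asarray(X, dtype=float)
    # Per-feature statistics (rows are samples, columns are features).
    mean = X.mean(axis=0)
    # Assumption: population standard deviation (ddof=0); the library may differ.
    scale = X.std(axis=0) ** p
    centered = X - mean if with_mean else X
    return centered / scale, mean, scale

X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])

X_centered, _, _ = pareto_scale(X, p=0.0)  # scale is 1: mean centering only
X_pareto, _, _ = pareto_scale(X, p=0.5)    # divide by sqrt(std)
X_auto, _, _ = pareto_scale(X, p=1.0)      # divide by std: unit variance
```

With p=0.0 the scale factor is 1 for every feature, so only centering remains; with p=1.0 every feature ends up with unit standard deviation.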

References

Examples

>>> from chemotools.datasets import load_fermentation_train
>>> from chemotools.scale import ParetoScaler
>>> # Load sample data
>>> X, _ = load_fermentation_train()
>>> scaler = ParetoScaler(p=0.3)
>>> scaler
ParetoScaler(p=0.3)
>>> # Fit and transform the data
>>> X_scaled = scaler.fit_transform(X)

Notes

In spectroscopic applications, standard Pareto scaling (\(P=0.5\)) is often used to reduce the dominance of large peaks (e.g., solvent or high-abundance metabolites) without inflating baseline noise as severely as autoscaling.

According to Varmuza & Filzmoser, \(P\) should be treated as a tunable hyperparameter. For datasets where relevant information is buried in low-intensity signals but the noise floor is high, an “Adjusted” \(P\) (e.g., 0.3 or 0.7) may provide a superior balance of signal-to-noise ratio and model interpretability compared to fixed Pareto scaling.
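To see why the exponent acts as a dial between the two extremes, note that dividing a feature with standard deviation \(\sigma\) by \(\sigma^{P}\) leaves it with standard deviation \(\sigma^{1-P}\). The spread ratio between a strong and a weak feature therefore shrinks from \(\sigma_1/\sigma_2\) at \(P=0\) to 1 at \(P=1\). The sketch below illustrates this with NumPy on synthetic data; the data, the variable names, and the ddof=0 standard deviation are all assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical two-feature data: a dominant peak (std ~100) and a weak signal (std ~1).
X = rng.normal(loc=[500.0, 5.0], scale=[100.0, 1.0], size=(200, 2))

std = X.std(axis=0)  # assumption: population standard deviation (ddof=0)
ratios = {}
for p in (0.0, 0.3, 0.5, 0.7, 1.0):
    scaled = (X - X.mean(axis=0)) / std ** p
    spread = scaled.std(axis=0)
    # Ratio of the strong feature's spread to the weak feature's spread.
    ratios[p] = spread[0] / spread[1]
    print(f"p={p:.1f}  strong/weak spread ratio = {ratios[p]:.2f}")
```

The printed ratio decreases monotonically as p grows, which is the trade-off being tuned: lower values keep more of the original intensity ranking, higher values give weak (and possibly noisy) features more weight.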

See also

sklearn.preprocessing.StandardScaler

Standardize features by removing the mean and scaling to unit variance.

fit(X: ndarray, y=None) → ParetoScaler

Fit the transformer to the input data.

Parameters:
  • X (np.ndarray of shape (n_samples, n_features)) – The input data to fit the transformer to.

  • y (None) – Ignored; present for compatibility with the scikit-learn API.

Returns:

self – The fitted transformer.

Return type:

ParetoScaler

transform(X: ndarray, y=None) → ndarray

Transform the input data.

Parameters:
  • X (np.ndarray of shape (n_samples, n_features)) – The input data to transform.

  • y (None) – Ignored; present for compatibility with the scikit-learn API.

Returns:

X_transformed – The transformed data.

Return type:

np.ndarray of shape (n_samples, n_features)

inverse_transform(X: ndarray) → ndarray

Inverse transform the data back to the original space.

Parameters:
  • X (np.ndarray of shape (n_samples, n_features)) – The data to inverse transform.

Returns:

X_original – The data transformed back to the original space.

Return type:

np.ndarray of shape (n_samples, n_features)
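As a rough illustration of what mapping back to the original space means, the forward operation (subtract the mean, divide by the scale) can be undone by multiplying by the scale and adding the mean back. The NumPy sketch below shows the round trip; it assumes centering was applied (with_mean=True) and a ddof=0 standard deviation, and it does not call the library itself.

```python
import numpy as np

X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [4.0, 500.0]])
p = 0.5

# Statistics that would be learned during fitting (assumption: ddof=0).
mean = X.mean(axis=0)
scale = X.std(axis=0) ** p

# Forward: center, then divide by the scale factor.
X_scaled = (X - mean) / scale
# Inverse: multiply by the scale factor, then add the mean back.
X_restored = X_scaled * scale + mean

print(np.allclose(X_restored, X))
```

If no centering were applied in the forward step, the inverse would presumably only multiply by the scale factor.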