ParetoScaler#

class chemotools.scale.ParetoScaler(p: float = 0.5, with_mean: bool = True, copy: bool = True)[source]

Bases: DocLinkMixin, TransformerMixin, OneToOneFeatureMixin, BaseEstimator

This transformer scales each feature by its standard deviation raised to the power p, as described by [1]. Varying p moves continuously between mean centering (p=0) and autoscaling (p=1).

Parameters:

p (float, default=0.5) -- The exponent applied to the standard deviation. Must be a float between 0 and 1, inclusive.
    - p=0.0: No scaling (mean centering only).
    - p=0.5: Standard Pareto scaling.
    - p=1.0: Autoscaling (unit-variance scaling).
with_mean (bool, default=True) -- If True, center the data before scaling. If False, no centering is performed.
copy (bool, default=True) -- If True, a copy of the input data will be made. If False, the input data will be modified in place.

Variables:

mean (np.ndarray of shape (n_features,)) -- The mean value for each feature, calculated during fitting.
scale (np.ndarray of shape (n_features,)) -- The scale factor for each feature, calculated as the standard deviation raised to the power of p.
n_features_in (int) -- The number of features in the input data.
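
The relationship between the fitted mean, the scale factor, and the exponent p can be sketched in plain NumPy. This is an illustrative stand-in, not the chemotools implementation: the function name `pareto_scale` is hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the page does not say which estimator the library uses.

```python
import numpy as np


def pareto_scale(X, p=0.5, with_mean=True, ddof=1):
    """Hypothetical helper: divide each feature by std**p (ddof is an assumption)."""
    X = np.asarray(X, dtype=float)
    # Per-feature mean, or zeros when centering is disabled
    mean = X.mean(axis=0) if with_mean else np.zeros(X.shape[1])
    # Per-feature scale factor: standard deviation raised to the power p
    scale = X.std(axis=0, ddof=ddof) ** p
    return (X - mean) / scale, mean, scale


rng = np.random.default_rng(0)
# Three synthetic features with very different spreads
X = rng.normal(loc=5.0, scale=[0.1, 1.0, 10.0], size=(50, 3))

# p=0: the scale factor is 1 everywhere, so only centering happens
X_centered, mean0, scale0 = pareto_scale(X, p=0.0)

# p=1: every feature ends up with unit standard deviation (autoscaling)
X_auto, mean1, scale1 = pareto_scale(X, p=1.0)
```

Note that a feature with zero variance would produce a division by zero for any p > 0 in this sketch; how the library handles that case is not covered here.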

References

Examples

>>> from chemotools.datasets import load_fermentation_train
>>> from chemotools.scale import ParetoScaler
>>> # Load sample data
>>> X, _ = load_fermentation_train()
>>> # Create a scaler with an adjusted exponent
>>> scaler = ParetoScaler(p=0.3)
>>> # Fit and transform the data
>>> X_scaled = scaler.fit_transform(X)

Notes

In spectroscopic applications, standard Pareto scaling (\(p=0.5\)) is often used to reduce the dominance of large peaks (e.g., solvent or high-abundance metabolites) without inflating baseline noise as severely as autoscaling does.
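
A short numerical sketch makes this trade-off concrete. Dividing a feature by its standard deviation raised to the power p leaves it with a spread of std^(1-p). The two standard deviations below are hypothetical values chosen for illustration, not measurements from any dataset.

```python
import numpy as np

# Hypothetical per-feature standard deviations:
# a dominant peak and a low-intensity baseline channel
std = np.array([100.0, 0.01])

ratios = {}
for p in (0.0, 0.5, 1.0):
    # After dividing by std**p, each feature has spread std**(1 - p)
    spread_after = std ** (1.0 - p)
    # How strongly the peak still dominates the baseline channel
    ratios[p] = spread_after[0] / spread_after[1]
    print(f"p={p}: peak/baseline spread ratio = {ratios[p]:g}")
```

With these values the peak-to-baseline ratio falls from 10^4 with centering only, to 10^2 with Pareto scaling, to 1 with autoscaling. At p=1 the baseline channel carries as much weight as the peak, which is where noise inflation comes from.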

According to Varmuza & Filzmoser, \(p\) should be treated as a tunable hyperparameter. For datasets where the relevant information lies in low-intensity signals but the noise floor is high, an adjusted \(p\) (e.g., 0.3 or 0.7) may give a better balance between signal-to-noise ratio and model interpretability than fixed Pareto scaling.
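
One way to treat p as a hyperparameter is to score a small grid of values by cross-validation. The sketch below does this with plain NumPy on synthetic data. Everything in it is an assumption made for illustration: the helper names (`fit_scaler`, `ridge_fit`, `cv_rmse`), the closed-form ridge regression used as the downstream model, the `ddof=1` standard deviation, and the data itself. Ridge is used because ordinary least squares predictions do not change under feature rescaling, so the choice of p would have no effect there.

```python
import numpy as np


def fit_scaler(X, p, ddof=1):
    """Hypothetical helper: per-feature mean and std**p from training data."""
    return X.mean(axis=0), X.std(axis=0, ddof=ddof) ** p


def ridge_fit(X, y, alpha):
    """Closed-form ridge coefficients for centered X and y."""
    A = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)


def cv_rmse(X, y, p, alpha=1.0, n_splits=5):
    """Cross-validated RMSE for one value of p, using contiguous folds."""
    idx = np.arange(X.shape[0])
    sq_errors = []
    for test_idx in np.array_split(idx, n_splits):
        train_idx = np.setdiff1d(idx, test_idx)
        # Scaling statistics come from the training fold only
        mean, scale = fit_scaler(X[train_idx], p)
        X_train = (X[train_idx] - mean) / scale
        X_test = (X[test_idx] - mean) / scale
        y_mean = y[train_idx].mean()
        coef = ridge_fit(X_train, y[train_idx] - y_mean, alpha)
        pred = X_test @ coef + y_mean
        sq_errors.append((pred - y[test_idx]) ** 2)
    return float(np.sqrt(np.concatenate(sq_errors).mean()))


rng = np.random.default_rng(0)
n_samples = 60
# Synthetic data: one high-variance feature and several low-variance ones
X = rng.normal(size=(n_samples, 5)) * np.array([50.0, 1.0, 1.0, 0.5, 0.5])
y = 0.01 * X[:, 0] + 2.0 * X[:, 3] + rng.normal(scale=0.1, size=n_samples)

grid = [0.0, 0.3, 0.5, 0.7, 1.0]
scores = {p: cv_rmse(X, y, p) for p in grid}
best_p = min(scores, key=scores.get)
print(scores, best_p)
```

Which p wins depends entirely on the data and the downstream model, so the result of this toy run should not be read as a recommendation. Because ParetoScaler derives from BaseEstimator, the same search should in principle be expressible with scikit-learn's `Pipeline` and `GridSearchCV`.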

See also

sklearn.preprocessing.StandardScaler: Standardize features by removing the mean and scaling to unit variance.

fit(X: ndarray, y=None) → ParetoScaler[source]

Fit the transformer to the input data.

Parameters:

X (np.ndarray of shape (n_samples, n_features)) -- The input data to fit the transformer to.
y (None) -- Ignored. Present only for scikit-learn API compatibility.

Returns:

self -- The fitted transformer.

Return type:

ParetoScaler

transform(X: ndarray, y=None) → ndarray[source]

Transform the input data.

Parameters:

X (np.ndarray of shape (n_samples, n_features)) -- The input data to transform.
y (None) -- Ignored. Present only for scikit-learn API compatibility.

Returns:

X_transformed -- The transformed data.

Return type:

np.ndarray of shape (n_samples, n_features)

inverse_transform(X: ndarray) → ndarray[source]

Inverse transform the data back to the original space.

Parameters:

X (np.ndarray of shape (n_samples, n_features)) -- The data to inverse transform.

Returns:

X_original -- The data transformed back to the original space.

Return type:

np.ndarray of shape (n_samples, n_features)
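
The round trip can be sketched in plain NumPy: scaling divides the centered data by the scale factor, so the inverse multiplies by it and adds the mean back. This is a minimal sketch, not the library's code. It assumes centering is enabled and that the scale factor is the sample standard deviation (`ddof=1`, an assumption) raised to the power p.

```python
import numpy as np

rng = np.random.default_rng(1)
# Small synthetic data set with two features of different spread
X = rng.normal(loc=10.0, scale=[0.5, 5.0], size=(20, 2))

p = 0.5
mean = X.mean(axis=0)
scale = X.std(axis=0, ddof=1) ** p  # ddof=1 is an assumption

# Forward transform: center, then divide by std**p
X_scaled = (X - mean) / scale

# Inverse transform: undo the scaling, then undo the centering
X_restored = X_scaled * scale + mean
```

Up to floating-point rounding, `X_restored` matches `X`, which can be checked with `np.allclose`.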