ParetoScaler#
- class chemotools.scale.ParetoScaler(p: float = 0.5, with_mean: bool = True, copy: bool = True)[source]
Bases:
TransformerMixin, OneToOneFeatureMixin, BaseEstimator
This transformer scales data using a generalized power of the standard deviation, as described by [1]. It acts as a bridge between mean centering (p=0) and autoscaling (p=1).
- Parameters:
p (float, default=0.5) -- The exponent to use in the scaling. Must be a float between 0 and 1.
  - p=0.0: No scaling (mean centering only).
  - p=0.5: Standard Pareto scaling.
  - p=1.0: Autoscaling (unit variance scaling).
with_mean (bool, default=True) -- If True, center the data before scaling. If False, no centering is performed.
copy (bool, default=True) -- If True, a copy of the input data will be made. If False, the input data will be modified in place.
- Variables:
mean (np.ndarray of shape (n_features,)) -- The mean value for each feature, calculated during fitting.
scale (np.ndarray of shape (n_features,)) -- The scale factor for each feature, calculated as the standard deviation raised to the power of p.
n_features_in (int) -- The number of features in the input data.
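The relationship between the fitted attributes can be sketched in plain NumPy. This is only an illustration of the description above, not the library's implementation: the helper name `pareto_fit` is hypothetical, and the use of the population standard deviation (`ddof=0`) is an assumption that the documentation does not confirm.

```python
import numpy as np


def pareto_fit(X, p=0.5):
    """Hypothetical helper: per-feature mean and scale (std ** p)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # Assumption: population standard deviation (ddof=0).
    std = X.std(axis=0, ddof=0)
    scale = std ** p
    return mean, scale


X = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
mean, scale = pareto_fit(X, p=0.5)

# The two endpoints of the exponent range:
_, scale_centering = pareto_fit(X, p=0.0)    # all ones, so no scaling
_, scale_autoscaling = pareto_fit(X, p=1.0)  # the standard deviation itself
```

With `p=0.0` every scale factor is 1, so only centering has an effect; with `p=1.0` the scale factor equals the standard deviation, which corresponds to autoscaling.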
References
Examples
>>> from chemotools.datasets import load_fermentation_train
>>> from chemotools.scale import ParetoScaler
>>> # Load sample data
>>> X, _ = load_fermentation_train()
>>> scaler = ParetoScaler(p=0.3)
>>> # Fit and transform the data
>>> X_scaled = scaler.fit_transform(X)
Notes
In spectroscopic applications, standard Pareto scaling (\(P=0.5\)) is often used to reduce the dominance of large peaks (e.g., solvent or high-abundance metabolites) without inflating baseline noise as severely as autoscaling.
According to Varmuza & Filzmoser, \(P\) should be treated as a tunable hyperparameter. For datasets where relevant information is buried in low-intensity signals but the noise floor is high, an "Adjusted" \(P\) (e.g., 0.3 or 0.7) may provide a superior balance of signal-to-noise ratio and model interpretability compared to fixed Pareto scaling.
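The trade-off described in these notes can be made concrete with a small numerical experiment. Dividing a feature with standard deviation s by s^p leaves it with a spread of s^(1-p), so the ratio between a high-intensity and a low-intensity feature shrinks as p grows. The sketch below uses synthetic data and a hypothetical helper `pareto_scale` written in plain NumPy (assuming `ddof=0`); it does not call the library.

```python
import numpy as np


def pareto_scale(X, p):
    """Hypothetical helper: center, then divide by std ** p (ddof=0 assumed)."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    return centered / (X.std(axis=0) ** p)


rng = np.random.default_rng(0)
# Synthetic data: column 0 mimics a dominant peak, column 1 a weak signal.
X = rng.normal(size=(200, 2)) * np.array([100.0, 1.0])

# Ratio of the spread of the dominant feature to that of the weak feature.
ratios = {}
for p in (0.0, 0.3, 0.5, 0.7, 1.0):
    spread = pareto_scale(X, p).std(axis=0)
    ratios[p] = spread[0] / spread[1]
```

The ratio falls steadily from its raw value at `p=0.0` to 1 at `p=1.0`; at `p=0.5` it is the square root of the raw ratio. Intermediate values such as 0.3 or 0.7 sit between these cases, which is why p can be tuned like any other hyperparameter.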
- fit(X: ndarray, y=None) → ParetoScaler[source]
Fit the transformer to the input data.
- Parameters:
X (np.ndarray of shape (n_samples, n_features)) -- The input data to fit the transformer to.
y (None) -- Ignored to align with API.
- Returns:
self -- The fitted transformer.
- Return type:
ParetoScaler
- transform(X: ndarray, y=None) → ndarray[source]
Transform the input data.
- Parameters:
X (np.ndarray of shape (n_samples, n_features)) -- The input data to transform.
y (None) -- Ignored to align with API.
- Returns:
X_transformed -- The transformed data.
- Return type:
np.ndarray of shape (n_samples, n_features)
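As a rough illustration of what transforming new data involves, the sketch below applies statistics learned from a training set to an unseen sample, with and without centering. The function `pareto_transform` is hypothetical and written in plain NumPy; the `ddof=0` standard deviation is an assumption, and the library's internals may differ.

```python
import numpy as np


def pareto_transform(X, mean, scale, with_mean=True):
    """Hypothetical helper: apply previously fitted mean and scale to X."""
    X = np.asarray(X, dtype=float)
    if with_mean:
        X = X - mean
    return X / scale


X_train = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
mean = X_train.mean(axis=0)
scale = X_train.std(axis=0) ** 0.5  # p=0.5, ddof=0 assumed

# A new sample that happens to equal the training mean.
X_new = np.array([[3.0, 30.0]])
centered = pareto_transform(X_new, mean, scale)
uncentered = pareto_transform(X_new, mean, scale, with_mean=False)
```

With centering, a sample equal to the training mean maps to zero in every feature; without centering it is only divided by the scale factor.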
- inverse_transform(X: ndarray) → ndarray[source]
Inverse transform the data back to the original space.
- Parameters:
X (np.ndarray of shape (n_samples, n_features)) -- The data to inverse transform.
- Returns:
X_original -- The data transformed back to the original space.
- Return type:
np.ndarray of shape (n_samples, n_features)
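The inverse operation can be sketched as multiplying by the scale factor and then adding the mean back. The NumPy round trip below is a minimal illustration under the same assumptions as before (hypothetical helper names, `ddof=0` standard deviation); it is not the library's code.

```python
import numpy as np


def pareto_forward(X, mean, scale):
    """Hypothetical helper: center, then divide by the scale factor."""
    return (np.asarray(X, dtype=float) - mean) / scale


def pareto_inverse(X_scaled, mean, scale):
    """Hypothetical helper: undo the scaling, then undo the centering."""
    return np.asarray(X_scaled, dtype=float) * scale + mean


X = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
mean = X.mean(axis=0)
scale = X.std(axis=0) ** 0.3  # p=0.3, ddof=0 assumed

X_scaled = pareto_forward(X, mean, scale)
X_restored = pareto_inverse(X_scaled, mean, scale)
```

Up to floating-point rounding, `X_restored` matches `X`, which is the behavior one would expect from `inverse_transform` applied after `transform`.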