DirectOrthogonalization
- class chemotools.projection.DirectOrthogonalization(n_components: int = 1, copy=False)
Bases: TransformerMixin, BaseEstimator
Remove variation in X that is uncorrelated with the target y using Direct Orthogonalization (DO) [1] [2].
DO removes from X systematic variation that is independent of y. X is orthogonalized with respect to y, PCA is performed on the orthogonalized matrix to estimate orthogonal components, and those components are subtracted from X to obtain the corrected data.
The transformer returns the corrected matrix with the same number of features, retaining variation relevant for predicting y. Inputs are typically assumed to be mean-centered.
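The three steps described above can be sketched in plain NumPy. This is only an illustrative sketch under stated assumptions, not the chemotools implementation: the helper names `do_fit` and `do_transform` are hypothetical, PCA is done with an SVD, and the corrected data is returned in mean-centered form (whether the library adds the mean back is not stated here).

```python
import numpy as np


def do_fit(X, y, n_components=1):
    """Hypothetical helper: estimate the orthogonal loadings from X and y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)

    # Center X and y (assumption: centering uses the training means).
    mean_X = X.mean(axis=0)
    Xc = X - mean_X
    yc = y - y.mean(axis=0)

    # Step 1: orthogonalize X with respect to y by removing the part of
    # each column of Xc that lies along yc.
    w = Xc.T @ yc / (yc.T @ yc)
    X_orth = Xc - yc @ w.T

    # Step 2: PCA on the orthogonalized matrix (via SVD); the leading right
    # singular vectors serve as the orthogonal loadings.
    _, _, Vt = np.linalg.svd(X_orth, full_matrices=False)
    P = Vt[:n_components].T  # shape (n_features, n_components)
    return mean_X, P


def do_transform(X, mean_X, P):
    """Hypothetical helper: subtract the orthogonal components from X."""
    Xc = np.asarray(X, dtype=float) - mean_X
    T = Xc @ P             # orthogonal scores, shape (n_samples, n_components)
    return Xc - T @ P.T    # Step 3: corrected (centered) data


# Synthetic data, invented for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=20)

mean_X, P = do_fit(X, y, n_components=1)
X_corrected = do_transform(X, mean_X, P)
```

The corrected matrix keeps the shape of X, and because the loadings are orthonormal, it has no remaining projection onto them.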
- Parameters:
n_components (int, default=1) – The number of orthogonal components to compute. This determines how many orthogonal variations will be removed from the data.
copy (bool, default=False) – If True, a copy of the input data is created and used for computations. If False, the input data is modified in place.
- Variables:
x_weights_orth (ndarray of shape (n_features, n_components)) – The weights of the orthogonal components.
x_loadings_orth (ndarray of shape (n_features, n_components)) – The loadings of the orthogonal components.
x_scores_orth (ndarray of shape (n_samples, n_components)) – The scores of the orthogonal components.
mean_X (ndarray of shape (n_features,)) – The mean of the original data X used for centering.
mean_y (float or ndarray of shape (n_targets,)) – The mean of the target variable y used for centering.
retained_variance_ratio (float) – The proportion of variance in X that is retained, i.e. explained by the predictive components.
removed_variance_ratio (float) – The proportion of variance in X that is removed, i.e. explained by the orthogonal components.
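As a rough illustration of what the two variance ratios could measure, the sketch below splits a centered matrix into a removed part and a retained part and compares their sums of squares. The exact formula used by chemotools is not given here, so this definition is an assumption; the data and the loadings `P` are made up for the example.

```python
import numpy as np

# Synthetic centered data, invented for illustration only.
rng = np.random.default_rng(0)
Xc = rng.normal(size=(30, 4))
Xc -= Xc.mean(axis=0)

# Hypothetical orthonormal loadings standing in for the orthogonal
# components (obtained here from a QR decomposition of random numbers).
P, _ = np.linalg.qr(rng.normal(size=(4, 2)))

# Part of Xc captured by the orthogonal components, and what is left over.
X_removed = Xc @ P @ P.T
X_retained = Xc - X_removed

# Assumed definition: share of the total sum of squares in each part.
total = np.sum(Xc ** 2)
removed_ratio = np.sum(X_removed ** 2) / total
retained_ratio = np.sum(X_retained ** 2) / total
```

Under this definition the two parts are orthogonal, so the two ratios sum to one.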
References
Examples
Fit and apply DirectOrthogonalization to remove variation in X that is orthogonal to y.
>>> import numpy as np
>>> from chemotools.projection import DirectOrthogonalization
>>> X = np.array([[1, 2], [3, 4], [5, 6]])
>>> y = np.array([1, 2, 3])
>>> do = DirectOrthogonalization(n_components=1)
>>> do.fit(X, y)
DirectOrthogonalization(n_components=1, copy=False)
>>> X_transformed = do.transform(X, y)
- fit(X: ndarray, y: ndarray) → DirectOrthogonalization
Fit the DirectOrthogonalization model to the training data.
- Parameters:
X (array-like of shape (n_samples, n_features)) – The input data to fit the model to.
y (array-like of shape (n_samples,)) – The target values.
- Returns:
self – Fitted estimator.
- Return type:
DirectOrthogonalization
- transform(X: ndarray, y=None) → ndarray
Apply the Direct Orthogonalization (DO) correction to X.
This returns the predictive part of the data, i.e. the variation in X that is related to y, after removing the orthogonal part (variation in X that is not related to y).
- Parameters:
X (array-like of shape (n_samples, n_features)) – The input data to transform.
y (None) – Ignored; present only for API consistency.
- Returns:
X_transformed – The transformed data.
- Return type:
array-like of shape (n_samples, n_features)