DirectOrthogonalization

class chemotools.projection.DirectOrthogonalization(n_components: int = 1, copy=False)

Bases: TransformerMixin, BaseEstimator

Remove variation in X that is uncorrelated with the target y using Direct Orthogonalization (DO) [1] [2].

DO removes from X systematic variation that is independent of y. X is orthogonalized with respect to y, PCA is performed on the orthogonalized matrix to estimate orthogonal components, and those components are subtracted from X to obtain the corrected data.

The transformer returns the corrected matrix with the same number of features, retaining variation relevant for predicting y. Inputs are typically assumed to be mean-centered.
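The three steps described above (orthogonalize X against y, run PCA on the result, subtract the orthogonal components) can be sketched with plain NumPy. This is a minimal illustration of the DO idea, not the chemotools implementation: the function name `direct_orthogonalization_sketch` and its return values are hypothetical, PCA is done with an SVD, and a single target column is assumed.

```python
import numpy as np


def direct_orthogonalization_sketch(X, y, n_components=1):
    """Illustrative DO correction (hypothetical helper, not the library code)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)

    # Mean-center X and y.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean(axis=0)

    # Step 1: regress every column of X on y and keep the residual,
    # i.e. the part of X that is orthogonal to y.
    coef = (yc.T @ Xc) / (yc.T @ yc)  # shape (1, n_features)
    X_orth = Xc - yc @ coef

    # Step 2: PCA of the orthogonalized matrix via SVD; the leading right
    # singular vectors serve as the orthogonal loadings.
    _, _, Vt = np.linalg.svd(X_orth, full_matrices=False)
    P = Vt[:n_components].T  # shape (n_features, n_components)

    # Step 3: project the centered data on those loadings and subtract.
    T = Xc @ P  # orthogonal scores
    X_corr = Xc - T @ P.T
    return X_corr, X_orth, P, T


# Synthetic data: y depends on the first feature plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=20)

X_corr, X_orth, P, T = direct_orthogonalization_sketch(X, y, n_components=2)
yc = (y - y.mean()).reshape(-1, 1)
```

In this sketch the corrected matrix keeps the shape of X, every column of `X_orth` is orthogonal to the centered y, and the corrected data has no remaining component along the removed loadings.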

Parameters:
  • n_components (int, default=1) – The number of orthogonal components to compute. This determines how many orthogonal variations will be removed from the data.

  • copy (bool, default=False) – If True, a copy of the input data is created and used for computations. If False, the input data is modified in place.

Variables:
  • x_weights_orth (ndarray of shape (n_features, n_components)) – The weights of the orthogonal components.

  • x_loadings_orth (ndarray of shape (n_features, n_components)) – The loadings of the orthogonal components.

  • x_scores_orth (ndarray of shape (n_samples, n_components)) – The scores of the orthogonal components.

  • mean_X (ndarray of shape (n_features,)) – The mean of the original data X used for centering.

  • mean_y (float or ndarray of shape (n_targets,)) – The mean of the target variable y used for centering.

  • retained_variance_ratio (float) – The proportion of variance in X that is retained, i.e. explained by the predictive components.

  • removed_variance_ratio (float) – The proportion of variance in X that is removed, i.e. explained by the orthogonal components.
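One plausible way to read the two variance ratios is as shares of the total sum of squares of the centered data. The sketch below assumes that definition and assumes orthonormal loadings; chemotools may compute these attributes differently. The names `Xc`, `P`, `removed_ratio` and `retained_ratio` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Centered data matrix (synthetic, for illustration only).
Xc = rng.normal(size=(30, 4))
Xc -= Xc.mean(axis=0)

# Hypothetical orthonormal loading vector standing in for the orthogonal loadings.
P = np.linalg.qr(rng.normal(size=(4, 1)))[0]

# Split the data into the removed (orthogonal) part and the retained part.
T = Xc @ P
removed = T @ P.T
retained = Xc - removed

# Express each part as a share of the total sum of squares.
total = np.sum(Xc ** 2)
removed_ratio = np.sum(removed ** 2) / total
retained_ratio = np.sum(retained ** 2) / total
```

Because the removed and retained parts lie in mutually orthogonal subspaces of feature space, the two ratios sum to one under this definition.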

References

Examples

Fit and apply DirectOrthogonalization to remove variation in X that is orthogonal to y.

>>> import numpy as np
>>> from chemotools.projection import DirectOrthogonalization
>>> X = np.array([[1, 2], [3, 4], [5, 6]])
>>> y = np.array([1, 2, 3])
>>> do = DirectOrthogonalization(n_components=1)
>>> do.fit(X, y)
DirectOrthogonalization(n_components=1, copy=False)
>>> X_transformed = do.transform(X, y)

fit(X: ndarray, y: ndarray) → DirectOrthogonalization

Fit the DirectOrthogonalization model to the training data.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – The input data to fit the model to.

  • y (array-like of shape (n_samples,)) – The target values.

Returns:

self – Fitted estimator.

Return type:

DirectOrthogonalization

transform(X: ndarray, y=None) → ndarray

Apply the Direct Orthogonalization (DO) correction to X.

This returns the predictive part of the data, i.e. the variation in X that is related to y, after removing the orthogonal part (variation in X that is not related to y).

Parameters:
  • X (array-like of shape (n_samples, n_features)) – The input data to transform.

  • y (None) – Ignored. Present only for API consistency.

Returns:

X_transformed – The transformed data.

Return type:

array-like of shape (n_samples, n_features)