.. _plotting_fundamentals:

Plotting Fundamentals
=====================

The ``chemotools.plotting`` module is designed to make visualizing spectroscopic data and chemometric models **fast**, **intuitive** and **publication-ready**. Instead of writing boilerplate ``matplotlib`` code, you can generate standard chemometric plots with just a few lines.

.. warning::
    The plotting module is experimental and under active development. The API may change in future versions. We welcome your feedback! Please report issues or suggestions at: https://github.com/paucablop/chemotools/issues

Why specialized plotting?
-------------------------

Visualizing high-dimensional spectral data and chemometric models often requires repetitive and verbose plotting code. ``chemotools`` simplifies this by providing:

*   **Domain-specific plots**: Spectra, scores, loadings, and outlier plots out of the box.
*   **Interactive exploration**: Quick ``show()`` method for immediate feedback.
*   **Publication quality**: Clean, standardized aesthetics that look good in papers.

Design Philosophy
-----------------

The plotting module is built around a consistent **Display Protocol** designed to balance ease of use with flexibility.

1.  **Object-Oriented**: Each plot type (e.g., ``SpectraPlot``, ``ScoresPlot``) is a class that holds your data and configuration.
2.  **Two Modes of Operation**:

    * ``show()``: Creates a new figure instantly. Perfect for quick exploration.

    * ``render(ax)``: Draws the plot onto an existing matplotlib axis. Designed for building advanced, multi-panel figures and dashboards.

3.  **Matplotlib Integration**: All plots return standard ``matplotlib.axes.Axes`` objects, allowing you to add custom annotations, lines, or styling using familiar matplotlib commands.

An overview of the plotting architecture is shown below:

.. image:: ../_static/images/explore/plotting/plotting_abstraction.png


.. note::
    Since ``chemotools`` plotting is built on top of ``matplotlib``, you can use all your favorite ``matplotlib`` commands to further customize the plots returned by ``render()`` or ``show()``.

Visualizing Spectra
-------------------

The ``SpectraPlot`` is your primary tool for exploratory data analysis. It offers flexible ways to visualize your spectral data.

For this example, we will use the fermentation dataset from ``chemotools``.

.. code-block:: python

    from chemotools.datasets import load_fermentation_train
    from chemotools.feature_selection import RangeCut

    import numpy as np

    # Load data
    X, Y = load_fermentation_train()
    wavenumbers = X.columns.values
    y = Y["glucose"]
    X = X.values

    # Measuring date
    measuring_date = np.array(["2023-01-01"] * 10 + ["2023-01-02"] * 11)

**1. Quick Visualization**

To quickly inspect your data, simply pass the wavenumbers and the spectra matrix. This plots all spectra in a single color.

.. code-block:: python

    # Create plot object
    plot = SpectraPlot(x=wavenumbers, y=X)

    # Display it
    fig = plot.show(title="All Spectra", ylabel="Absorbance")

.. image:: ../_static/images/explore/plotting/spectra_full.png


During exploration, you might want to inspect a specific region of the spectra. You can do this by specifying ``xlim`` in the ``show()`` method (see below).

.. code-block:: python

    # Display it
    fig = plot.show(title="All Spectra", ylabel="Absorbance", xlim=(900, 1500))

.. image:: ../_static/images/explore/plotting/spectra_zoomed.png

.. note::
    The ``SpectraPlot`` automatically handles y-axis scaling based on the data range. You can also manually set ``ylim`` to focus on specific features.

**2. Coloring by Continuous Variable**

You can color spectra based on a continuous target variable (like glucose concentration) to visualize correlations.

.. code-block:: python

    # Create plot object
    plot = SpectraPlot(x=wavenumbers, y=X, color_by=y)

    # Display it
    fig = plot.show(title="All Spectra", ylabel="Absorbance", xlim=(900, 1500))

.. image:: ../_static/images/explore/plotting/spectra_colored_continuous.png

**3. Coloring by Categorical Variable**

If you have categorical data (e.g., batches, experimental conditions), you can color by groups.

.. code-block:: python

    # Create plot object
    plot = SpectraPlot(x=wavenumbers, y=X, color_by=measuring_date, color_mode="categorical")

    # Display it
    fig = plot.show(title="All Spectra", ylabel="Absorbance", xlim=(900, 1500))

.. image:: ../_static/images/explore/plotting/spectra_colored_categorical.png


Analyzing Models
----------------

After fitting a chemometric model (like PCA or PLS), visualizing the results is crucial for interpretation. For this section, we will use a toy PCA model fitted with the fermentation data from ``chemotools``.

.. code-block:: python

    from sklearn.decomposition import PCA
    import matplotlib.pyplot as plt

    # Fit a PCA model
    pca = PCA(n_components=3)
    scores = pca.fit_transform(X)

**Explained Variance: Choosing Components**

Before analyzing scores and loadings, it is often useful to check how much variance each component explains. The ``ExplainedVariancePlot`` helps you decide the optimal number of components.

.. code-block:: python

    from chemotools.plotting import ExplainedVariancePlot

    # Plot explained variance ratio
    plot = ExplainedVariancePlot(pca.explained_variance_ratio_)
    fig = plot.show(title="Explained Variance")

.. image:: ../_static/images/explore/plotting/explained_variance.png

**Scores: The Sample Space**

Use ``ScoresPlot`` to visualize how samples relate to each other. This is essential for identifying clusters, trends, or outliers. The ``ScoresPlot`` is highly flexible and can be used to create composite figures to show different aspects of your model, such as confidence ellipses and sample annotations.

.. code-block:: python

    from chemotools.plotting import ScoresPlot


    fig, ax = plt.subplots(1, 2, figsize=(12, 5))

    # 1. Simple2D scores plot colored by glucose concentration
    plot = ScoresPlot(scores, components=(0, 1), color_by=y)
    plot.render(ax=ax[0])
    ax[0].set_title("Scores Plot")

    # 2. Advanced scores plot with confidence ellipse and annotations
    sample_names = [f"{i}" for i in range(len(scores))]
    plot = ScoresPlot(
        scores,
        confidence_ellipse=0.9,
        annotations=sample_names,
        components=(0, 1),
        color_by=y,
    )
    plot.render(ax=ax[1])
    ax[1].set_title("Scores Plot with Annotations")

.. image:: ../_static/images/explore/plotting/scores_advanced.png

.. note::
    Since we are composing two plots into a single figure, we used the ``render(ax=...)`` method to draw each plot onto specific axes. This allows for precise control over layout and styling.

**Loadings: The Feature Space**

Use ``LoadingsPlot`` to understand which spectral features contribute most to the model.

.. code-block:: python

    from chemotools.plotting import LoadingsPlot

    loadings = pca.components_.T

    # Plot loadings for the first component
    plot = LoadingsPlot(loadings, feature_names=wavenumbers, components=0)
    fig = plot.show(title="PC1 Loadings", ylabel="Loading Coefficient")

.. image:: ../_static/images/explore/plotting/loadings_example.png

**Outlier Detection**

Use ``DistancesPlot`` to identify samples that don't fit the model well, using metrics like Hotelling's T² and Q-residuals. See :doc:`/methods/outliers` for more details on calculating these statistics.

.. code-block:: python

    from chemotools.outliers import HotellingT2, QResiduals

    # Calculate outlier statistics
    hotelling = HotellingT2(pca).fit(X)
    q_residuals = QResiduals(pca).fit(X)

Now, we are ready to visualize the results.

.. code-block:: python

    from chemotools.plotting import DistancesPlot

    # Plot T² vs Q-residuals
    plot = DistancesPlot(
        x=hotelling.predict_residuals(X_cut),
        y=q_residuals.predict_residuals(X_cut),
        confidence_lines=(hotelling.critical_value_, q_residuals.critical_value_),
        color_by=y,
    ).render(ax=ax[0],xlabel="Hotelling's T²", ylabel="Q Residuals")

    plot = DistancesPlot(
        x=hotelling.predict_residuals(X_cut),
        y=q_residuals.predict_residuals(X_cut),
        confidence_lines=(hotelling.critical_value_, q_residuals.critical_value_),
        color_by=measuring_date,
        annotations=temperatures,
    ).render(ax=ax[1],xlabel="Hotelling's T²", ylabel="Q Residuals")

.. image:: ../_static/images/explore/plotting/outliers_example.png

Evaluating Predictions
----------------------

For regression models, the ``PredictedVsActualPlot`` provides a standard way to assess model performance.

.. code-block:: python

    from chemotools.plotting import PredictedVsActualPlot, YResidualsPlot
    y_residuals = y_test - y_pred

    # Assume y_pred comes from a PLS model
    fig, ax = plt.subplots(1, 2, figsize=(12, 5))
    PredictedVsActualPlot(y_true=y_test, y_pred=y_pred).render(
        ax=ax[0], xlabel="Actual (g/L)", ylabel="Predicted (g/L)"
    )
    YResidualsPlot(residuals=y_residuals, add_confidence_band=True).render(
        ax=ax[1], xlabel="Sample Index", ylabel="Residuals (g/L)"
    )


.. image:: ../_static/images/explore/plotting/predictions_example.png

Creating Composite Figures
--------------------------

All plotting classes support a ``render(ax=...)`` method, allowing you to place plots onto existing matplotlib axes. This is powerful for creating dashboards or comparison figures.

.. code-block:: python

    import matplotlib.pyplot as plt

    # Create a figure with 2 subplots
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
    
    # Plot 1: All spectra
    SpectraPlot(x=wavenumbers, y=X, color='lightgray').render(ax1)
    ax1.set_title("Raw Spectra")
    
    # Plot 2: Mean spectrum
    SpectraPlot(x=wavenumbers, y=X.mean(axis=0), color='black').render(ax2)
    ax2.set_title("Mean Spectrum")
    
    plt.tight_layout()
    plt.show()

Other Available Plots
---------------------

The ``chemotools.plotting`` module includes other specialized plots not covered in this guide:

*   ``FeatureSelectionPlot``: Visualize feature importance and selection results.
*   ``QQPlot``: Check for normality of residuals.
*   ``ResidualDistributionPlot``: Analyze the distribution of model residuals.
*   ``YResidualsPlot``: Plot residuals against predicted values.

Check the API reference for more details on these classes.