**Persisting your models**
==============================

Previously, we saw how to use ``chemotools`` in combination with ``scikit-learn`` to preprocess data and make predictions. In a real-world scenario, however, we usually want to persist the trained (fitted) pipelines so that they can be deployed to a production environment. This section shows two ways to persist a model:

* Using ``pickle``
* Using ``joblib``

An overview of the workflow is shown in the image below:

.. image:: ./_figures/persist_scheme.png
    :alt: Persist your models
    :align: center
    :width: 600

For this section, we will use the following fitted pipeline as an example. It assumes that ``wavenumbers``, ``spectra`` and ``reference`` have already been loaded:

.. code-block:: python

    from chemotools.feature_selection import RangeCut
    from chemotools.baseline import LinearCorrection
    from chemotools.derivative import SavitzkyGolay
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Define the pipeline
    pipeline = make_pipeline(
        RangeCut(start=950, end=1550, wavenumbers=wavenumbers),
        LinearCorrection(),
        SavitzkyGolay(window_size=21, polynomial_order=2, derivate_order=1),
        StandardScaler(with_mean=True, with_std=False),
        PLSRegression(n_components=2, scale=False)
    )

    # Fit the model
    pipeline.fit(spectra, reference)
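
If you do not have data at hand, the pipeline above can still be tried out with synthetic placeholders. The following is only a sketch: the array shapes, the wavenumber range and the random values are assumptions made for illustration, not real spectra. These lines would need to run before the pipeline is defined:

.. code-block:: python

    import numpy as np

    # Reproducible random number generator
    rng = np.random.default_rng(seed=0)

    # Assumed dimensions: 50 samples measured at 800 wavenumbers
    n_samples, n_wavenumbers = 50, 800

    # Ascending axis that covers the 950-1550 range used by RangeCut
    wavenumbers = np.linspace(900, 1700, n_wavenumbers)

    # Random placeholders standing in for measured spectra and reference values
    spectra = rng.normal(size=(n_samples, n_wavenumbers))
    reference = rng.normal(size=n_samples)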

Using ``pickle``
----------------

``pickle`` is a Python module that implements binary protocols for serializing and de-serializing a Python object structure. It is part of the Python standard library, so no extra installation is needed. The following code shows how to persist a ``scikit-learn`` model using ``pickle``:

.. note::
    The ``pickle`` module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.

.. code-block:: python

    import pickle

    # persist model
    filename = 'model.pkl'

    with open(filename, 'wb') as file:
        pickle.dump(pipeline, file)

    # load model
    with open(filename, 'rb') as file:
        pipeline = pickle.load(file)
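
One way to act on the security note above is to check that a model file is unchanged before unpickling it. The sketch below compares the SHA-256 digest of the file with a digest recorded when the model was saved. The helper names ``sha256_of_file`` and ``load_verified`` are hypothetical, and the check only helps if the expected digest is obtained through a trusted channel; it does not make ``pickle`` safe for files of unknown origin:

.. code-block:: python

    import hashlib
    import hmac
    import pickle


    def sha256_of_file(path):
        """Return the SHA-256 hex digest of the file at ``path``."""
        digest = hashlib.sha256()
        with open(path, "rb") as file:
            # Read in chunks so large model files need not fit in memory
            for chunk in iter(lambda: file.read(65536), b""):
                digest.update(chunk)
        return digest.hexdigest()


    def load_verified(path, expected_digest):
        """Unpickle ``path`` only if its digest matches ``expected_digest``."""
        actual_digest = sha256_of_file(path)
        # compare_digest performs a constant-time comparison
        if not hmac.compare_digest(actual_digest, expected_digest):
            raise ValueError(f"Digest mismatch for {path}; refusing to unpickle")
        with open(path, "rb") as file:
            return pickle.load(file)

After saving the model, the value of ``sha256_of_file('model.pkl')`` could be stored alongside the deployment configuration, and the production code would then call ``load_verified('model.pkl', expected_digest)`` instead of ``pickle.load`` directly.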

Using ``joblib``
----------------

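``joblib`` is a Python package that provides utilities for efficiently saving and loading Python objects that contain large NumPy arrays. It is not part of the Python standard library, but it is installed as a dependency of ``scikit-learn`` and can also be installed on its own using ``pip``. Because ``joblib`` builds on ``pickle``, the same security warning applies: only load files from trusted sources. The following code shows how to persist a ``scikit-learn`` model using ``joblib``: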

.. code-block:: python

    from joblib import dump, load

    # persist model (joblib opens and closes the file itself)
    filename = 'model.joblib'
    dump(pipeline, filename)

    # load model
    pipeline = load(filename)
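
Before deploying a persisted model, it can be worth checking that the reloaded pipeline reproduces the predictions of the original one. The sketch below does this with a stand-in pipeline that uses only ``scikit-learn`` steps and random placeholder data (both are assumptions made to keep the example self-contained). It also passes the optional ``compress`` argument of ``joblib.dump``, which trades some speed for a smaller file:

.. code-block:: python

    import os
    import tempfile

    import numpy as np
    from joblib import dump, load
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Random placeholder data (not real spectra)
    rng = np.random.default_rng(seed=0)
    spectra = rng.normal(size=(30, 100))
    reference = rng.normal(size=30)

    # Stand-in pipeline; a chemotools pipeline is handled the same way
    pipeline = make_pipeline(
        StandardScaler(with_mean=True, with_std=False),
        PLSRegression(n_components=2, scale=False),
    )
    pipeline.fit(spectra, reference)

    with tempfile.TemporaryDirectory() as tmp_dir:
        filename = os.path.join(tmp_dir, "model.joblib")

        # Save with moderate compression, then load the model back
        dump(pipeline, filename, compress=3)
        restored = load(filename)

    # Compare the predictions of the original and the restored pipeline
    predictions_match = np.allclose(
        pipeline.predict(spectra), restored.predict(spectra)
    )
    print(predictions_match)

The same comparison applies to a model saved with ``pickle``; only the ``dump`` and ``load`` calls change.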