Explore Our Datasets 🔍

Welcome to the world of data exploration! Our chemotools package provides useful datasets that help you test the package and learn. You can find these datasets in the chemotools.datasets module and access them using simple loading functions. Here's what we offer:

The Fermentation Dataset 🧪

This dataset contains spectra collected during a yeast fermentation process using attenuated total reflectance Fourier transform infrared spectroscopy (ATR-FTIR). The dataset includes both a training set and a test set.

For more information about the Fermentation Dataset, see these publications:

The Train Set

The train set contains 21 synthetic spectra with reference glucose concentrations, measured by high-performance liquid chromatography (HPLC). You can load the train set as a pandas.DataFrame or as a polars.DataFrame:

Load as pandas.DataFrame:

from chemotools.datasets import load_fermentation_train

X_train, y_train = load_fermentation_train()

Load as polars.DataFrame:

from chemotools.datasets import load_fermentation_train

X_train, y_train = load_fermentation_train(set_output="polars")

Note

Polars is supported in chemotools>=0.1.5

Note

To learn how to build a PLS model using the Fermentation Dataset, see our Training Guide.

The Test Set

The test set contains over 1000 spectra collected during a fermentation process, one every 1.25 minutes over several hours. The set also includes 35 reference glucose concentrations, measured hourly during the fermentation.

You can load the test set as a pandas.DataFrame or as a polars.DataFrame:

Load as pandas.DataFrame:

from chemotools.datasets import load_fermentation_test

X_test, y_test = load_fermentation_test()

Load as polars.DataFrame:

from chemotools.datasets import load_fermentation_test

X_test, y_test = load_fermentation_test(set_output="polars")
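
Because the spectra are captured every 1.25 minutes and the references hourly, each reference value corresponds to only a small fraction of the spectra. The sketch below works through that arithmetic. It assumes, hypothetically, that the first spectrum and the first reference are both taken at t = 0 and that the spacing is perfectly uniform; the function name and constants are illustrative and not part of chemotools. If the loaded data carries its own time index, prefer that over this estimate.

```python
SPECTRUM_INTERVAL_MIN = 1.25   # sampling interval of the spectra, as stated above
REFERENCE_INTERVAL_MIN = 60.0  # references are described as hourly


def nearest_spectrum_index(reference_number: int) -> int:
    """Return the row index of the spectrum closest in time to a reference.

    Assumption (not confirmed by the dataset): spectrum 0 and reference 0
    are both acquired at t = 0, with uniform spacing afterwards.
    """
    elapsed_min = reference_number * REFERENCE_INTERVAL_MIN
    return round(elapsed_min / SPECTRUM_INTERVAL_MIN)


# Number of spectra recorded between two consecutive references
spectra_per_hour = REFERENCE_INTERVAL_MIN / SPECTRUM_INTERVAL_MIN

# Row indices for the first three references under the assumption above
indices = [nearest_spectrum_index(k) for k in range(3)]
```

Under these assumptions there are 48 spectra per reference interval, so most spectra in the test set have no directly matching reference value.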

Note

The wavenumbers are stored as column names in both the pandas.DataFrame and the polars.DataFrame. In a pandas.DataFrame the column names can be of type float, but in a polars.DataFrame the column names must be of type str.
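
Since the column names differ in type between the two outputs, it can help to convert them to floats before plotting or slicing by wavenumber. The following is a minimal sketch of one way to do that. The helper name and the sample column values are hypothetical; in practice you would pass the columns attribute of the loaded DataFrame.

```python
def wavenumbers_from_columns(columns):
    """Convert column names to a list of float wavenumbers.

    Accepts str names (polars) as well as float names (pandas),
    because float() handles both.
    """
    return [float(name) for name in columns]


# Hypothetical column names standing in for the columns of a loaded DataFrame
polars_style_columns = ["428.0", "429.0", "430.0"]
pandas_style_columns = [428.0, 429.0, 430.0]

wavenumbers = wavenumbers_from_columns(polars_style_columns)
```

With the real data, a call such as wavenumbers_from_columns(X_test.columns) should work for either output type, since both expose their column names as an iterable.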

The Coffee Dataset ☕

The Coffee Dataset contains spectra collected from various coffee samples from different countries. These spectra were collected using attenuated total reflectance Fourier transform infrared spectroscopy (ATR-FTIR).

Load as pandas.DataFrame:

from chemotools.datasets import load_coffee

spectra, labels = load_coffee()

Load as polars.DataFrame:

from chemotools.datasets import load_coffee

spectra, labels = load_coffee(set_output="polars")

Note

To learn how to build a PLS-DA classification model using the Coffee Dataset, see our Training Guide.

We hope you enjoy exploring these datasets! 🚀