Working with DataFrames

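For the pandas.DataFrame and polars.DataFrame lovers. By default, all scikit-learn and chemotools transformers output a numpy.ndarray. You can, however, configure your chemotools preprocessing methods to produce pandas.DataFrame or polars.DataFrame objects instead. This is possible because chemotools implements the set_output() API from scikit-learn (>= 1.2.2 for pandas, >= 1.4.0 for polars); see the scikit-learn documentation for details. The same API that scikit-learn preprocessing methods such as StandardScaler() expose is therefore also available on the chemotools transformers.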

Note

As of version 0.1.3, set_output() is available for all chemotools transformers!

Below are two examples of how to use this API:

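Example 1: Using the set_output() API with a single preprocessing method

1. Load your spectral data as a `pandas.DataFrame`

First, load your spectral data. Here we assume a file called spectra.csv in which each row represents a spectrum and each column represents a wavenumber.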

import pandas as pd
from chemotools.baseline import AirPls

# Load your data as a pandas DataFrame
spectra = pd.read_csv('data/spectra.csv', index_col=0)

The spectra variable is a pandas.DataFrame object whose index holds the sample names and whose columns hold the wavenumbers.
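If no such file is at hand, a stand-in with the same layout can be built in memory. The sketch below is purely illustrative: the wavenumber range, the Gaussian peak shape, the noise level, and the sample names are all assumptions, not part of any chemotools dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical axis: 50 wavenumbers from 400 to 1800 cm^-1
wavenumbers = np.linspace(400, 1800, 50)

# Five synthetic spectra: one Gaussian peak on a sloping baseline plus noise
rng = np.random.default_rng(seed=0)
peak = np.exp(-((wavenumbers - 1000.0) ** 2) / (2 * 50.0**2))
baseline = 0.001 * (wavenumbers - 400.0)
data = np.array(
    [
        scale * peak + baseline + rng.normal(0, 0.01, wavenumbers.size)
        for scale in (0.5, 0.75, 1.0, 1.25, 1.5)
    ]
)

# Rows are spectra (indexed by sample name), columns are wavenumbers
spectra = pd.DataFrame(
    data,
    index=[f"sample_{i}" for i in range(1, 6)],
    columns=wavenumbers,
)
print(spectra.shape)  # (5, 50)
```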

2. Create a `chemotools` preprocessing object and set the output to `pandas`

Next, we create the AirPls object and set its output to pandas.

# Create an AirPLS object and set the output to pandas
airpls = AirPls().set_output(transform='pandas')

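The set_output() method accepts the following argument:

transform: the output format. Can be 'pandas', 'polars', or 'default' (the default format outputs a numpy.ndarray).

Tip

If you want the output as polars, pass transform='polars' to the set_output() method (AirPls().set_output(transform='polars')).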

3. Fit and transform the spectra

# Fit and transform the spectra
spectra_airpls = airpls.fit_transform(spectra)

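The output of the fit_transform() method is now a pandas.DataFrame object.

Tip

Note that, by default, the index and columns of the input data are not carried over to the output: the spectra_airpls DataFrame has a default index and default columns.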
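If the original labels are needed downstream, they can be copied back by hand. The helper below is a hypothetical sketch, not part of chemotools, and it only makes sense for transformers that keep the number and order of rows and columns unchanged (as a baseline correction would). The toy frames stand in for spectra and spectra_airpls.

```python
import numpy as np
import pandas as pd


def restore_labels(output: pd.DataFrame, reference: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `output` carrying the index and columns of `reference`.

    Hypothetical helper; only valid when the transformer preserves shape
    and ordering of the data.
    """
    if output.shape != reference.shape:
        raise ValueError("output and reference must have the same shape")
    relabelled = output.copy()
    relabelled.index = reference.index
    relabelled.columns = reference.columns
    return relabelled


# Toy stand-ins for `spectra` (labelled input) and `spectra_airpls` (output)
reference = pd.DataFrame(
    np.ones((2, 3)), index=["sample_1", "sample_2"], columns=[400.0, 500.0, 600.0]
)
output = pd.DataFrame(np.zeros((2, 3)))  # default integer labels

labelled = restore_labels(output, reference)
print(labelled.index.tolist())    # ['sample_1', 'sample_2']
print(labelled.columns.tolist())  # [400.0, 500.0, 600.0]
```

Copying rather than mutating keeps the transformer's raw output available for comparison.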

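Example 2: Using the set_output() API with pipelines

Similarly, the set_output() API can be used with pipelines. The following code shows how to create a pipeline that performs:

Multiplicative scatter correction
Standard scaling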

import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from chemotools.scatter import MultiplicativeScatterCorrection

# Load the spectra as in Example 1
spectra = pd.read_csv('data/spectra.csv', index_col=0)

# Make the pipeline
pipeline = make_pipeline(MultiplicativeScatterCorrection(), StandardScaler())

# Set the output to pandas
pipeline.set_output(transform="pandas")

# Fit the pipeline and transform the spectra
output = pipeline.fit_transform(spectra)

Tip

If you want the output as polars, pass transform='polars' to the set_output() method (pipeline.set_output(transform='polars')).