How to apply tf.data transformations to a DataFrame

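I want to apply tf.data transformations to a pandas DataFrame. According to the TensorFlow documentation, I can apply tf.data directly to a DataFrame, but the DataFrame must have a uniform dtype.

When I apply tf.data to my DataFrame like this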

tf.data.Dataset.from_tensor_slices(df['reports'])

it raises this error

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float).

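When I print df['reports'].dtype, I get dtype('O') (the pandas object dtype), which suggests the column is not uniform. If that is the case, how can I convert this DataFrame to a uniform dtype?

You can try casting df["reports"] to a specific type. Assuming you want to convert this column to numbers, you can do it like this: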

df['reports'] = pd.to_numeric(df['reports'])
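If some values cannot be parsed, pd.to_numeric raises by default. A minimal sketch, assuming a hypothetical column of numeric strings mixed with one bad entry (the real contents of df['reports'] are not shown in the question), uses errors='coerce' to turn unparseable values into NaN so they can be located before the cast:

```python
import pandas as pd

# Hypothetical data standing in for df['reports'].
df = pd.DataFrame({'reports': ['1.5', '2', 'n/a', 3.0]})

# Unparseable entries become NaN instead of raising an exception.
converted = pd.to_numeric(df['reports'], errors='coerce')

# Rows that were present in the original but failed to convert.
bad_rows = df.loc[converted.isna() & df['reports'].notna(), 'reports']
print(bad_rows)

# Once the bad rows are understood, store the column with a uniform dtype.
df['reports'] = converted.astype('float32')
```

Whether the NaN rows should be dropped, filled, or fixed at the source depends on the data.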

Either way, I suggest you investigate why the column ends up as dtype('O'). Your data may contain errors.

Try using a ragged structure:
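One way to investigate an object column is to count the Python types it actually holds. This is a sketch on hypothetical data in which lists are mixed with a missing value; the "Unsupported object type float" message would be consistent with a stray float such as NaN among non-numeric values, but that is an assumption about the asker's data:

```python
import pandas as pd

# Hypothetical data standing in for df['reports'].
df = pd.DataFrame({'reports': [[2.0, 3.0], float('nan'), [4.0]]})

# Map every value to the name of its Python type, then count each name.
type_counts = df['reports'].map(lambda v: type(v).__name__).value_counts()
print(type_counts)
```

More than one type name in the result means the column is mixed and needs cleaning before it can be converted to a tensor.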

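import tensorflow as tf
import pandas as pd

# Each row holds a list of a different length, so the column has dtype('O').
df = pd.DataFrame(data={'reports': [[2.0, 3.0, 4.0], [2.0, 3.0], [2.0]]})

# A RaggedTensor can represent rows of unequal length.
dataset = tf.data.Dataset.from_tensor_slices(tf.ragged.constant(df['reports']))
for x in dataset:
    print(x)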

tf.Tensor([2. 3. 4.], shape=(3,), dtype=float32)
tf.Tensor([2. 3.], shape=(2,), dtype=float32)
tf.Tensor([2.], shape=(1,), dtype=float32)
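If later steps need rows of equal length, the ragged values can be padded into a dense tensor before building the dataset. A minimal sketch, assuming zero is an acceptable padding value for your data (that choice is not from the original answer):

```python
import pandas as pd
import tensorflow as tf

df = pd.DataFrame(data={'reports': [[2.0, 3.0, 4.0], [2.0, 3.0], [2.0]]})

# Build the ragged tensor from a plain Python list of lists.
ragged = tf.ragged.constant(df['reports'].tolist())

# Pad shorter rows with 0.0 so every row has the length of the longest one.
dense = ragged.to_tensor(default_value=0.0)

# Every element of this dataset now has the same shape.
dataset = tf.data.Dataset.from_tensor_slices(dense)
for x in dataset:
    print(x)
```

Padding changes the data the model sees, so a mask or a sentinel value may be needed downstream.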
