TensorFlow: converting object dtype columns to string/int

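Suppose that reading a particular file with pd.read_csv(file_path) produces a df whose columns have object dtype instead of string/int32 dtype.

This causes a problem when trying to convert the pandas df to a TensorFlow dataset: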

import pandas as pd
import tensorflow as tf
import numpy as np
# convert dummy data to object to reproduce the problem
d={'A':['a', 'b', 'c', 'd'], 'B':['e', 'f', 'g', 'h'], 'number':[1, 2, 3, 4]}
df=pd.DataFrame(d).astype(object)
# converting df to tf.dataset
ds = tf.data.Dataset.from_tensor_slices(dict(df))

The following error is raised:

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type int).
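
To see where the message probably comes from, the sketch below inspects the same dummy data with pandas and NumPy only. It assumes the cause is that an object column stores boxed Python int values, which NumPy exposes as an object array that TensorFlow cannot map to a numeric tensor type. The variable name `number_array` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

d = {'A': ['a', 'b', 'c', 'd'], 'B': ['e', 'f', 'g', 'h'], 'number': [1, 2, 3, 4]}
df = pd.DataFrame(d).astype(object)

# Every column now reports the generic object dtype
print(df.dtypes)

# The NumPy view of the numeric column is an object array of Python ints,
# not a fixed-width integer array
number_array = np.asarray(df['number'])
print(number_array.dtype, type(number_array[0]))
```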

How can object dtype columns be converted correctly to string/numeric columns?

The goal is to get the following output:

ds
# console output
<TensorSliceDataset shapes: {A: (), B: (), number: ()}, types: {A: tf.string, B: tf.string, number: tf.int64}>

Try:

df = pd.DataFrame(d).astype('category')
df['number'] = df['number'].astype(int)
ds = tf.data.Dataset.from_tensor_slices(dict(df))

<TensorSliceDataset shapes: {A: (), B: (), number: ()}, types: {A: tf.string, B: tf.string, number: tf.int32}>
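
Note that the output above shows tf.int32 rather than the tf.int64 requested in the question. A likely reason is that astype(int) follows the platform's default integer width, which can be 32 bits on some setups. A minimal sketch of an alternative, assuming TensorFlow 2.x with eager execution, is to cast the numeric column to an explicit 'int64' and pass plain NumPy arrays to from_tensor_slices. The name `columns` is introduced here for illustration.

```python
import pandas as pd
import tensorflow as tf

d = {'A': ['a', 'b', 'c', 'd'], 'B': ['e', 'f', 'g', 'h'], 'number': [1, 2, 3, 4]}
df = pd.DataFrame(d).astype(object)

# Cast the numeric column to a fixed-width integer dtype so the result
# does not depend on the platform's default integer size
df['number'] = df['number'].astype('int64')

# Pass one NumPy array per column; the object arrays of Python strings
# in A and B are converted to tf.string
columns = {name: col.to_numpy() for name, col in df.items()}
ds = tf.data.Dataset.from_tensor_slices(columns)

# element_spec lists the shape and dtype of each component
print(ds.element_spec)
```

If the file has many columns, df.infer_objects() may be a shorter way to let pandas pick numeric dtypes for object columns that hold only numbers, though checking df.dtypes afterwards is still advisable.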
