Preparing data input for TensorFlow from numpy and scipy.sparse

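How do I prepare data input for a TensorFlow model (say, a Keras Sequential model)?

I know how to prepare x_train, y_train, x_test and y_test using numpy and scipy (eventually pandas, sklearn style), where train/test are the training and test data for a neural model, x is a 2D sparse matrix, and y is a 1D numpy array of integer labels with one entry per row of x.

I'm struggling with the Dataset documentation and haven't gained much insight so far…

So far, I have only been able to convert a scipy.sparse matrix to a tf.SparseTensor using something like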

import numpy as np
import tensorflow as tf
from scipy import sparse as sp
x = sp.csr_matrix( ... )
x = tf.SparseTensor(indices=np.vstack([*x.nonzero()]).T,
                    values=x.data,
                    dense_shape=x.shape)

and I can convert a numpy array to a tf.Tensor using something like

import numpy as np
import tensorflow as tf
y = np.array( ... ) # 1D array of len == x.shape[0]
y = tf.constant(y)

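How do I align x and y into a single Dataset in order to build batches, buffers, … and benefit from the Dataset utilities?
Should I use zip, from_tensor_slices, or some other method of the tensorflow.data.Dataset module?

An example of x and y looks like this: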

x = tf.SparseTensor(indices=[[0, 0], [1, 2]], values=[1, 2], dense_shape=[3, 4])
y = tf.constant(np.array(range(3)))

You should be able to use tf.data.Dataset.from_tensor_slices, since you mentioned that "y is a 1D numpy array of integer labels with the same length as the number of rows in x":

import numpy as np
import tensorflow as tf
x = tf.SparseTensor(indices=[[0, 0], [1, 2]], values=[1, 2], dense_shape=[3, 4])
y = tf.constant(np.array(range(3)))
# Slicing along the first axis pairs each row of x with its label in y.
dataset = tf.data.Dataset.from_tensor_slices((x, y))
for x, y in dataset:
    print(x, y)

SparseTensor(indices=tf.Tensor([[0]], shape=(1, 1), dtype=int64), values=tf.Tensor([1], shape=(1,), dtype=int32), dense_shape=tf.Tensor([4], shape=(1,), dtype=int64)) tf.Tensor(0, shape=(), dtype=int64)
SparseTensor(indices=tf.Tensor([[2]], shape=(1, 1), dtype=int64), values=tf.Tensor([2], shape=(1,), dtype=int32), dense_shape=tf.Tensor([4], shape=(1,), dtype=int64)) tf.Tensor(1, shape=(), dtype=int64)
SparseTensor(indices=tf.Tensor([], shape=(0, 1), dtype=int64), values=tf.Tensor([], shape=(0,), dtype=int32), dense_shape=tf.Tensor([4], shape=(1,), dtype=int64)) tf.Tensor(2, shape=(), dtype=int64)
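
To get from there to the batches and buffers mentioned in the question, the same dataset can be chained with shuffle and batch. The following is only a minimal sketch under some assumptions: the toy matrix `x_sp`, the labels `y_np`, the buffer size and the batch size are all made up for illustration, and the conversion goes through scipy's COO format rather than the `nonzero()` approach from the question, so that indices and values come from the same representation.

```python
import numpy as np
import tensorflow as tf
from scipy import sparse as sp

# Hypothetical toy data: 3 rows, 4 feature columns, one integer label per row.
x_sp = sp.csr_matrix(np.array([[1, 0, 0, 0],
                               [0, 0, 2, 0],
                               [0, 0, 0, 0]], dtype=np.float32))
y_np = np.arange(3)

# COO format exposes row, col and data arrays in matching order.
coo = x_sp.tocoo()
indices = np.column_stack((coo.row, coo.col)).astype(np.int64)
x = tf.SparseTensor(indices=indices, values=coo.data, dense_shape=coo.shape)
x = tf.sparse.reorder(x)  # put the indices in canonical row-major order

dataset = (
    tf.data.Dataset.from_tensor_slices((x, tf.constant(y_np)))
    .shuffle(buffer_size=len(y_np))  # assumed: buffer covers the whole toy set
    .batch(2)                        # assumed batch size; last batch is smaller
)

for x_batch, y_batch in dataset:
    # x_batch is still a SparseTensor; it is densified here only for display.
    print(tf.sparse.to_dense(x_batch).numpy(), y_batch.numpy())
```

If the model's first layer does not accept sparse input, one option is to add `.map(lambda xs, ys: (tf.sparse.to_dense(xs), ys))` before `.batch(...)`, at the cost of materializing each row densely.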
