TensorFlow map function: splitting the dataset structure



I have a problem with the dataset structure inside the TensorFlow map function. My data looks like this:

Simplified:

train_examples = tf.data.Dataset.from_tensor_slices(train_data)
[[0,1,2,3,4,5,...],
 [32,33,34,35,36,...]],

Actual:

print(train_data[0])
[[array([  2, 539, 400, 513, 398, 523, 485, 533, 568, 566, 402, 565, 491,
         570, 576, 539, 351, 538, 297, 539, 262, 564, 313, 581, 370, 589,
         421, 514, 314, 501, 370, 489, 420,   3]),
  array([  2, 534, 403, 507, 401, 519, 487, 531, 567, 562, 405, 544, 495,
         537, 588, 528, 354, 526, 300, 534, 259, 555, 315, 575, 370, 589,
         421, 499, 315, 489, 372, 483, 423,   3])]]

I convert it into a pipeline of tensors: <TensorSliceDataset shapes: (2, 34), types: tf.int64>

train_examples holds 17k rows, each a 2D tensor of the form [[source],[target]].

def make_batches(ds):
    return (
        ds
        .cache()
        .shuffle(BUFFER_SIZE)
        .batch(BATCH_SIZE)
        .map(lambda x_int,y_int: x_int,y_int, num_parallel_calls=tf.data.experimental.AUTOTUNE)
        .prefetch(tf.data.experimental.AUTOTUNE))

train_batches = make_batches(train_examples)

In the map step, I want the data structure to output source and target separately. I also tried map(prepare, num_parallel_calls=tf.data.experimental.AUTOTUNE) with the following function:

def prepare(ds):
    srcs = tf.ragged.constant(ds.numpy()[0], tf.int64)
    trgs = tf.ragged.constant(ds.numpy()[1], tf.int64)
    srcs = srcs.to_tensor()
    trgs = trgs.to_tensor()
    return srcs, trgs

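But TensorFlow does not allow eager execution inside the map function, so the .numpy() calls fail. If I am missing anything else about using map in TensorFlow, please let me know. Thanks a lot.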

TensorFlow version: 2.7
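
On the eager-execution limitation: functions passed to map are traced into a graph, so the argument is a symbolic tensor without a .numpy() method. If Python-side logic is really needed, tf.py_function can wrap it. The following is only a minimal sketch of that fallback, assuming small made-up data (4 pairs of length 3) and a hypothetical helper named split_eagerly; it runs in the Python process and loses static shapes, so graph ops are usually the better choice when they can express the same thing.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in data: 4 (source, target) pairs, 3 token ids each.
data = np.arange(4 * 2 * 3, dtype=np.int64).reshape(4, 2, 3)
ds = tf.data.Dataset.from_tensor_slices(data)

def split_eagerly(x):
    # Runs eagerly inside tf.py_function, so .numpy() is available here.
    pair = x.numpy()
    return pair[0], pair[1]

def prepare(x):
    src, trg = tf.py_function(split_eagerly, inp=[x], Tout=[tf.int64, tf.int64])
    # py_function drops static shape information; restore it by hand.
    src.set_shape([3])
    trg.set_shape([3])
    return src, trg

pairs = ds.map(prepare)
src, trg = next(iter(pairs))
print(src.numpy(), trg.numpy())
```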

You can try splitting the samples like this:

import tensorflow as tf
import numpy as np

# One (source, target) pair, each a sequence of 34 token ids.
data = [[np.array([2, 539, 400, 513, 398, 523, 485, 533, 568, 566, 402, 565, 491,
                   570, 576, 539, 351, 538, 297, 539, 262, 564, 313, 581, 370, 589,
                   421, 514, 314, 501, 370, 489, 420, 3]),
         np.array([2, 534, 403, 507, 401, 519, 487, 531, 567, 562, 405, 544, 495,
                   537, 588, 528, 354, 526, 300, 534, 259, 555, 315, 575, 370, 589,
                   421, 499, 315, 489, 372, 483, 423, 3])]]
samples = 50
data = data * samples  # repeat the pair to get 50 rows of shape (2, 34)
ds = tf.data.Dataset.from_tensor_slices(data)

def prepare(x):
    # x is a batch of shape (batch, 2, 34); split it along axis 1
    # into two tensors of shape (batch, 1, 34).
    srcs, trgs = tf.split(x, num_or_size_splits=2, axis=1)
    return srcs, trgs

def make_batches(ds):
    return (
        ds
        .cache()
        .shuffle(50)
        .batch(10)
        .map(prepare, num_parallel_calls=tf.data.experimental.AUTOTUNE)
        .prefetch(tf.data.experimental.AUTOTUNE))

train_batches = make_batches(ds)
for x, y in train_batches.take(1):
    print(x.shape, y.shape)

Output:

(10, 1, 34) (10, 1, 34)
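
Note that tf.split keeps the split axis, which is why each output has shape (10, 1, 34). If a shape of (batch, 34) is preferred, tf.unstack along the same axis may be a suitable variant. The sketch below assumes made-up stand-in data generated with np.arange rather than the real token ids, and omits shuffling so the result is deterministic.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in data: 50 (source, target) pairs of 34 ids each.
data = np.arange(50 * 2 * 34, dtype=np.int64).reshape(50, 2, 34)
ds = tf.data.Dataset.from_tensor_slices(data)

def prepare(x):
    # x has shape (batch, 2, 34); unstacking along axis 1 removes that
    # axis and yields two tensors of shape (batch, 34).
    srcs, trgs = tf.unstack(x, num=2, axis=1)
    return srcs, trgs

batches = ds.batch(10).map(prepare, num_parallel_calls=tf.data.AUTOTUNE)
src_batch, trg_batch = next(iter(batches))
print(src_batch.shape, trg_batch.shape)
```

Plain indexing inside prepare (x[:, 0] and x[:, 1]) should give the same shapes; tf.squeeze(srcs, axis=1) after tf.split is another option.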
