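I am trying to create a tf dataset from this dictionary. The dataset should have four elements, and the last element holds lists of a different length than the others.
When I do this, I get the error ValueError: Can't convert non-rectangular Python sequence to Tensor.
The solution explained here, using tf.ragged.constant(data), does not work because I am using a dictionary. Is there a way to build such a dataset?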
import numpy as np
import tensorflow as tf

t_dic = {"uuid": np.array(["abc", "def", "ghi", "pqr"]),
         "a": [np.array([1, 2, 3]),
               np.array([6, 2, 3]),
               np.array([6, 8, 1]),
               np.array([6, 2, 3, 10])],
         "b": [np.array(["a", "f", "f"]),
               np.array(["aa", "ff", "fs"]),
               np.array(["aa", "ff", "fs"]),
               np.array(["aa", "ff", "fs", "ss"])]}
x = tf.data.Dataset.from_tensor_slices(t_dic)  # raises the ValueError above
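If you want to keep the shape of the future tensors without adding padding, I suggest popping the keys whose lists have varying lengths and then wrapping them with tf.ragged.constant() in a new dictionary.
In your example: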
t_dic = {"uuid": np.array(["abc", "def", "ghi", "pqr"]),
         "a": [np.array([1, 2, 3]),
               np.array([6, 2, 3]),
               np.array([6, 8, 1]),
               np.array([6, 2, 3, 10])],
         "b": [np.array(["a", "f", "f"]),
               np.array(["aa", "ff", "fs"]),
               np.array(["aa", "ff", "fs"]),
               np.array(["aa", "ff", "fs", "ss"])]}
key_a = t_dic.pop("a")  # remove "a" from t_dic and keep its value
key_b = t_dic.pop("b")  # remove "b" from t_dic and keep its value
# Build a new dictionary holding "a" and "b" as ragged tensors.
ragged_features = {"a": tf.ragged.constant(key_a), "b": tf.ragged.constant(key_b)}
# Merge the remaining dictionary with the ragged one (the | operator needs Python 3.9+).
preprocessed_data = t_dic | ragged_features
x = tf.data.Dataset.from_tensor_slices(preprocessed_data)  # the desired dataset
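If the dictionary has many keys, the ones to pop could be found automatically instead of by hand. Below is a minimal sketch in plain Python, assuming every value is a sequence of rows. split_ragged_keys is a hypothetical helper name, not part of TensorFlow, and it only compares first-level row lengths.

```python
def split_ragged_keys(features):
    """Split feature names into (rectangular, ragged) lists.

    Hypothetical helper, not a TensorFlow API. A key counts as ragged
    when its rows are sized sequences of differing lengths.
    """
    rectangular, ragged = [], []
    for name, rows in features.items():
        # Scalars (including str/bytes) have no row length to compare.
        lengths = {
            len(row)
            for row in rows
            if hasattr(row, "__len__") and not isinstance(row, (str, bytes))
        }
        (ragged if len(lengths) > 1 else rectangular).append(name)
    return rectangular, ragged


example = {
    "uuid": ["abc", "def", "ghi", "pqr"],
    "a": [[1, 2, 3], [6, 2, 3], [6, 8, 1], [6, 2, 3, 10]],
    "b": [["a", "f", "f"], ["aa", "ff", "fs"],
          ["aa", "ff", "fs"], ["aa", "ff", "fs", "ss"]],
}
rectangular_keys, ragged_keys = split_ragged_keys(example)
print(rectangular_keys, ragged_keys)
```

The names in ragged_keys are the ones to pass through tf.ragged.constant() before calling from_tensor_slices().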
What I also found useful is creating a MapDataset from your x:
x2 = x.map(lambda element: {
    "uuid": element["uuid"],
    "a": element["a"],
    "b": element["b"],
})
The resulting x2 can be iterated, batched, and mapped, for example:
import pprint

for element in x2.take(3).as_numpy_iterator():
    pprint.pprint(element)
x2.element_spec  # useful to check the shapes; here 'None' means the length varies
The output is:
{'a': array([1, 2, 3]),
'b': array([b'a', b'f', b'f'], dtype=object),
'uuid': b'abc'}
{'a': array([6, 2, 3]),
'b': array([b'aa', b'ff', b'fs'], dtype=object),
'uuid': b'def'}
{'a': array([6, 8, 1]),
'b': array([b'aa', b'ff', b'fs'], dtype=object),
'uuid': b'ghi'}
{'uuid': TensorSpec(shape=(), dtype=tf.string, name=None),
'a': TensorSpec(shape=(None,), dtype=tf.int32, name=None),
'b': TensorSpec(shape=(None,), dtype=tf.string, name=None)}
Finally, if you are going to batch the tf.data.Dataset, keep in mind that you may need to use dense_to_ragged_batch(), like this:
x2_batched = x2.apply(tf.data.experimental.dense_to_ragged_batch(batch_size=2))
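To make the difference concrete, here is a plain-Python illustration of what a ragged batch preserves compared with a padded one. It does not use TensorFlow and is not how dense_to_ragged_batch() is implemented. ragged_batches and padded_batches are hypothetical names used only for this sketch.

```python
def ragged_batches(rows, batch_size):
    """Group rows into batches, keeping each row at its own length."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


def padded_batches(rows, batch_size, pad_value=0):
    """Group rows into batches, padding each batch to its longest row."""
    batches = []
    for batch in ragged_batches(rows, batch_size):
        width = max(len(row) for row in batch)
        batches.append([row + [pad_value] * (width - len(row)) for row in batch])
    return batches


rows = [[1, 2, 3], [6, 2, 3], [6, 8, 1], [6, 2, 3, 10]]
print(ragged_batches(rows, 2))  # second batch keeps row lengths 3 and 4
print(padded_batches(rows, 2))  # second batch is padded to length 4
```

With the values of "a" from the question, only the second batch differs between the two approaches, because it is the only one that mixes row lengths.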
Link: Batching ragged tensors