Clipping tensor data to a bounding volume

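I have two questions about TensorFlow 2.0, both centered on how TensorFlow handles combined conditional tests in its operation graph.

The task: cut a large set of data points into blocks and store the indices of the samples that belong to each volume (not the samples themselves).

My initial approach was to loop over all elements and collect the indices of the data points that lie inside the "bounding volume". This was very slow, no matter how I reordered the comparisons on the coordinates.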

import tensorflow as tf

# X.shape == [elements, features]
# xmin.shape == xmax.shape == [features]
def getIndices(X, xmin, xmax):
    i = 0
    indices = tf.zeros([0], dtype=tf.int32)  # start with an empty index list
    for x in X:
        if (x[0] > xmin[0]):
            if (x[1] > xmin[1]):
                if (x[2] <= xmax[2]):
                    # ...and so on...
                    indices = tf.concat([indices, [i]], axis=0)
        i = i + 1
    return indices

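Then I came up with the idea of producing boolean tensors and logically "and"-ing them to get the indices of the elements I need. This is much faster, as the next sample shows: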

# X.shape == [elements, features]
# xmin.shape == xmax.shape == [features]
def getIndices(X, xmin, xmax):
    # example of 3 different conditions to clip to (a part of) the bounding volume
    # X is the data and xmin and xmax are tensors containing the bounding volume
    c0 = (X[:, 0] > xmin[0])
    c1 = (X[:, 1] > xmin[1])   # processing all elements
    c2 = (X[:, 2] <= xmax[2])  # idem
    # ... there could be many more conditions, you get the idea..
    mask = tf.math.logical_and(c0, tf.math.logical_and(c1, c2))
    # tf.where returns shape [n_selected, 1]; keep only the row-index column
    indices = tf.where(mask)[:, 0]
    return indices
# ...
indices = getIndices(X, xmin, xmax)
trimmedX = tf.gather(X, indices)

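This code produces the correct result, but I wonder whether it is optimal.

The first question is about scheduling:

Will the TensorFlow graph that holds the operations cull (blocks of) conditional tests if it knows that some (blocks of) elements have already tested False? Because logical_and combines the conditions, no subsequent test on these elements can ever yield True.

Indeed, in the example above, c1 and c2 are asking questions about elements that c0 may already have excluded from the set. Especially when there are a large number of elements to test, this could be a waste of time, even on parallel hardware platforms.

So what if we cascade the tests, each one based on the result of the previous test? Although this looks like a solved problem, the solution below is incorrect, because the final indices tensor refers to the subset _X, not to the full set X: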

# X.shape == [elements, features]
# xmin.shape == xmax.shape == [features]
def getIndices(X, xmin, xmax):
    c0 = (X[:, 0] > xmin[0])
    indices = tf.where(c0)[:, 0]   # tf.where returns shape [n, 1]; keep the row-index column
    _X = tf.gather(X, indices)
    c1 = (_X[:, 1] > xmin[1])      # processing only trimmed elements
    indices = tf.where(c1)[:, 0]
    _X = tf.gather(_X, indices)
    c2 = (_X[:, 2] <= xmax[2])     # idem
    indices = tf.where(c2)[:, 0]
    return indices
# ...
indices = getIndices(X, xmin, xmax)
trimmedX = tf.gather(X, indices)  # wrong: indices refer to a trimmed subset, not X

Of course, I could "solve" this by simply extending X so that each element also carries its own index in the original list, and then proceed as before.
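An alternative to widening X is to carry a separate tensor of original row indices and filter it alongside the data at each stage. The following is only a minimal sketch of that idea: the name get_indices_cascaded and the sample values are illustrative assumptions, not part of the original post, and nothing here implies that the cascade is faster than a single combined mask.

```python
import tensorflow as tf

def get_indices_cascaded(X, xmin, xmax):
    """Cascade three tests while tracking indices into the original X."""
    # Row indices of X that pass the first test.
    idx = tf.where(X[:, 0] > xmin[0])[:, 0]
    # Test feature 1 only on the survivors, then filter the index tensor.
    sub = tf.gather(X, idx)
    idx = tf.boolean_mask(idx, sub[:, 1] > xmin[1])
    # Same again for feature 2.
    sub = tf.gather(X, idx)
    idx = tf.boolean_mask(idx, sub[:, 2] <= xmax[2])
    return idx

# Illustrative data: 5 points with 3 features each.
X = tf.constant([[0.6, 0.6, 0.2],
                 [0.1, 0.9, 0.1],
                 [0.7, 0.2, 0.3],
                 [0.8, 0.7, 0.9],
                 [0.9, 0.8, 0.4]])
xmin = tf.constant([0.5, 0.5, 0.0])
xmax = tf.constant([1.0, 1.0, 0.5])

indices = get_indices_cascaded(X, xmin, xmax)
trimmedX = tf.gather(X, indices)  # indices refer to rows of X itself
```

Each stage still performs a gather, so whether this beats the combined mask depends on how many elements each test removes; it would need to be measured on the actual data.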

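So my second question is about functionality:

Does tf have a way to let the GPU/tensor infrastructure handle this bookkeeping, without spending memory or time on what seems to be a simple problem?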

This returns all indices where the value is greater than minimum and less than maximum, provided that both have the same number of features as x:

import tensorflow as tf
minimum = tf.random.uniform((1, 5), 0., 0.5)
maximum = tf.random.uniform((1, 5), 0.5, 1.)
x = tf.random.uniform((10, 5))
indices = tf.where(
    tf.logical_and(
        tf.greater(x, minimum),
        tf.less(x, maximum)
    )
)

<tf.Tensor: shape=(22, 2), dtype=int64, numpy=
array([[0, 3],
       [0, 4],
       [1, 1],
       [1, 2],
       [1, 3],
       [1, 4],
       [3, 1],
       [3, 3],
       [3, 4],
       [4, 0],
       [4, 4],
       [5, 3],
       [6, 2],
       [6, 3],
       [7, 1],
       [7, 4],
       [8, 2],
       [8, 3],
       [8, 4],
       [9, 1],
       [9, 3],
       [9, 4]], dtype=int64)>
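The result above has shape (22, 2) because tf.where is applied to a 2-D mask, so each row is a (row, feature) pair rather than the index of a data point. If the goal is the indices of the points that lie inside the bounds on every feature, one possible sketch is to reduce the mask across the feature axis first. The name rows_inside and the sample values are illustrative assumptions. Both bounds are strict here, as in the snippet above, which differs from the question's mix of > and <=.

```python
import tensorflow as tf

def rows_inside(x, minimum, maximum):
    """Return indices of rows whose features all lie strictly inside the bounds."""
    # Element-wise mask of shape [elements, features]; bounds broadcast over rows.
    inside = tf.logical_and(tf.greater(x, minimum), tf.less(x, maximum))
    # A row qualifies only if every feature is inside.
    all_inside = tf.reduce_all(inside, axis=1)
    # tf.where on a 1-D mask returns shape [n, 1]; keep the row-index column.
    return tf.where(all_inside)[:, 0]

# Illustrative data: 4 points with 2 features each.
x = tf.constant([[0.3, 0.4],
                 [0.1, 0.9],
                 [0.45, 0.3],
                 [0.6, 0.2]])
minimum = tf.constant([[0.2, 0.1]])
maximum = tf.constant([[0.5, 0.5]])

rows = rows_inside(x, minimum, maximum)
selected = tf.gather(x, rows)  # the points themselves, if they are needed
```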
