I just switched to loading my CSV file with cuDF (RAPIDS) to cut down the loading time. A problem occurs when I try to find the rows where df['X'] == A.
Here is an example of my code:
import cudf
import numpy as np

df = cudf.read_csv('fileA.csv')
# X is an existing column
# A is the value to match
df['X'] = np.where(df['X'] == A, 1, 0)
# In pandas, this finds the rows where df['X'] equals A and sets them to 1;
# all other rows are set to 0.
However, I get the following error:
if len(cond) != len(self):
    raise ValueError("""Array conditional must be same shape as self""")
input_col = self._data[self.name]
ValueError: Array conditional must be same shape as self
I don't understand why this happens, since the same code never caused any problems with pandas.
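For reference, here is a minimal pandas sketch of the behavior the question expects. The toy column and the value A = 2 are hypothetical stand-ins for fileA.csv and the real target value.

```python
import numpy as np
import pandas as pd

A = 2  # hypothetical target value
df = pd.DataFrame({"X": [0, 1, 2]})  # hypothetical toy data instead of fileA.csv

# With pandas, np.where evaluates the boolean mask element-wise on the CPU and
# returns a NumPy array: 1 where X == A, 0 everywhere else.
df["X"] = np.where(df["X"] == A, 1, 0)
print(df["X"].tolist())
```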
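cuDF tries to dispatch numpy.where to cupy.where through the NumPy array function protocol (__array_function__). For one reason or another, cuDF cannot successfully run the dispatched function in this case.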
In general, it is recommended to call CuPy explicitly here instead of NumPy.
import cudf
import cupy as cp
A = 2
df = cudf.DataFrame({"X": [0, 1, 2]})
df['X'] = cp.where(df['X'] == A, 1, 0)
df
X
0 0
1 0
2 1
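Another way to get the same 0/1 column, which avoids any where() call and therefore any NumPy-to-CuPy dispatch, is to cast the boolean comparison result to an integer dtype. This is a minimal sketch using the same toy data as the example above. The xdf alias and the pandas fallback are illustrative additions so the snippet also runs on a machine without RAPIDS. They are not part of the original answer.

```python
try:
    import cudf as xdf  # GPU DataFrame library (requires RAPIDS and an NVIDIA GPU)
except ImportError:
    import pandas as xdf  # CPU fallback with the same API for this snippet

A = 2
df = xdf.DataFrame({"X": [0, 1, 2]})

# The comparison produces a boolean Series; casting it to int64 maps
# True -> 1 and False -> 0, so no where() function is involved.
df["X"] = (df["X"] == A).astype("int64")
print(df)
```

Because the comparison and the cast are both Series methods, the work stays on the GPU when cuDF is used, with no round trip through NumPy.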