我有以下dask_cudf.core.DataFrame
:-
import pandas as pd
import numpy as np
import dask_cudf
import cudf
data = {"x":range(1,21), "nor":np.random.normal(2, 4, 20), "unif":np.random.uniform(size = 20)}
df = cudf.DataFrame(data)
ddf = dask_cudf.from_cudf(df, npartitions = 2)
ddf.compute()
我想为列nor
和unif
创建第一个到第五个滞后值。然而,我以以下方式创建它们:-
colz = ["nor", "unif"]
ddf[[s + "_" + str(1) for s in colz]] = ddf[colz].shift(1)
ddf[[s + "_" + str(2) for s in colz]] = ddf[colz].shift(2)
我可以创建第一个和第二个滞后值,但不能超过这个值。当我运行值大于2的shift
时,我会得到以下错误::-
/usr/local/lib/python3.7/site-packages/dask/dataframe/utils.py in raise_on_meta_error(funcname, udf)
175 try:
--> 176 yield
177 except Exception as e:
16 frames
cudf/_lib/copying.pyx in cudf._lib.copying.shift()
RuntimeError: parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
/usr/local/lib/python3.7/site-packages/dask/dataframe/utils.py in raise_on_meta_error(funcname, udf)
195 )
196 msg = msg.format(f" in `{funcname}`" if funcname else "", repr(e), tb)
--> 197 raise ValueError(msg) from e
198
199
ValueError: Metadata inference failed in `shift`.
Original error is below:
------------------------
RuntimeError('parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument')
Traceback:
---------
File "/usr/local/lib/python3.7/site-packages/dask/dataframe/utils.py", line 176, in raise_on_meta_error
yield
File "/usr/local/lib/python3.7/site-packages/dask/dataframe/core.py", line 5833, in _emulate
return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
File "/usr/local/lib/python3.7/site-packages/dask/utils.py", line 1021, in __call__
return getattr(__obj, self.method)(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/cudf/core/frame.py", line 1788, in shift
return self._shift(periods)
File "/usr/local/lib/python3.7/site-packages/cudf/core/frame.py", line 1793, in _shift
zip(self._column_names, data_columns), self._index
File "/usr/local/lib/python3.7/site-packages/cudf/core/dataframe.py", line 818, in _from_data
out = super()._from_data(data, index)
File "/usr/local/lib/python3.7/site-packages/cudf/core/frame.py", line 140, in _from_data
Frame.__init__(obj, data, index)
File "/usr/local/lib/python3.7/site-packages/cudf/core/frame.py", line 78, in __init__
self._data = cudf.core.column_accessor.ColumnAccessor(data)
File "/usr/local/lib/python3.7/site-packages/cudf/core/column_accessor.py", line 121, in __init__
data = dict(data)
File "/usr/local/lib/python3.7/site-packages/cudf/core/frame.py", line 1791, in <genexpr>
data_columns = (col.shift(offset, fill_value) for col in self._columns)
File "/usr/local/lib/python3.7/site-packages/cudf/core/column/column.py", line 391, in shift
return libcudf.copying.shift(self, offset, fill_value)
File "cudf/_lib/copying.pyx", line 633, in cudf._lib.copying.shift
我似乎不明白为什么会发生这种事。
感谢您的最新报告;只要稍作改动就可以正常工作。不要过早地发布.compute()
。如果你需要在dask/dask_cudf中做一些事情并继续处理,请使用.persist()
import pandas as pd
import numpy as np
import dask_cudf
import cudf
data = {"x":range(1,21), "nor":np.random.normal(2, 4, 20), "unif":np.random.uniform(size = 20)}
df = cudf.DataFrame(data)
ddf = dask_cudf.from_cudf(df, npartitions = 2)
colz = ["nor", "unif"]
ddf[[s + "_" + str(1) for s in colz]] = ddf[colz].shift(1)
ddf[[s + "_" + str(2) for s in colz]] = ddf[colz].shift(2)
ddf[[s + "_" + str(3) for s in colz]] = ddf[colz].shift(3)
ddf[[s + "_" + str(5) for s in colz]] = ddf[colz].shift(5)
ddf.compute()
输出
x nor unif nor_1 unif_1 nor_2 unif_2 nor_3 unif_3 nor_5 unif_5
0 1 3.711132 0.021615 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
1 2 -2.465054 0.081927 3.711131915 0.021614727 <NA> <NA> <NA> <NA> <NA> <NA>
2 3 1.543548 0.481731 -2.465054359 0.081927168 3.711131915 0.021614727 <NA> <NA> <NA> <NA>
3 4 8.820771 0.040135 1.543548323 0.481731194 -2.465054359 0.081927168 3.711131915 0.021614727 <NA> <NA>
4 5 0.233656 0.135811 8.82077073 0.040135259 1.543548323 0.481731194 -2.465054359 0.081927168 <NA> <NA>
5 6 2.526556 0.360873 0.23365638 0.135810979 8.82077073 0.040135259 1.543548323 0.481731194 3.711131915 0.021614727
6 7 2.799205 0.383579 2.526555817 0.360873336 0.23365638 0.135810979 8.82077073 0.040135259 -2.465054359 0.081927168
7 8 5.960305 0.362417 2.799205226 0.383579063 2.526555817 0.360873336 0.23365638 0.135810979 1.543548323 0.481731194
8 9 1.878898 0.609364 5.960304782 0.362416925 2.799205226 0.383579063 2.526555817 0.360873336 8.82077073 0.040135259
9 10 1.217635 0.041408 1.878898482 0.609364119 5.960304782 0.362416925 2.799205226 0.383579063 0.23365638 0.135810979
10 11 0.580250 0.128405 1.21763458 0.04140812 1.878898482 0.609364119 5.960304782 0.362416925 2.526555817 0.360873336
11 12 4.907322 0.708164 0.580249571 0.128405085 1.21763458 0.04140812 1.878898482 0.609364119 2.799205226 0.383579063
12 13 6.591673 0.105310 4.907321929 0.708164063 0.580249571 0.128405085 1.21763458 0.04140812 5.960304782 0.362416925
13 14 -2.974896 0.587859 6.591673409 0.105310053 4.907321929 0.708164063 0.580249571 0.128405085 1.878898482 0.609364119
14 15 2.284847 0.978458 -2.974896021 0.587858754 6.591673409 0.105310053 4.907321929 0.708164063 1.21763458 0.04140812
15 16 -5.616458 0.114736 2.28484689 0.97845785 -2.974896021 0.587858754 6.591673409 0.105310053 0.580249571 0.128405085
16 17 -3.003533 0.279865 -5.616457873 0.114736009 2.28484689 0.97845785 -2.974896021 0.587858754 4.907321929 0.708164063
17 18 0.241106 0.923462 -3.003532592 0.279864688 -5.616457873 0.114736009 2.28484689 0.97845785 6.591673409 0.105310053
18 19 -2.100202 0.613850 0.241106056 0.923462497 -3.003532592 0.279864688 -5.616457873 0.114736009 -2.974896021 0.587858754
19 20 8.364832 0.929587 -2.100201941 0.613850209 0.241106056 0.923462497 -3.003532592 0.279864688 2.28484689 0.97845785