Unable to create a third lag column with dask-cudf



I have the following dask_cudf.core.DataFrame:

import pandas as pd
import numpy as np
import dask_cudf
import cudf

data = {"x":range(1,21), "nor":np.random.normal(2, 4, 20), "unif":np.random.uniform(size = 20)}
df = cudf.DataFrame(data)
ddf = dask_cudf.from_cudf(df, npartitions = 2)
ddf.compute()

I want to create the first through fifth lags of the columns nor and unif. At the moment I create them like this:

colz = ["nor", "unif"]
ddf[[s + "_" + str(1) for s in colz]] = ddf[colz].shift(1)
ddf[[s + "_" + str(2) for s in colz]] = ddf[colz].shift(2)

I can create the first and second lags, but nothing beyond that. When I call shift with a value greater than 2, I get the following error:

/usr/local/lib/python3.7/site-packages/dask/dataframe/utils.py in raise_on_meta_error(funcname, udf)
175     try:
--> 176         yield
177     except Exception as e:
16 frames
cudf/_lib/copying.pyx in cudf._lib.copying.shift()
RuntimeError: parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument
The above exception was the direct cause of the following exception:
ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.7/site-packages/dask/dataframe/utils.py in raise_on_meta_error(funcname, udf)
195         )
196         msg = msg.format(f" in `{funcname}`" if funcname else "", repr(e), tb)
--> 197         raise ValueError(msg) from e
198
199
ValueError: Metadata inference failed in `shift`.
Original error is below:
------------------------
RuntimeError('parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument')
Traceback:
---------
File "/usr/local/lib/python3.7/site-packages/dask/dataframe/utils.py", line 176, in raise_on_meta_error
yield
File "/usr/local/lib/python3.7/site-packages/dask/dataframe/core.py", line 5833, in _emulate
return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
File "/usr/local/lib/python3.7/site-packages/dask/utils.py", line 1021, in __call__
return getattr(__obj, self.method)(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/cudf/core/frame.py", line 1788, in shift
return self._shift(periods)
File "/usr/local/lib/python3.7/site-packages/cudf/core/frame.py", line 1793, in _shift
zip(self._column_names, data_columns), self._index
File "/usr/local/lib/python3.7/site-packages/cudf/core/dataframe.py", line 818, in _from_data
out = super()._from_data(data, index)
File "/usr/local/lib/python3.7/site-packages/cudf/core/frame.py", line 140, in _from_data
Frame.__init__(obj, data, index)
File "/usr/local/lib/python3.7/site-packages/cudf/core/frame.py", line 78, in __init__
self._data = cudf.core.column_accessor.ColumnAccessor(data)
File "/usr/local/lib/python3.7/site-packages/cudf/core/column_accessor.py", line 121, in __init__
data = dict(data)
File "/usr/local/lib/python3.7/site-packages/cudf/core/frame.py", line 1791, in <genexpr>
data_columns = (col.shift(offset, fill_value) for col in self._columns)
File "/usr/local/lib/python3.7/site-packages/cudf/core/column/column.py", line 391, in shift
return libcudf.copying.shift(self, offset, fill_value)
File "cudf/_lib/copying.pyx", line 633, in cudf._lib.copying.shift

I can't figure out why this happens.

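Thanks for the up-to-date report; this works with a small change. Don't call .compute() so early. If you need to do something in dask/dask_cudf and then continue processing, use .persist() instead.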

import pandas as pd
import numpy as np
import dask_cudf
import cudf

data = {"x":range(1,21), "nor":np.random.normal(2, 4, 20), "unif":np.random.uniform(size = 20)}
df = cudf.DataFrame(data)
ddf = dask_cudf.from_cudf(df, npartitions = 2)
colz = ["nor", "unif"]
ddf[[s + "_" + str(1) for s in colz]] = ddf[colz].shift(1)
ddf[[s + "_" + str(2) for s in colz]] = ddf[colz].shift(2)
ddf[[s + "_" + str(3) for s in colz]] = ddf[colz].shift(3)
ddf[[s + "_" + str(5) for s in colz]] = ddf[colz].shift(5)
ddf.compute()

Output

x   nor unif    nor_1   unif_1  nor_2   unif_2  nor_3   unif_3  nor_5   unif_5
0   1   3.711132    0.021615    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
1   2   -2.465054   0.081927    3.711131915 0.021614727 <NA>    <NA>    <NA>    <NA>    <NA>    <NA>
2   3   1.543548    0.481731    -2.465054359    0.081927168 3.711131915 0.021614727 <NA>    <NA>    <NA>    <NA>
3   4   8.820771    0.040135    1.543548323 0.481731194 -2.465054359    0.081927168 3.711131915 0.021614727 <NA>    <NA>
4   5   0.233656    0.135811    8.82077073  0.040135259 1.543548323 0.481731194 -2.465054359    0.081927168 <NA>    <NA>
5   6   2.526556    0.360873    0.23365638  0.135810979 8.82077073  0.040135259 1.543548323 0.481731194 3.711131915 0.021614727
6   7   2.799205    0.383579    2.526555817 0.360873336 0.23365638  0.135810979 8.82077073  0.040135259 -2.465054359    0.081927168
7   8   5.960305    0.362417    2.799205226 0.383579063 2.526555817 0.360873336 0.23365638  0.135810979 1.543548323 0.481731194
8   9   1.878898    0.609364    5.960304782 0.362416925 2.799205226 0.383579063 2.526555817 0.360873336 8.82077073  0.040135259
9   10  1.217635    0.041408    1.878898482 0.609364119 5.960304782 0.362416925 2.799205226 0.383579063 0.23365638  0.135810979
10  11  0.580250    0.128405    1.21763458  0.04140812  1.878898482 0.609364119 5.960304782 0.362416925 2.526555817 0.360873336
11  12  4.907322    0.708164    0.580249571 0.128405085 1.21763458  0.04140812  1.878898482 0.609364119 2.799205226 0.383579063
12  13  6.591673    0.105310    4.907321929 0.708164063 0.580249571 0.128405085 1.21763458  0.04140812  5.960304782 0.362416925
13  14  -2.974896   0.587859    6.591673409 0.105310053 4.907321929 0.708164063 0.580249571 0.128405085 1.878898482 0.609364119
14  15  2.284847    0.978458    -2.974896021    0.587858754 6.591673409 0.105310053 4.907321929 0.708164063 1.21763458  0.04140812
15  16  -5.616458   0.114736    2.28484689  0.97845785  -2.974896021    0.587858754 6.591673409 0.105310053 0.580249571 0.128405085
16  17  -3.003533   0.279865    -5.616457873    0.114736009 2.28484689  0.97845785  -2.974896021    0.587858754 4.907321929 0.708164063
17  18  0.241106    0.923462    -3.003532592    0.279864688 -5.616457873    0.114736009 2.28484689  0.97845785  6.591673409 0.105310053
18  19  -2.100202   0.613850    0.241106056 0.923462497 -3.003532592    0.279864688 -5.616457873    0.114736009 -2.974896021    0.587858754
19  20  8.364832    0.929587    -2.100201941    0.613850209 0.241106056 0.923462497 -3.003532592    0.279864688 2.28484689  0.97845785
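
The four near-identical assignments above can be collapsed into a loop over the lag values. The following is only a minimal sketch of that pattern, written against pandas as a CPU stand-in so it runs without a GPU; the helper name `add_lags` is hypothetical, and the assumption that the same `shift`/column-assignment pattern carries over unchanged to a dask_cudf DataFrame is not verified here.

```python
import numpy as np
import pandas as pd


def add_lags(frame, columns, lags):
    """Return a copy of `frame` with a `<col>_<k>` lag column for every k in `lags`.

    Hypothetical helper for illustration; not part of pandas, cudf, or dask_cudf.
    """
    out = frame.copy()
    for k in lags:
        # Shift all requested columns by k rows at once.
        shifted = frame[columns].shift(k)
        for col in columns:
            out[f"{col}_{k}"] = shifted[col]
    return out


# Same shape of data as in the question; the seed is fixed only for repeatability.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "x": range(1, 21),
        "nor": rng.normal(2, 4, 20),
        "unif": rng.uniform(size=20),
    }
)

# Lags 1 through 5 for both columns.
lagged = add_lags(df, ["nor", "unif"], range(1, 6))
print(lagged.columns.tolist())
```

Each lag k leaves the first k rows of its new columns empty, which matches the `<NA>` pattern in the output above.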
