PandasNotImplementedError: using nested np.where() on a Koalas DataFrame returns an error



I am converting code written in Pandas to Koalas, but I am getting an error when using numpy.where:

import pandas as pd
import numpy as np
import databricks.koalas as ks
data = {'credit': [123.23, 23423.56, 0, 0], 'debit': [0, 0, 234.21, 95.32]}
df = ks.DataFrame(data)
df['flag'] = np.where(
    df['credit'] != '',
    'C',
    np.where(
        df['debit'] != '',
        'D',
        ''
    )
)

This returns the error:

PandasNotImplementedError: The method `pd.Series.__iter__()` is not implemented. If you want to collect your data as an NumPy array, use 'to_numpy()' instead.

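If I try to convert the Koalas DataFrame with to_numpy() or toPandas() so that the code can stay as it is, I run out of memory. This code contains many nested np.where() statements, as well as many other uses of numpy that I would very much prefer not to rewrite.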

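It is not clear to me whether there is a simple way to keep these np.where() calls (or any other numpy statements) in the code while working with Koalas DataFrames.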

I know there is a way to use df.assign(flag=()) to emulate np.where(), but I am not clear on how to use that approach to emulate nested conditions. My attempts are below:

# works but does not include the second condition
df = df.assign(flag=df.debit.apply(lambda x: "D" if x != "" else ""))
# Does not work and returns an error
df = test_df.assign(flag=df.debit.apply(
    lambda x: "D" if x != "" else (
        df.credit.apply(
            lambda x: "C" if x != "" else ""))))

Error: PicklingError: Could not serialize object: TypeError: can't pickle _thread.RLock objects

def function1(ss: ks.Series):
    # Each branch corresponds to one level of the nested np.where()
    if ss.credit != 0:
        return 'C'
    elif ss.debit != 0:
        return 'D'
    else:
        return ''
df.apply(function1, axis=1)

out:

0    C
1    C
2    D
3    D
dtype: object
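
To check that the row-wise function above really matches the nested np.where() logic, here is a minimal sketch that runs on plain pandas, so it needs no Spark cluster. The names pdf and flag_row are hypothetical and only used for illustration. It assumes the intended test is "!= 0" rather than the "!= ''" in the question, since in plain pandas comparing a numeric column with '' is True for every row. Whether the comparison with np.where() carries over to a Koalas DataFrame unchanged is not shown here.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question, in plain pandas (assumption: no Spark available)
data = {'credit': [123.23, 23423.56, 0, 0], 'debit': [0, 0, 234.21, 95.32]}
pdf = pd.DataFrame(data)

def flag_row(row):
    # One if/elif branch per level of the nested np.where()
    if row['credit'] != 0:
        return 'C'
    elif row['debit'] != 0:
        return 'D'
    return ''

# Row-wise version, as in the answer above
row_wise = pdf.apply(flag_row, axis=1)

# Nested np.where() with the same "!= 0" conditions, for comparison
nested = np.where(pdf['credit'] != 0, 'C',
                  np.where(pdf['debit'] != 0, 'D', ''))

print(row_wise.tolist())
print((row_wise.to_numpy() == nested).all())
```

Each additional level of nesting in np.where() becomes one more elif branch in the function, so deeper conditions can be added without changing the overall shape.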
