I am converting code written with Pandas to Koalas, but I get an error wherever numpy is used. For example:
import pandas as pd
import numpy as np
import databricks.koalas as ks

data = {'credit': [123.23, 23423.56, 0, 0], 'debit': [0, 0, 234.21, 95.32]}
df = ks.DataFrame(data)
df['flag'] = np.where(
    df['credit'] != '',
    'C',
    np.where(
        df['debit'] != '',
        'D',
        ''
    )
)
This returns the error:
PandasNotImplementedError: The method `pd.Series.__iter__()` is not implemented. If you want to collect your data as an NumPy array, use 'to_numpy()' instead.
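If I try to convert the Koalas DataFrame with to_numpy() or toPandas() so that the code can stay as it is, I run out of memory. The code contains many nested np.where() statements, plus many other uses of numpy that I would strongly prefer not to rewrite.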
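It is not clear to me whether there is a simple way to keep these np.where() calls (or any other numpy statements) in the code when working with a Koalas DataFrame.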
I know that df.assign(flag=()) can be used to emulate np.where(), but I do not see how to use it to emulate nested conditions. My attempts are below:
# works but does not include the second condition
df = df.assign(flag=df.debit.apply(lambda x: "D" if x != "" else ""))

# Does not work and returns an error
df = test_df.assign(flag=df.debit.apply(
    lambda x: "D" if x != "" else (
        df.credit.apply(
            lambda x: "C" if x != "" else ""))))
Error: PicklingError: Could not serialize object: TypeError: can't pickle _thread.RLock objects
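One thing worth checking in the conditions themselves: credit and debit hold numbers, and in plain Python a number never compares equal to the empty string, so a test like x != '' is true for every value. The sketch below uses plain lists only (no Koalas or numpy). The names rows and flag are made up for illustration, and it assumes that 0 is what marks "no amount" in this data.

```python
# Plain-Python sketch; `rows` and `flag` are illustrative names, not Koalas API.
rows = [(123.23, 0), (23423.56, 0), (0, 234.21), (0, 95.32)]  # (credit, debit)


def flag(credit, debit, empty):
    """Nested condition from the question, with `empty` as the 'no amount' marker."""
    if credit != empty:
        return 'C'
    if debit != empty:
        return 'D'
    return ''


# A number never equals '', so the first branch wins for every row.
flags_vs_empty_string = [flag(c, d, '') for c, d in rows]

# Comparing against 0 (assumed marker) separates credits from debits.
flags_vs_zero = [flag(c, d, 0) for c, d in rows]

print(flags_vs_empty_string)  # ['C', 'C', 'C', 'C']
print(flags_vs_zero)          # ['C', 'C', 'D', 'D']
```

If 0 really is the marker, comparing against 0 rather than '' is probably what the nested conditions need, whichever API ends up evaluating them.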
def function1(ss: ks.Series):
    if ss.credit != 0:
        return 'C'
    elif ss.debit != 0:
        return 'D'
    else:
        return ''

df.apply(function1, axis=1)
out:
0 C
1 C
2 D
3 D
dtype: object
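Since the original code has many nested np.where() calls, one way to avoid writing a long if/elif chain for each of them might be an ordered rule table where the first matching rule wins, which is what nested np.where() expresses. This is only a sketch: RULES and first_match are hypothetical names, and SimpleNamespace stands in for a row so the logic can be checked without Spark.

```python
from types import SimpleNamespace

# Hypothetical rule table: (predicate, label) pairs checked in order,
# mirroring np.where(cond1, 'C', np.where(cond2, 'D', '')).
RULES = [
    (lambda row: row.credit != 0, 'C'),
    (lambda row: row.debit != 0, 'D'),
]


def first_match(row, rules=RULES, default=''):
    """Return the label of the first rule whose predicate holds for `row`."""
    for predicate, label in rules:
        if predicate(row):
            return label
    return default


# Stand-in rows with attribute access, the way `ss.credit` is used above.
sample = [
    SimpleNamespace(credit=123.23, debit=0),
    SimpleNamespace(credit=23423.56, debit=0),
    SimpleNamespace(credit=0, debit=234.21),
    SimpleNamespace(credit=0, debit=95.32),
]

labels = [first_match(row) for row in sample]
print(labels)  # ['C', 'C', 'D', 'D']
```

On a Koalas DataFrame this should be usable the same way function1 is called above, i.e. df.apply(first_match, axis=1), though that call is an assumption here and was not run against a cluster.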