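I'm trying to apply a function that returns three items to a column of a DataFrame. It works in some cases but not in others. I then suspected that NULL values might be the cause. Here is a simplified version of my code: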
import pandas as pd
def proc(x):
    return ([x,1,2], [x+1,3,4], [x+2,5,6])
## This works fine.
df = pd.DataFrame({'a':[1,2,3]})
df['new1'],df['new2'], df['new3'] = df.a.apply(lambda x:proc(x))
## But this throws the 'too many values to unpack' error.
df2 = pd.DataFrame({'a':[1,2,3, float('nan')]})
df2['new1'],df2['new2'], df2['new3'] = df2.a.apply(lambda x:proc(x))
Why does adding float('nan') to the df['a'] column cause this error?
-
Use zip to repack the values, so that each new column receives one list from every row:

import pandas as pd

def proc(x):
    return ([x,1,2], [x+1,3,4], [x+2,5,6])

df2 = pd.DataFrame({'a':[1,2,4, float('nan')]})
df2['new1'], df2['new2'], df2['new3'] = zip(*df2['a'].apply(proc))
print(df2)

     a         new1         new2         new3
0  1.0  [1.0, 1, 2]  [2.0, 3, 4]  [3.0, 5, 6]
1  2.0  [2.0, 1, 2]  [3.0, 3, 4]  [4.0, 5, 6]
2  4.0  [4.0, 1, 2]  [5.0, 3, 4]  [6.0, 5, 6]
3  NaN  [nan, 1, 2]  [nan, 3, 4]  [nan, 5, 6]
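The error does not come from NaN itself. Unpacking a Series iterates over its rows, so a four-row result cannot be unpacked into three names, whatever the values are. The three-row case only succeeds because the row count happens to match the number of names. A minimal sketch of this, using a four-row column with no NaN at all (the variable names are chosen here for illustration):

```python
import pandas as pd

def proc(x):
    return ([x, 1, 2], [x + 1, 3, 4], [x + 2, 5, 6])

# Three rows: unpacking succeeds, but each name gets one ROW's whole tuple.
s3 = pd.Series([1, 2, 3]).apply(proc)
row0, row1, row2 = s3

# Four rows, no NaN anywhere: unpacking into three names still fails.
s4 = pd.Series([1, 2, 3, 4]).apply(proc)
try:
    n1, n2, n3 = s4
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# zip(*s4) transposes: three tuples, each holding one list per row.
cols = list(zip(*s4))
print(error_message)
print(len(cols), len(cols[0]))
```

With zip, the number of names on the left has to match the number of items proc returns, and the number of rows no longer matters.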
-
Unpack into the correct number of names, and have proc return the same number of elements:

import pandas as pd

def proc(x):
    return ([x,1,2], [x+1,3,4], [x+2,5,6], [x+3,7,8])

df2 = pd.DataFrame({'a':[1,2,4, float('nan')]})
df2['new1'], df2['new2'], df2['new3'], df2['new4'] = df2['a'].apply(proc)
print(df2)

     a         new1         new2         new3         new4
0  1.0  [1.0, 1, 2]  [2.0, 1, 2]  [4.0, 1, 2]  [nan, 1, 2]
1  2.0  [2.0, 3, 4]  [3.0, 3, 4]  [5.0, 3, 4]  [nan, 3, 4]
2  4.0  [3.0, 5, 6]  [4.0, 5, 6]  [6.0, 5, 6]  [nan, 5, 6]
3  NaN  [4.0, 7, 8]  [5.0, 7, 8]  [7.0, 7, 8]  [nan, 7, 8]
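Note the layout of that output: without zip, each new column holds the result for a single row of a (new1 is the tuple computed from a = 1.0, spread down the rows), and the approach only runs when the row count equals the number of names. The sketch below compares direct unpacking with zip on the same data. It assumes a hypothetical four-element function, called proc4 here, inferred from the output table above.

```python
import pandas as pd

def proc4(x):
    # Hypothetical four-element variant of proc, matching the table above.
    return ([x, 1, 2], [x + 1, 3, 4], [x + 2, 5, 6], [x + 3, 7, 8])

df2 = pd.DataFrame({'a': [1, 2, 4, float('nan')]})
result = df2['a'].apply(proc4)

# Direct unpacking: the k-th name receives the k-th ROW's tuple.
direct = df2.copy()
direct['new1'], direct['new2'], direct['new3'], direct['new4'] = result

# zip(*...): the k-th name receives the k-th list from EVERY row.
zipped = df2.copy()
zipped['new1'], zipped['new2'], zipped['new3'], zipped['new4'] = zip(*result)

print(direct['new1'].tolist())
print(zipped['new1'].tolist())
```

If the goal is for each row to keep the values computed from its own a, the zip layout is probably the one that is wanted.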