Consider the following code, which uses functools.reduce to concatenate a list of DataFrames:
import pandas as pd
from functools import reduce
df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'B': [3, 4]})
df3 = pd.DataFrame({'C': [5, 6]})
reduce(lambda x, y: pd.concat([x, y], axis=1), [df1, df2, df3])
This code works fine. However, when I try the following, I get an error:
reduce(lambda x, y: pd.concat([x[0], y[0]], axis=1), zip([df1, df2, df3], [0, 1, 0]))
Can someone help me understand why?
Let's look at what happens inside reduce:
# Iteration 1:
# x = (df1, 0); y = (df2, 1)
# func(x, y): pd.concat([x[0], y[0]], axis=1)  # OK: x[0] is df1, y[0] is df2
# The result of func(x, y) is a DataFrame, and reduce passes it in as the new x for iteration 2.
# Iteration 2:
# x = <the concatenated DataFrame>; y = (df3, 0)
# func(x, y): pd.concat([x[0], y[0]], axis=1)  # error
# Notice that x is no longer a tuple but a DataFrame.
# Indexing a DataFrame with [0] looks up a column labelled 0. There is no such column
# (only 'A' and 'B'), so pandas raises KeyError: 0.
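To see the same effect without pandas, here is a small sketch that substitutes plain strings for the DataFrames (an assumption made purely for illustration) and records the argument types reduce passes on each call. Because indexing a string with [0] does not fail, the second call silently produces a wrong result instead of raising an error:

```python
from functools import reduce

# Hypothetical stand-ins for the (DataFrame, number) pairs: strings replace the DataFrames.
pairs = list(zip(["a", "b", "c"], [0, 1, 0]))

seen = []

def combine(x, y):
    # Record the type of each argument that reduce passes in.
    seen.append((type(x).__name__, type(y).__name__))
    # Same indexing pattern as the question's lambda.
    return x[0] + y[0]

result = reduce(combine, pairs)
print(seen)    # [('tuple', 'tuple'), ('str', 'tuple')]
print(result)  # ac  (not "abc": on the second call x is "ab", so x[0] is just "a")
```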
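If you are interested in how reduce works, here is a minimal implementation (simplified: unlike functools.reduce, it does not accept an initializer):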
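def reduce(func, iterable):
    # Works with any iterable (including zip objects), not just sequences.
    it = iter(iterable)
    try:
        result = next(it)  # the first item becomes the initial accumulator
    except StopIteration:
        raise TypeError('reduce() of empty iterable with no initial value') from None
    for item in it:
        result = func(result, item)  # the previous result is passed back in as x
    return result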
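One way to avoid the problem, sketched here under the assumption that only the DataFrame in each pair is needed, is to keep the accumulator a DataFrame at every step: seed functools.reduce with the first DataFrame through its optional initializer argument, and index only the incoming pair. Since pd.concat already accepts a list, the second option drops reduce entirely:

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'B': [3, 4]})
df3 = pd.DataFrame({'C': [5, 6]})
pairs = list(zip([df1, df2, df3], [0, 1, 0]))

# Option 1: the accumulator is always a DataFrame; only the pair is indexed with [0].
# The first DataFrame is the initializer, and reduce runs over the remaining pairs.
combined = reduce(lambda acc, pair: pd.concat([acc, pair[0]], axis=1),
                  pairs[1:], pairs[0][0])

# Option 2: extract the DataFrames first and concatenate them in one call.
combined2 = pd.concat([df for df, _ in pairs], axis=1)

print(combined)
```

Option 2 is usually simpler and avoids building intermediate DataFrames on every step.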