Using reduce to iterate over a zipped list of DataFrames



Consider the following code, which uses functools.reduce to concatenate a list of DataFrames:

from functools import reduce
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'B': [3, 4]})
df3 = pd.DataFrame({'C': [5, 6]})
reduce(lambda x, y: pd.concat([x, y], axis=1), [df1, df2, df3])

This code works fine. However, when I try the following, I get an error:

reduce(lambda x, y: pd.concat([x[0], y[0]], axis=1), zip([df1, df2, df3], [0, 1, 0]))

Can someone help me understand why?

Let's walk through what happens inside reduce:

# Iteration 1:
# x = (df1, 0); y = (df2, 1)
# func(x, y): pd.concat([x[0], y[0]], axis=1)  # okay
# The result of func(x, y) is a DataFrame, which becomes the new x for iteration 2.
# Iteration 2:
# x = some_dataframe; y = (df3, 0)
# func(x, y): pd.concat([x[0], y[0]], axis=1)  # error
# Notice that x is no longer a tuple but a DataFrame.
# So x[0] raises a KeyError, because the DataFrame has no column named 0.
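One possible way around this is sketched below. It assumes the goal is simply to concatenate the frames column-wise and that the second element of each pair is not needed inside the reduction. The names `pairs`, `result1`, and `result2` are illustrative and do not come from the question. The idea is to keep the type of the accumulator consistent from one step to the next.

```python
from functools import reduce
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'B': [3, 4]})
df3 = pd.DataFrame({'C': [5, 6]})
pairs = list(zip([df1, df2, df3], [0, 1, 0]))

# Option 1: pull the frames out of the pairs first, so x and y are always DataFrames.
frames = [frame for frame, _ in pairs]
result1 = reduce(lambda x, y: pd.concat([x, y], axis=1), frames)

# Option 2: have the function return a tuple, so x[0] stays valid on every iteration.
# The second slot of the returned tuple is a placeholder (None) and is never read.
result2 = reduce(lambda x, y: (pd.concat([x[0], y[0]], axis=1), None), pairs)[0]

print(result1)
```

If no per-step logic is needed at all, calling pd.concat(frames, axis=1) directly would likely be simpler than either reduce variant.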

If you are interested in how reduce is implemented, here is a minimal version:

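# Simplified: assumes an indexable sequence such as a list or tuple.
# The real functools.reduce accepts any iterable, including a zip object.
def reduce(func, sequence):
    if not sequence:
        raise TypeError('Empty sequence')
    result = sequence[0]
    for item in sequence[1:]:
        # The return value of func becomes the first argument of the next call.
        result = func(result, item)

    return result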
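To watch the accumulator change type, a small trace may help. The sketch below uses functools.reduce with plain strings standing in for DataFrames. The helper `traced` and the list `seen` are hypothetical names introduced only for this illustration.

```python
from functools import reduce

seen = []  # type names of (x, y) recorded on each call

def traced(x, y):
    seen.append((type(x).__name__, type(y).__name__))
    # Same shape as the failing lambda: combine the first elements and return a non-tuple.
    return x[0] + y[0]

pairs = list(zip(['a', 'b', 'c'], [0, 1, 0]))
result = reduce(traced, pairs)

print(seen)
print(result)
```

On the second call x is a str rather than a tuple, so x[0] picks out a single character and the final result is not the full concatenation 'abc'. With a DataFrame in place of the string, the same x[0] lookup is what raises the KeyError.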
