Is there a better way than `.unique()` to write a recursive `df.loc[t-1]` assignment?



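Recursive functions are hard to vectorize because each value at time t depends on the previous value at time t-1.

[The question is updated below with a slightly more complex example, x_t = a x_{t-1} + b.]

Problem: .loc returns different data types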

import pandas
df1 = pandas.DataFrame({'year':range(2020,2024),'a':range(3,7)})
# Set the initial value
t0 = min(df1.year)
df1.loc[df1.year==t0, "x"] = 0

The assignment does not work when the right-hand side is a pandas.core.series.Series

for t in range(min(df1.year)+1, max(df1.year)+1):
    df1.loc[df1.year==t, "x"] = df1.loc[df1.year==t-1,"x"] + df1.loc[df1.year==t-1,"a"]
print(df1)
#    year  a    x
# 0  2020  3  0.0
# 1  2021  4  NaN
# 2  2022  5  NaN
# 3  2023  6  NaN
print(type(df1.loc[df1.year==t-1,"x"] + df1.loc[df1.year==t-1,"a"]))
# <class 'pandas.core.series.Series'>
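
The NaN values are most likely a result of index alignment: when the right-hand side is a Series, pandas matches it to the target rows by index label rather than by position. The following is a minimal sketch of a single step of the loop, using a fresh frame `df` and the hypothetical names `rhs` and `target_labels` for illustration.

```python
import pandas as pd

df = pd.DataFrame({"year": range(2020, 2024), "a": range(3, 7)})
df.loc[df.year == 2020, "x"] = 0

# The right-hand side is built from the 2020 row, so it carries index label 0.
rhs = df.loc[df.year == 2020, "x"] + df.loc[df.year == 2020, "a"]
# The target is the 2021 row, which has index label 1.
target_labels = df.index[df.year == 2021].tolist()
print(rhs.index.tolist(), target_labels)

# Assigning a Series aligns on labels: label 1 is missing from rhs,
# so the target cell receives NaN instead of the computed value.
df.loc[df.year == 2021, "x"] = rhs
print(df)
```

Converting the right-hand side to a plain array or scalar removes the index, which is why the `.unique()` version further down assigns the expected value.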

The assignment works when the right-hand side is a numpy array

for t in range(min(df1.year)+1, max(df1.year)+1):
    df1.loc[df1.year==t, "x"] = (df1.loc[df1.year==t-1,"x"] + df1.loc[df1.year==t-1,"a"]).unique()
    #break
print(df1)
#    year  a     x
# 0  2020  3   0.0
# 1  2021  4   3.0
# 2  2022  5   7.0
# 3  2023  6  12.0
print(type((df1.loc[df1.year==t-1,"x"] + df1.loc[df1.year==t-1,"a"]).unique()))
# <class 'numpy.ndarray'>

When the .loc[] selection uses the year index, the assignment works directly

# df1 is the frame defined above (the original referenced an undefined name, df)
df2 = df1.set_index("year").copy()
# Set the initial value
df2.loc[df2.index.min(), "x"] = 0
for t in range(df2.index.min()+1, df2.index.max()+1):
    df2.loc[t, "x"] = df2.loc[t-1, "x"] + df2.loc[t-1,"a"]
    #break
print(df2)
#       a     x
# year
# 2020  3   0.0
# 2021  4   3.0
# 2022  5   7.0
# 2023  6  12.0
print(type(df2.loc[t-1, "x"] + df2.loc[t-1,"a"]))
# <class 'numpy.float64'>
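  • type(df1.loc[df1.year==t-1,"x"] + df1.loc[df1.year==t-1,"a"]) is a pandas Series, while type(df2.loc[t-1, "x"] + df2.loc[t-1,"a"]) is a numpy float. Why are these types different?
  • If I don't want to call set_index() before the computation, is there a better way than .unique() to write the recursive .loc[] assignment?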
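
On both points, a hedged sketch: a boolean mask can match any number of rows, so `.loc` returns a Series, whereas a scalar row label combined with a scalar column label returns a single value. One possible alternative to `.unique()` is `.to_numpy()`, which drops the index without collapsing duplicate values. The names `by_mask`, `by_label`, `first`, `last` and `prev` are illustrative, and the sketch assumes each year appears exactly once.

```python
import pandas as pd

df = pd.DataFrame({"year": range(2020, 2024), "a": range(3, 7)})

# A boolean mask may select several rows, so the result is a Series.
by_mask = df.loc[df.year == 2020, "a"]
# A scalar row label plus a scalar column label selects one cell.
by_label = df.set_index("year").loc[2020, "a"]
print(type(by_mask).__name__, type(by_label).__name__)

first, last = int(df.year.min()), int(df.year.max())
df.loc[df.year == first, "x"] = 0.0
for t in range(first + 1, last + 1):
    prev = df.year == t - 1
    # .to_numpy() discards the index, so the value is assigned by position
    # instead of being aligned on row labels.
    df.loc[df.year == t, "x"] = (df.loc[prev, "x"] + df.loc[prev, "a"]).to_numpy()
print(df)
```

Unlike `.unique()`, `.to_numpy()` keeps one value per matched row, so it does not silently change the length of the right-hand side if several rows ever share a value.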

See also:

  • Related Q&A on recursive assignment
  • [Mutating with User Defined Function (UDF) methods](https://pandas.pydata.org/pandas-docs/stable/user_guide/gotchas.html#mutating-with-user-defined-function-udf-methods)

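Example with multiplicative and additive components

Our actual problem is more complex because it has both a multiplicative and an additive component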

import pandas
df3 = pandas.DataFrame({'year':range(2020,2024),'a':range(3,7), 'b':range(8,12)})
df3 = df3.set_index("year").copy()
# Set the initial value
df3.loc[df3.index.min(), "x"] = 0
for t in range(df3.index.min()+1, df3.index.max()+1):
    df3.loc[t, "x"] = df3.loc[t-1, "x"] * df3.loc[t-1, "a"] + df3.loc[t-1, "b"]
    #break
print(df3)
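
For this recurrence, one option that avoids repeated `.loc` lookups is to run the loop over plain numpy arrays and assign the result back once. This is only a sketch: it assumes the rows are already sorted by year with no gaps, and the names `a`, `b` and `x` are local arrays introduced for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"year": range(2020, 2024), "a": range(3, 7), "b": range(8, 12)})

# Assumption: rows are sorted by year and consecutive, so position i-1 is year t-1.
a = df["a"].to_numpy()
b = df["b"].to_numpy()

x = np.zeros(len(df))  # x[0] is the initial value, 0
for i in range(1, len(df)):
    # x_t = x_{t-1} * a_{t-1} + b_{t-1}
    x[i] = x[i - 1] * a[i - 1] + b[i - 1]

df["x"] = x
print(df)
```

The loop is still sequential, but each step works on scalars taken from arrays, with no label lookup or index alignment inside the loop.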

Sorry if I have misunderstood, but is this what you want?

df1['x']= df1['a'].cumsum().shift().fillna(0)
print(df1)

Output:

   year  a     x
0  2020  3   0.0
1  2021  4   3.0
2  2022  5   7.0
3  2023  6  12.0
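
The reason this works for the purely additive case is that x_t = x_{t-1} + a_{t-1} with x_0 = 0 unrolls to the sum of all earlier values of a, which is a shifted cumulative sum. Here is a small check of that equivalence against an explicit loop, with `vectorized` and `looped` as illustrative names. It does not carry over directly to the version with a multiplicative component.

```python
import pandas as pd

df = pd.DataFrame({"year": range(2020, 2024), "a": range(3, 7)})

# Vectorized form: running total of a, shifted down one row, with x_0 = 0.
vectorized = df["a"].cumsum().shift().fillna(0)

# Explicit recurrence: x_t = x_{t-1} + a_{t-1}, starting from x_0 = 0.
looped = [0.0]
for a_prev in df["a"].iloc[:-1]:
    looped.append(looped[-1] + a_prev)

print(vectorized.tolist() == looped)
```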
