Pandas - Calculate row values based on prior row value, update the result to be the new row value (and so on) - 小贝子编程网

Below is some dummy data that mirrors the data I am working with.

import pandas as pd
import numpy as np
from numpy import random
random.seed(30)
# Dummy data that represents a percent change
datelist = pd.date_range(start='1983-01-01', end='1994-01-01', freq='Y')
df1 = pd.DataFrame({"P Change_1": np.random.uniform(low=-0.55528, high=0.0396181, size=(11,)),
                    "P Change_2": np.random.uniform(low=-0.55528, high=0.0396181, size=(11,))})
# This dataframe contains the rows we want to operate on
df2 = pd.DataFrame({
    'Loc1': [None, None, None, None, None, None, None, None, None, None, 2.5415],
    'Loc2': [None, None, None, None, None, None, None, None, None, None, 3.2126]})
# Set the datetime index
df1 = df1.set_index(datelist)
df2 = df2.set_index(datelist)

DF1:

            P Change_1  P Change_2
1984-12-31   -0.172080   -0.231574
1985-12-31   -0.328773   -0.247018
1986-12-31   -0.160834   -0.099079
1987-12-31   -0.457924    0.000266
1988-12-31    0.017374   -0.501916
1989-12-31   -0.349052   -0.438816
1990-12-31    0.034711    0.036164
1991-12-31   -0.415445   -0.415372
1992-12-31   -0.206852   -0.413107
1993-12-31   -0.313341   -0.181030
1994-12-31   -0.474234   -0.118058

DF2:

              Loc1    Loc2
1984-12-31     NaN     NaN
1985-12-31     NaN     NaN
1986-12-31     NaN     NaN
1987-12-31     NaN     NaN
1988-12-31     NaN     NaN
1989-12-31     NaN     NaN
1990-12-31     NaN     NaN
1991-12-31     NaN     NaN
1992-12-31     NaN     NaN
1993-12-31     NaN     NaN
1994-12-31  2.5415  3.2126

DataFrame details:

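First, Loc1 corresponds to P Change_1, Loc2 corresponds to P Change_2, and so on. Starting with Loc1, I want to either fill the DataFrame containing Loc1 and Loc2 with the corresponding values, or compute a new DataFrame with columns Calc1 and Calc2.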

Calculation:

I want to start from the 1994 value of Loc1 and compute a new value for 1993 as Loc1 1993 = Loc1 1994 + (Loc1 1994 * P Change_1 1993). Plugging in the values gives 2.5415 + (-0.313341 * 2.5415), which is approximately 1.74514.
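The single step above can be checked directly. This is only the arithmetic from the example, with the two inputs copied from the DF1 and DF2 tables:

```python
# One backward step of the recurrence: the 1994 Loc1 value and the 1993 P Change_1 value
loc1_1994 = 2.5415
p_change_1_1993 = -0.313341

loc1_1993 = loc1_1994 + loc1_1994 * p_change_1_1993
print(round(loc1_1993, 5))  # 1.74514
```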

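This 1.74514 value replaces the NaN for 1993, and I then want to use that computed value to get the 1992 value, i.e. Loc1 1992 = Loc1 1993 + (Loc1 1993 * P Change_1 1992). I want to repeat this row by row until it reaches the earliest value in the time series.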
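Stripped of pandas, the row-by-row process described above is a single backward loop. A minimal sketch, assuming plain Python lists aligned by position and a known last element; `fill_backward` is a hypothetical helper name, not part of the question or any library:

```python
def fill_backward(values, p_change):
    """Fill values[i] = values[i+1] + values[i+1] * p_change[i], from the last row upwards.

    Assumes values[-1] is known; p_change[-1] is never used.
    """
    filled = list(values)  # copy so the caller's list is left unchanged
    for i in range(len(filled) - 2, -1, -1):
        filled[i] = filled[i + 1] + filled[i + 1] * p_change[i]
    return filled

print(fill_backward([None, None, 10.0], [0.5, -0.5, 0.0]))  # [7.5, 5.0, 10.0]
```

Each iteration reads the value computed in the previous iteration, which is why the loop has to run from the bottom row towards the top.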

What is the best way to implement this row-wise equation? I hope this makes sense; any help is greatly appreciated!
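Because each step multiplies the next row's value by (1 + P Change), the whole column can also be computed without a Python loop. The sketch below swaps in a reversed cumulative product, which is a different technique from the loops in the answers here. `fill_backward_vectorized` is a hypothetical name, and the sketch assumes the only known value sits in the last row and that rows are aligned by position:

```python
import numpy as np
import pandas as pd

def fill_backward_vectorized(p_change: pd.Series, last_value: float) -> pd.Series:
    """Vectorized form of Loc[i] = Loc[i+1] * (1 + p_change[i]).

    Unrolling the recurrence gives Loc[i] = last_value * prod(1 + p_change[k] for k in i..n-2),
    i.e. a cumulative product taken from the bottom of the column upwards.
    """
    growth = 1 + p_change.to_numpy(dtype=float)[:-1]  # factors for rows 0..n-2
    rev_cumprod = np.cumprod(growth[::-1])[::-1]       # rev_cumprod[i] = product of growth[i:]
    values = np.append(rev_cumprod * last_value, last_value)
    return pd.Series(values, index=p_change.index)

print(fill_backward_vectorized(pd.Series([0.5, -0.5, 0.0]), 10.0).tolist())  # [7.5, 5.0, 10.0]
```

With the question's frames this would be called as `fill_backward_vectorized(df1['P Change_1'], df2['Loc1'].iloc[-1])`. Results can differ from the loop in the last few floating-point digits because the multiplications happen in a different order.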

df = pd.merge(df1, df2, how='inner', right_index=True, left_index=True)   # merge the dataframes on the date index
df['count'] = range(len(df))    # row-position column for easy splitting and sorting

# Split the dataframe in two: rows up to the first non-NaN row, and rows from it onwards
anchor = df.dropna().iloc[0]['count']
da1 = df[df['count'] <= anchor].copy()   # .copy() avoids SettingWithCopyWarning on the edits below
da2 = df[df['count'] >= anchor].copy()

da1.sort_values(by=['count'], ascending=False, inplace=True)   # walk da1 from the known row upwards
g = [da1, da2]
num_col = len(df1.columns)
for w in range(len(g)):
    count = 0
    list_of_col = [list() for i in range(len(g[w]))]
    for item, rows in g[w].iterrows():
        n = []
        if count == 0:
            # first row: take the known Loc values as-is
            for p in range(1, num_col+1):
                n.append(rows[f'Loc{p}'])
        else:
            # later rows: previous value + previous value * this row's percent change
            for p in range(1, num_col+1):
                n.append(list_of_col[count-1][p-1] + list_of_col[count-1][p-1] * rows[f'P Change_{p}'])
        list_of_col[count].extend(n)
        count += 1
    # transpose the per-row lists into one list per Loc column
    tmp = [list() for i in range(num_col)]
    for d_ in range(num_col):
        for x_ in range(len(list_of_col)):
            tmp[d_].append(list_of_col[x_][d_])
    for i in range(num_col):
        g[w][f'Loc{i+1}'] = tmp[i]
da1.sort_values(by=['count'], inplace=True)   # restore chronological order
final_df = pd.concat([da1, da2[1:]])          # da2's first row duplicates da1's last row
calc_df = pd.DataFrame()
for i in range(num_col):
    calc_df[f'Calc{i+1}'] = final_df[f'Loc{i+1}']
print(calc_df)

I tried to explain all the non-obvious steps in the comments. I have edited my code so that the initial DataFrames are left unchanged.

[Edit]: I have edited the code to handle any number of columns in the given DataFrames.

[Edit:] If the column names in df1 and df2 are arbitrary, run this block before running the code above. It renames the columns using list comprehensions!

df1.columns = [f'P Change_{i+1}' for i in range(len(df1.columns))]
df2.columns = [f'Loc{i+1}' for i in range(len(df2.columns))]
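The answer above splits the frame around the first fully populated row. If that row is not necessarily the last one, the same backward fill can be sketched with NumPy for all column pairs at once. This is a hypothetical helper (`fill_above_anchor`), assuming column k of the percent-change frame pairs with column k of the Loc frame and that rows match by position. It only fills rows above the anchor and raises `IndexError` if no row is complete:

```python
import numpy as np
import pandas as pd

def fill_above_anchor(pchange_df: pd.DataFrame, loc_df: pd.DataFrame) -> pd.DataFrame:
    """Fill rows above the first NaN-free row of loc_df with
    Loc[i] = Loc[i+1] * (1 + P[i]), column pairs matched by position."""
    values = loc_df.to_numpy(dtype=float, copy=True)   # copy so loc_df is not modified
    growth = 1 + pchange_df.to_numpy(dtype=float)
    # Position of the first row with no NaNs: the known starting values
    anchor = int(np.flatnonzero(~np.isnan(values).any(axis=1))[0])
    for i in range(anchor - 1, -1, -1):
        values[i] = values[i + 1] * growth[i]          # whole row at once, every column pair
    return pd.DataFrame(values, index=loc_df.index, columns=loc_df.columns)
```

With the question's frames this would be `fill_above_anchor(df1, df2)`. Rows below the anchor are returned unchanged.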

[Edit] There may be a better or more elegant way to do this, but this works well for me:

def fill_values(df1, df2, cols1=None, cols2=None):
    if cols1 is None: cols1 = df1.columns
    if cols2 is None: cols2 = df2.columns
    # Walk upwards from the second-to-last row; rows are paired by position
    for i in reversed(range(df2.shape[0]-1)):
        for col1, col2 in zip(cols1, cols2):
            if np.isnan(df2[col2].iloc[i]):
                val = df2[col2].iloc[i+1] + df2[col2].iloc[i+1] * df1[col1].iloc[i]
                # Single iloc call; chained df2[col2].iloc[i] = val does not update df2 under copy-on-write
                df2.iloc[i, df2.columns.get_loc(col2)] = val
    return df1, df2
df1, df2 = fill_values(df1, df2)
print(df2)
                Loc1        Loc2
1983-12-31  0.140160    0.136329
1984-12-31  0.169291    0.177413
1985-12-31  0.252212    0.235614
1986-12-31  0.300550    0.261526
1987-12-31  0.554444    0.261457
1988-12-31  0.544976    0.524925
1989-12-31  0.837202    0.935388
1990-12-31  0.809117    0.902741
1991-12-31  1.384158    1.544128
1992-12-31  1.745144    2.631024
1993-12-31  2.541500    3.212600

This assumes the rows in df1 and df2 correspond exactly (I am not matching on the index, only on position). Hope that helps!
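If the two frames might not be in the same row order, one way to keep the position-based loop safe is to reorder df1 to df2's index first. A minimal sketch; `align_rows` is a hypothetical helper, and any date in df2 that is missing from df1 becomes an all-NaN row:

```python
import pandas as pd

def align_rows(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Reorder df1 to follow df2's index so position-based loops pair matching dates."""
    if df1.index.equals(df2.index):
        return df1
    return df1.reindex(df2.index)

dates = pd.to_datetime(["1991-12-31", "1992-12-31", "1993-12-31"])
p = pd.DataFrame({"P Change_1": [0.1, 0.2, 0.3]}, index=dates)
loc = pd.DataFrame({"Loc1": [None, None, 2.0]}, index=dates)
shuffled = p.iloc[::-1]  # same dates, reversed order
print(align_rows(shuffled, loc)["P Change_1"].tolist())  # [0.1, 0.2, 0.3]
```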

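To be clear, what you need is Loc1[year] = Loc1[next_year] + PChange[year] * Loc1[next_year], right? The loop below does what you're looking for, but it simply assumes the two DataFrames always have the same number of rows in the same order (rather than matching values in the index). From your description, I think that holds for your data.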

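for i in range(df2.shape[0]-2, -1, -1):
    # Positional access; the column is named 'P Change_1' (with a space), so df1.PChange_1 does not exist
    df2.iloc[i, df2.columns.get_loc('Loc1')] = df2['Loc1'].iloc[i+1] + (df1['P Change_1'].iloc[i] * df2['Loc1'].iloc[i+1])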

Hope this helps :)
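Factoring out Loc1[next_year] in the recurrence above and unrolling it down to the last known year Y shows that every filled value is the known value times a running product of growth factors (P_k here stands for P Change_1 in year k):

```latex
\mathrm{Loc1}_{y} = \mathrm{Loc1}_{y+1} + P_{y}\,\mathrm{Loc1}_{y+1} = (1 + P_{y})\,\mathrm{Loc1}_{y+1}
\quad\Longrightarrow\quad
\mathrm{Loc1}_{y} = \mathrm{Loc1}_{Y}\prod_{k=y}^{Y-1}\left(1 + P_{k}\right)
```

For y = Y - 1 = 1993 the product has a single factor, giving 2.5415 × (1 - 0.313341) ≈ 1.74514, which matches the value in the question.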
