Create a new row that is the result of a calculation on the rows above - Pandas DataFrame



I have a df that looks like this:

period   value
1        2
2        3
3        4
4        6
5        8
6        10
7        11

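I need a way to compute the values for periods 8, 9 and 10, where each one is the mean of the three preceding periods. For example, P8 = mean(8, 10, 11) ≈ 9.6, P9 = mean(10, 11, 9.6) ≈ 10.2, P10 = mean(11, 9.6, 10.2) ≈ 10.3.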
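The recurrence can be checked by hand before involving pandas. The following is a minimal sketch in plain Python that uses only the seven values from the table above; the variable names `values` and `new_values` are illustrative and not part of the question.

```python
# Known values for periods 1..7, taken from the table above.
values = [2, 3, 4, 6, 8, 10, 11]

# Each new period is the mean of the three values before it,
# including values that were themselves just computed.
for _ in range(3):
    values.append(sum(values[-3:]) / 3)

# Periods 8, 9 and 10, rounded to four decimals for display.
new_values = [round(v, 4) for v in values[-3:]]
print(new_values)  # [9.6667, 10.2222, 10.2963]
```

The exact values are 29/3, 92/9 and 278/27, so the one-decimal figures quoted in the question are truncated.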

This should result in the following DF:

period   value
1        2
2        3
3        4
4        6
5        8
6        10
7        11
8       9.6
9      10.2
10     10.3

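Iterate over the sequence of new periods you need and use DataFrame.loc to append one row per period, assigning the period and the mean of the previous 3 values: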

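newPeriods = (8, 9, 10)
for p in newPeriods:
    # with the default 0-based index, the row count is also the next free label
    rowCount = df.shape[0]
    # label rowCount does not exist yet, so this slice covers the last 3 rows
    df.loc[rowCount] = [p, df.loc[rowCount-3:rowCount, 'value'].mean()]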

Output:

period      value
0     1.0   2.000000
1     2.0   3.000000
2     3.0   4.000000
3     4.0   6.000000
4     5.0   8.000000
5     6.0  10.000000
6     7.0  11.000000
7     8.0   9.666667
8     9.0  10.222222
9    10.0  10.296296
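Growing a frame row by row with `.loc` is fine for three rows, but the same idea can be wrapped in a function that builds the result once. The sketch below is not the answerer's code: `extend_with_rolling_mean` is a hypothetical helper, and it assumes the frame has only the `period` and `value` columns, at least `window` rows, and consecutive integer periods.

```python
import pandas as pd


def extend_with_rolling_mean(df, n_new, window=3):
    """Return a new frame with n_new periods appended.

    Each appended value is the mean of the `window` values before it.
    Hypothetical helper: assumes columns 'period' and 'value' only.
    """
    periods = df["period"].tolist()
    values = df["value"].tolist()
    for _ in range(n_new):
        periods.append(periods[-1] + 1)
        values.append(sum(values[-window:]) / window)
    # Build the frame once instead of enlarging it inside the loop.
    return pd.DataFrame({"period": periods, "value": values})


# Example data copied from the question.
df = pd.DataFrame({"period": range(1, 8), "value": [2, 3, 4, 6, 8, 10, 11]})
result = extend_with_rolling_mean(df, n_new=3)
print(result.tail(3))
```

Because the periods are collected in a plain list, the `period` column keeps an integer dtype here, whereas the `.loc` assignment above turns it into floats.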

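You can first set period as the index, then run a for loop that computes each required value and writes it into the frame with loc. After the loop, we restore period as a column. To keep track of the last 3 values, we can use a deque: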

from collections import deque
import numpy as np

# keep `period` aside
df = df.set_index("period")
# this will always store the last 3 values
last_three = deque(df.value.tail(3), maxlen=3)
# for 3 times, do..
for _ in range(3):
    # get the mean
    mean = np.mean(last_three)
    # the new index to put is current last index + 1
    df.loc[df.index[-1] + 1, "value"] = mean

    # update the deque
    last_three.append(mean)
# restore `period` to columns
df = df.reset_index()

to get

>>> df
period      value
0       1   2.000000
1       2   3.000000
2       3   4.000000
3       4   6.000000
4       5   8.000000
5       6  10.000000
6       7  11.000000
7       8   9.666667
8       9  10.222222
9      10  10.296296
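The deque does the bookkeeping in this answer, so it may help to watch it in isolation. A small sketch, assuming the same three starting values (8, 10, 11); the `snapshots` list is only there for illustration.

```python
from collections import deque

# Start with the last three known values from the question.
last_three = deque([8, 10, 11], maxlen=3)

snapshots = []
for _ in range(3):
    mean = sum(last_three) / len(last_three)
    # Appending to a full deque with maxlen=3 drops the oldest item on the left.
    last_three.append(mean)
    snapshots.append([round(v, 2) for v in last_three])

for snap in snapshots:
    print(snap)
```

After each append the window still holds exactly three items, which is why the loop never needs to slice the frame to find the latest values.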

Suppose k is your original dataset:

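import numpy as np
import pandas as pd

period=[1,2,3,4,5,6,7]
value=[2,3,4,6,8,10,11]
k=pd.DataFrame([period,value]).T
k.columns=['period','value']
# append placeholder rows for periods 8..10, with a fresh 0..9 index
k=pd.concat([k,pd.DataFrame([[i,None] for i in range(8,11)],columns=['period','value'])],ignore_index=True)
for i in range(8,11):
    # period i sits at position i-1; average the three positions before it
    k.iloc[i-1,1]=np.mean(np.array([k.iloc[i-2,1],k.iloc[i-3,1],k.iloc[i-4,1]]))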
