I have a DataFrame that looks like this:
period value
1 2
2 3
3 4
4 6
5 8
6 10
7 11
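I need a way to compute values for periods 8, 9, and 10, where each new value is the mean of the previous three periods. For example, P8 = mean(8, 10, 11) = 9.6, P9 = mean(10, 11, 9.6) = 10.2, and P10 = mean(11, 9.6, 10.2) = 10.3.
This should result in the following DataFrame: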
period value
1 2
2 3
3 4
4 6
5 8
6 10
7 11
8 9.6
9 10.2
10 10.3
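Iterate over the sequence of new periods and keep using DataFrame.loc to append a row for each one, containing the period and the mean of the previous 3 values: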
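newPeriods = (8, 9, 10)
for p in newPeriods:
    # with the default RangeIndex, the next free label equals the row count
    rowCount = df.shape[0]
    # average the last three values (including any just appended), then add the row
    df.loc[rowCount] = [p, df['value'].iloc[-3:].mean()]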
Output:
period value
0 1.0 2.000000
1 2.0 3.000000
2 3.0 4.000000
3 4.0 6.000000
4 5.0 8.000000
5 6.0 10.000000
6 7.0 11.000000
7 8.0 9.666667
8 9.0 10.222222
9 10.0 10.296296
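As a variation on the loop above, the recurrence can be computed in a plain Python list first and the new rows attached with a single pd.concat, which also keeps period as an integer column. This is only a sketch: the helper name extend_with_rolling_mean and its window parameter are made up for illustration, and it assumes the frame already holds at least window rows.

```python
import pandas as pd


def extend_with_rolling_mean(df, new_periods, window=3):
    """Hypothetical helper: append one row per new period, where each
    value is the mean of the previous `window` values."""
    new_periods = list(new_periods)
    values = df["value"].astype(float).tolist()
    for _ in new_periods:
        # the slice includes values appended in earlier iterations
        values.append(sum(values[-window:]) / window)
    new_rows = pd.DataFrame({
        "period": new_periods,
        "value": values[len(df):],
    })
    # cast `value` to float so both frames share the same dtypes
    return pd.concat([df.astype({"value": float}), new_rows],
                     ignore_index=True)


df = pd.DataFrame({"period": [1, 2, 3, 4, 5, 6, 7],
                   "value": [2, 3, 4, 6, 8, 10, 11]})
result = extend_with_rolling_mean(df, [8, 9, 10])
print(result.tail(3))
```

Building the values in a list avoids growing the DataFrame one row at a time, which matters mostly when many periods are added.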
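You can first set period as the index, then run a for loop that computes each required value and writes it into the frame with loc. After the loop, we restore period as a column. To keep track of the last 3 values, we can use a deque: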
from collections import deque
import numpy as np

# keep `period` aside
df = df.set_index("period")
# this will always store the last 3 values
last_three = deque(df.value.tail(3), maxlen=3)
# for 3 times, do..
for _ in range(3):
    # get the mean
    mean = np.mean(last_three)
    # the new index to put is current last index + 1
    df.loc[df.index[-1] + 1, "value"] = mean
    # update the deque
    last_three.append(mean)
# restore `period` to columns
df = df.reset_index()
to get
>>> df
period value
0 1 2.000000
1 2 3.000000
2 3 4.000000
3 4 6.000000
4 5 8.000000
5 6 10.000000
6 7 11.000000
7 8 9.666667
8 9 10.222222
9 10 10.296296
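The reason the deque works as a sliding window is its maxlen argument: once the deque is full, appending on the right silently drops the oldest item on the left. A minimal illustration, using the last three values from the question:

```python
from collections import deque

# start with the last three known values
last_three = deque([8, 10, 11], maxlen=3)

# appending a fourth item evicts the oldest one (8)
last_three.append(29 / 3)

print(list(last_three))
print(len(last_three))
```

After the append, the window holds 10, 11 and the newly computed mean, so the next mean is taken over exactly those three values.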
Suppose k is your original dataset:
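import numpy as np
import pandas as pd

period = [1, 2, 3, 4, 5, 6, 7]
value = [2, 3, 4, 6, 8, 10, 11]
k = pd.DataFrame({'period': period, 'value': value}).astype({'value': float})
# append placeholder rows for periods 8 to 10
new_rows = pd.DataFrame({'period': range(8, 11), 'value': np.nan})
k = pd.concat([k, new_rows], ignore_index=True)
for i in range(8, 11):
    # row i-1 holds period i; average the three rows just before it
    k.iloc[i - 1, 1] = k.iloc[i - 4:i - 1, 1].mean()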