I have this DataFrame. Input:
df= pd.DataFrame({'Date':['2007-01-01 07:14:00','2007-01-01 07:25:00','2007-01-01 08:00:00','2007-01-01 09:14:00','2007-01-01 09:33:12'],'sent':[0.32,0.34,0.45,0.7,0.22],'var1':[44,0.66,0.45,45,0.92], 'var2':['nan','2007-01-01 08:00:00','2007-01-01 09:00:00','nan','2007-01-01 10:00:00']})
Output:
Date sent var1 var2
0 2007-01-01 07:14:00 0.32 44.00 nan
1 2007-01-01 07:25:00 0.34 0.66 2007-01-01 08:00:00
2 2007-01-01 08:00:00 0.45 0.45 2007-01-01 09:00:00
3 2007-01-01 09:14:00 0.70 45.00 nan
4 2007-01-01 09:33:12 0.22 0.92 2007-01-01 10:00:00
I applied this code to compute the hourly sum of the sent column. Input:
df['Date'] = pd.to_datetime(df['Date'])  # pd.Grouper needs a datetime column, not strings
df = df.groupby(pd.Grouper(key='Date', freq='1H')).sent.sum().reset_index()
Output:
Date sent
0 2007-01-01 07:00:00 0.66
1 2007-01-01 08:00:00 0.45
2 2007-01-01 09:00:00 0.92
The problem: I want to keep the other columns, such as var1 and var2, after the groupby. Any help would be appreciated.
If I understand correctly, you can use transform:
import pandas as pd
df = pd.DataFrame({'Date': pd.to_datetime(['2007-01-01 07:14:00', '2007-01-01 07:25:00', '2007-01-01 08:00:00', '2007-01-01 09:14:00',
'2007-01-01 09:33:12']),
'sent': [0.32, 0.34, 0.45, 0.7, 0.22],
'var1': [44, 0.66, 0.45, 45, 0.92],
'var2': ['nan', '2007-01-01 08:00:00', '2007-01-01 09:00:00', 'nan', '2007-01-01 10:00:00']})
df['sum'] = df.groupby(df.Date.dt.hour).sent.transform('sum')
print(df)
Output:
Date sent var1 var2 sum
0 2007-01-01 07:14:00 0.32 44.00 nan 0.66
1 2007-01-01 07:25:00 0.34 0.66 2007-01-01 08:00:00 0.66
2 2007-01-01 08:00:00 0.45 0.45 2007-01-01 09:00:00 0.45
3 2007-01-01 09:14:00 0.70 45.00 nan 0.92
4 2007-01-01 09:33:12 0.22 0.92 2007-01-01 10:00:00 0.92
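One caveat: df.Date.dt.hour groups by hour of day only, so if the data spans several days, 07:xx rows from different dates would end up in the same group. A minimal sketch of one way around that, assuming you want one sum per calendar hour as in the question's pd.Grouper version, is to group on the timestamp floored to the hour. The two-day sample data and the column names sum_by_hour_of_day and sum_by_calendar_hour below are made up for illustration.

```python
import pandas as pd

# Hypothetical two-day sample: the same hour of day (07) appears on both dates.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2007-01-01 07:14:00', '2007-01-01 07:25:00',
                            '2007-01-02 07:05:00', '2007-01-02 07:40:00']),
    'sent': [0.32, 0.34, 0.45, 0.70],
})

# Grouping by hour of day alone puts all four 07:xx rows in one group.
df['sum_by_hour_of_day'] = df.groupby(df['Date'].dt.hour)['sent'].transform('sum')

# Flooring to the hour keeps the date, so each calendar hour is its own group.
df['sum_by_calendar_hour'] = df.groupby(df['Date'].dt.floor('h'))['sent'].transform('sum')

print(df)
```

On the single-day data from the question both groupings give the same result, so this only matters once more than one date is involved.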