我正在查询一个数据库并填充一个pandas数据框。我正在努力聚合数据(通过groupby),然后操作数据框架索引,使表中的日期成为索引。下面是一个示例,显示了分组前后的数据,以及我最终要寻找的内容。
dataframe填充的数据
firm | dates | received | Sent
-----------------------------------------
A 10/08/2016 2 8
A 12/08/2016 4 2
B 10/08/2016 1 0
B 11/08/2016 3 5
A 13/08/2016 5 1
C 14/08/2016 7 3
B 14/08/2016 2 5
首先我要按"公司"、"日期"one_answers"收到/发送"分组。
然后操作DataFrame,使日期成为索引,而不是行索引。
最后为每天添加一个总列
一些公司在某些天没有"活动",或者至少在接收或发送中没有活动。然而,当我想要查看过去X天的视图时,空值是不可能的,而我需要填充一个零作为值。
dates | 10/08/2016 | 11/08/2016| 12/08/2016| 13/08/2016| 14/08/2016 firm | ---------------------------------------------------------------------- A received 2 0 4 5 0 sent 8 0 2 1 0 B received 1 3 1 0 2 sent 0 5 0 0 5 C received 0 0 2 0 1 sent 0 0 1 2 0 Totals r. 3 3 7 5 3 Totals s. 8 0 3 3 5
我试过下面的代码:
df = > mysql query result
n_received = df.groupby(["firm", "dates"
]).received.size()
n_sent = df.groupby(["firm", "dates"
]).sent.size()
tables = pd.DataFrame({ 'received': n_received, 'sent': n_sent,
},
columns=['received','sent'])
this = pd.melt(tables,
id_vars=['dates',
'firm',
'received', 'sent']
this = this.set_index(['dates',
'firm',
'received', 'sent'
'var'
])
this = this.unstack('dates').fillna(0)
this.columns = this.columns.droplevel()
this.columns.name = ''
this = this.transpose()
基本上,基于这段代码,我没有得到我想要的结果。我怎样才能做到这一点?从概念上讲,是否有更好的方法来达到这个结果?比如SQL语句中的聚合,或者从优化的角度和逻辑上看,Pandas中的聚合更有意义。
您可以使用stack
(unstack
)将数据从长格式转换为宽格式:
import pandas as pd
# calculate the total received and sent grouped by dates
df1 = df.drop('firm', axis = 1).groupby('dates').sum().reset_index()
# add total category as the firm column
df1['firm'] = 'total'
# concatenate the summary data frame and original data frame use stack and unstack to
# transform the data frame so that dates appear as columns while received and sent stack as column.
pd.concat([df, df1]).set_index(['firm', 'dates']).stack().unstack(level = 1).fillna(0)
# dates 10/08/2016 11/08/2016 12/08/2016 13/08/2016 14/08/2016
# firm
# A Sent 8.0 0.0 2.0 1.0 0.0
# received 2.0 0.0 4.0 5.0 0.0
# B Sent 0.0 5.0 0.0 0.0 5.0
# received 1.0 3.0 0.0 0.0 2.0
# C Sent 0.0 0.0 0.0 0.0 3.0
# received 0.0 0.0 0.0 0.0 7.0
# total Sent 8.0 5.0 2.0 1.0 8.0
# received 3.0 3.0 4.0 5.0 9.0