熊猫操作数据框架



我正在查询一个数据库并填充一个pandas数据框。我正在努力聚合数据(通过groupby),然后操作数据框架索引,使表中的日期成为索引。下面是一个示例,显示了分组前后的数据,以及我最终要寻找的内容。

dataframe填充的数据

firm |    dates    | received | Sent
-----------------------------------------
A       10/08/2016      2         8
A       12/08/2016      4         2
B       10/08/2016      1         0
B       11/08/2016      3         5
A       13/08/2016      5         1
C       14/08/2016      7         3 
B       14/08/2016      2         5
  1. 首先我要按"公司"、"日期"one_answers"收到/发送"分组。

  2. 然后操作DataFrame,使日期成为索引,而不是行索引。

  3. 最后为每天添加一个总列

  4. 一些公司在某些天没有"活动",或者至少在接收或发送中没有活动。然而,当我想要查看过去X天的视图时,空值是不可能的,而我需要填充一个零作为值。

dates        | 10/08/2016 | 11/08/2016| 12/08/2016| 13/08/2016| 14/08/2016    
firm  |  
----------------------------------------------------------------------
A      received     2           0            4            5          0
       sent         8           0            2            1          0
B      received     1           3            1            0          2
       sent         0           5            0            0          5
C      received     0           0            2            0          1
       sent         0           0            1            2          0
Totals r.           3           3            7            5          3             
Totals s.           8           0            3            3          5

我试过下面的代码:

df = > mysql query result
n_received = df.groupby(["firm", "dates"
                                ]).received.size()
n_sent = df.groupby(["firm", "dates"
                                ]).sent.size()
tables = pd.DataFrame({ 'received': n_received, 'sent': n_sent,
                           }, 
                            columns=['received','sent'])
this = pd.melt(tables, 
                    id_vars=['dates', 
                             'firm',
                             'received', 'sent']
this = this.set_index(['dates', 
                         'firm',
                         'received', 'sent'
                    'var'
                    ])        
this = this.unstack('dates').fillna(0)     
this.columns = this.columns.droplevel()
this.columns.name = ''
this = this.transpose()
基本上,基于这段代码,我没有得到我想要的结果。我怎样才能做到这一点?从概念上讲,是否有更好的方法来达到这个结果?比如SQL语句中的聚合,或者从优化的角度和逻辑上看,Pandas中的聚合更有意义。

您可以使用stack (unstack)将数据从长格式转换为宽格式:

import pandas as pd
# calculate the total received and sent grouped by dates
df1 = df.drop('firm', axis = 1).groupby('dates').sum().reset_index()
# add total category as the firm column
df1['firm'] = 'total'
# concatenate the summary data frame and original data frame use stack and unstack to 
# transform the data frame so that dates appear as columns while received and sent stack as column.
pd.concat([df, df1]).set_index(['firm', 'dates']).stack().unstack(level = 1).fillna(0)
# dates         10/08/2016  11/08/2016  12/08/2016  13/08/2016  14/08/2016
#  firm                     
#     A     Sent       8.0         0.0         2.0         1.0         0.0
#       received       2.0         0.0         4.0         5.0         0.0
#     B     Sent       0.0         5.0         0.0         0.0         5.0
#       received       1.0         3.0         0.0         0.0         2.0
#     C     Sent       0.0         0.0         0.0         0.0         3.0
#       received       0.0         0.0         0.0         0.0         7.0
# total     Sent       8.0         5.0         2.0         1.0         8.0
#       received       3.0         3.0         4.0         5.0         9.0

相关内容

  • 没有找到相关文章

最新更新