我已经搜索了熊猫文档,不幸的是,我找不到答案。
实际上,经过一些数据争用之后,我得到了数据框架
ticker_id close_date sector sector_index
0 1 2014-02-28 00:00:00 Consumer Goods 31.106653
1 1 2014-02-27 00:00:00 Consumer Goods 30.951213
2 2 2014-02-28 00:00:00 Consumer Goods 19.846387
3 2 2014-02-27 00:00:00 Consumer Goods 19.671747
4 3 2014-02-28 00:00:00 Consumer Goods 1208.552000
5 3 2014-02-27 00:00:00 Consumer Goods 1193.352000
6 4 2014-02-28 00:00:00 Consumer Goods 9.893989
7 4 2014-02-27 00:00:00 Consumer Goods 9.857385
8 5 2014-02-28 00:00:00 Consumer Goods 52.196757
9 5 2014-02-27 00:00:00 Consumer Goods 53.101520
10 6 2014-02-28 00:00:00 Services 5.449554
11 6 2014-02-27 00:00:00 Services 5.440019
12 7 2014-02-28 00:00:00 Basic Materials 4149.237000
13 7 2014-02-27 00:00:00 Basic Materials 4130.704000
And I run groupby
df_all2 = df_all.groupby(['close_date','sector']).sum()
print df_all2
结果是这样的
ticker_id sector_index
close_date sector
2014-02-27 Basic Materials 7 4130.704000
Consumer Goods 15 1306.933865
Services 6 5.440019
2014-02-28 Basic Materials 7 4149.237000
Consumer Goods 15 1321.595786
Services 6 5.449554
但是在这种形式下,我不能正确地上传到mysql。因此,为了正确上传到mysql,我需要做这个和其他一些事情。
data2 = list(tuple(x) for x in df_all2.values)
但是data2有无意义的垃圾。
长话短说,我如何让groupby给我以下结果(其中close_date全部正确填写并且列标题是表格)。
close_date sector ticker_id sector_index
2014-02-27 Basic Materials 7 4130.704000
2014-02-27 Consumer Goods 15 1306.933865
2014-02-27 Services 6 5.440019
2014-02-28 Basic Materials 7 4149.237000
2014-02-28 Consumer Goods 15 1321.595786
2014-02-28 Services 6 5.449554
另外,为了帮助社区,我应该如何修改标题,以便其他遇到这个问题的熊猫用户也能找到你的解决方案?非常感谢您的帮助。
在使用to_sql
*:
reset_index
In [11]: df.groupby(['close_date','sector']).sum().reset_index()
Out[11]:
close_date sector ticker_id sector_index
0 2014-02-27 Basic Materials 7 4130.704000
1 2014-02-27 Consumer Goods 15 1306.933865
2 2014-02-27 Services 6 5.440019
3 2014-02-28 Basic Materials 7 4149.237000
4 2014-02-28 Consumer Goods 15 1321.595786
5 2014-02-28 Services 6 5.449554
也可以在groupby中使用as_index=False:
In [12]: df.groupby(['close_date','sector'], as_index=False).sum()
Out[12]:
close_date sector ticker_id sector_index
0 2014-02-27 Basic Materials 7 4130.704000
1 2014-02-27 Consumer Goods 15 1306.933865
2 2014-02-27 Services 6 5.440019
3 2014-02-28 Basic Materials 7 4149.237000
4 2014-02-28 Consumer Goods 15 1321.595786
5 2014-02-28 Services 6 5.449554
*注意:这应该从0.14起修复,即您应该能够将MultiIndex保存到sql.