Here is a sample of my data:
Date Count
11.01.2019 1
01.02.2019 7
25.01.2019 4
23.01.2019 4
16.03.2019 1
04.02.2019 5
06.04.2019 1
04.04.2019 5
Desired output:
Month Total_Count
Jan 9
Feb 12
Mar 1
Apr 6
I used the code below for this summary. It runs fine, but the months come out scrambled instead of being sorted in calendar order (Jan, Feb, ...):
(df.groupby(pd.to_datetime(df['Date'], format='%d.%m.%Y')
              .dt.month_name()
              .str[:3])['Count']
   .sum()
   .rename_axis('Month')
   .reset_index(name='Total_Count'))
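The scrambling comes from groupby sorting its group keys by default: the keys here are three-letter strings, so they end up in alphabetical order. A minimal sketch that reproduces this, with the sample data typed in by hand (the names month, totals and order are only for illustration):

```python
import pandas as pd

# Sample data from the question, entered by hand
df = pd.DataFrame({
    'Date': ['11.01.2019', '01.02.2019', '25.01.2019', '23.01.2019',
             '16.03.2019', '04.02.2019', '06.04.2019', '04.04.2019'],
    'Count': [1, 7, 4, 4, 1, 5, 1, 5],
})

# Three-letter month abbreviation for each row
month = pd.to_datetime(df['Date'], format='%d.%m.%Y').dt.month_name().str[:3]

# groupby sorts the string keys, so the order is alphabetical
totals = df.groupby(month)['Count'].sum()
order = totals.index.tolist()
print(order)
```

The totals themselves are correct; only the row order needs fixing.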
The idea is to convert the column to datetimes, sort by it, and then group with sort=False to avoid the default sorting in groupby:
df['Date'] = pd.to_datetime(df['Date'], format='%d.%m.%Y')
df1 = (df.sort_values('Date')
         .groupby(df['Date'].dt.month_name().str[:3], sort=False)['Count']
         .sum()
         .rename_axis('Month')
         .reset_index(name='Total_Count'))
print(df1)
  Month  Total_Count
0   Jan            9
1   Feb           12
2   Mar            1
3   Apr            6
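A different route to the same result, not the one used above, is to group by the month number, which already sorts in calendar order, and only afterwards turn the numbers into abbreviations. This is a sketch under assumptions: the sample data is typed in by hand, the names dates, by_number and result are illustrative, and calendar.month_abbr gives English names only when the process locale has not been changed.

```python
import calendar

import pandas as pd

# Sample data from the question, entered by hand
df = pd.DataFrame({
    'Date': ['11.01.2019', '01.02.2019', '25.01.2019', '23.01.2019',
             '16.03.2019', '04.02.2019', '06.04.2019', '04.04.2019'],
    'Count': [1, 7, 4, 4, 1, 5, 1, 5],
})

dates = pd.to_datetime(df['Date'], format='%d.%m.%Y')

# Month numbers (1..12) sort naturally, so the default sort is what we want
by_number = df.groupby(dates.dt.month)['Count'].sum()

# Replace the numeric index with abbreviations (locale-dependent, see above)
by_number.index = [calendar.month_abbr[m] for m in by_number.index]

result = by_number.rename_axis('Month').reset_index(name='Total_Count')
print(result)
```

Unlike sort=False, this does not depend on the order in which the rows appear.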
Another idea (thanks, 安基) is to use an ordered Categorical, in which case sort=False has to be removed:
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df1 = (df.groupby(pd.Categorical(pd.to_datetime(df['Date'], format='%d.%m.%Y')
                                   .dt.month_name().str[:3],
                                 ordered=True, categories=months))['Count']
         .sum()
         .rename_axis('Month')
         .reset_index(name='Total_Count'))
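With a categorical key, whether months that never occur show up in the result is controlled by the observed argument of groupby. Its default has changed between pandas releases, so the sketch below passes it explicitly. The sample data is typed in by hand and the names abbr, key, present and full are illustrative.

```python
import pandas as pd

# Sample data from the question, entered by hand
df = pd.DataFrame({
    'Date': ['11.01.2019', '01.02.2019', '25.01.2019', '23.01.2019',
             '16.03.2019', '04.02.2019', '06.04.2019', '04.04.2019'],
    'Count': [1, 7, 4, 4, 1, 5, 1, 5],
})
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

abbr = pd.to_datetime(df['Date'], format='%d.%m.%Y').dt.month_name().str[:3]
key = pd.Categorical(abbr, categories=months, ordered=True)

# observed=True: only months present in the data, in category order
present = df.groupby(key, observed=True)['Count'].sum()

# observed=False: all twelve categories, absent months sum to 0
full = df.groupby(key, observed=False)['Count'].sum()

print(present)
print(full)
```

Either way the rows follow the order of the months list rather than alphabetical order.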
Or use Series.reindex:
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df1 = (df.groupby(pd.to_datetime(df['Date'], format='%d.%m.%Y')
                    .dt.month_name().str[:3])['Count']
         .sum()
         .rename_axis('Month')
         .reindex(months, fill_value=0)
         .reset_index(name='Total_Count'))
print(df1)
   Month  Total_Count
0    Jan            9
1    Feb           12
2    Mar            1
3    Apr            6
4    May            0
5    Jun            0
6    Jul            0
7    Aug            0
8    Sep            0
9    Oct            0
10   Nov            0
11   Dec            0
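The reindex version lists all twelve months, while the desired output in the question only has the months that occur. One possible tweak, sketched here with hand-typed sample data and illustrative names (abbr, totals, present, result), is to reindex with just the months found in the grouped result:

```python
import pandas as pd

# Sample data from the question, entered by hand
df = pd.DataFrame({
    'Date': ['11.01.2019', '01.02.2019', '25.01.2019', '23.01.2019',
             '16.03.2019', '04.02.2019', '06.04.2019', '04.04.2019'],
    'Count': [1, 7, 4, 4, 1, 5, 1, 5],
})
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

abbr = pd.to_datetime(df['Date'], format='%d.%m.%Y').dt.month_name().str[:3]
totals = df.groupby(abbr)['Count'].sum()

# Keep calendar order, but only for months that appear in the data
present = [m for m in months if m in totals.index]

result = (totals.reindex(present)
                .rename_axis('Month')
                .reset_index(name='Total_Count'))
print(result)
```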
Try this (it assumes df['Date'] has already been converted with pd.to_datetime):
new_df = (df.sort_values('Date')
            .groupby(df['Date'].dt.month_name().str[:3], sort=False)['Count']
            .sum()
            .rename_axis('Month')
            .reset_index(name='Total_Count'))
print(new_df)