df.groupby() with pd.Grouper() on a date column and another key shows fewer groups



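When I call df.groupby() with a pd.Grouper() together with the fruits column, g.groups shows fewer groups than expected, as seen in #4. There should be groups for the other fruits and dates as well, since they appear in the final output in #5.

For example, #4 contains the group (2020-01-01 00:00:00, 'mango') but not (2020-01-01 00:00:00, 'orange'), and so on. Maybe I am missing something. Thanks for your help.

The code is as follows: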

# Library
import pandas as pd

# Data
date = [
    pd.Timestamp('01/01/2020'),
    pd.Timestamp('01/03/2020'),
    pd.Timestamp('01/20/2020'),
    pd.Timestamp('09/01/2020'),
    pd.Timestamp('09/03/2020'),
    pd.Timestamp('09/20/2020'),
    pd.Timestamp('12/01/2020'),
    pd.Timestamp('12/03/2020'),
    pd.Timestamp('12/20/2020'),
]
df = pd.DataFrame({
    'fruits': ['mango', 'mango', 'orange', 'orange', 'banana', 'mango', 'orange', 'banana', 'banana'],
    'price': [10, 12, 7, 9, 3, 1, 2, 11, 13],
    'date': date,
})

# Grouper
# 1MS: month start frequency
p = pd.Grouper(freq='1MS', key='date')
print("#1-\n", p, '\n')
# Group by fruit only
g = df.groupby(['fruits'])
print("#2-\n", g.groups, '\n')
# Group by month only
g = df.groupby([p])
print("#3-\n", g.groups, '\n')
# Group by month and fruit: this is where groups go missing
g = df.groupby([p, 'fruits'])
print("#4-\n", g.groups, '\n')
result = g.sum()
print("\n\n#5- result:\n", result)

Output:

#1-
TimeGrouper(key='date', freq=<MonthBegin>, axis=0, sort=True, dropna=True, closed='left', label='left', how='mean', convention='e', origin='start_day') 
#2-
{'banana': [4, 7, 8], 'mango': [0, 1, 5], 'orange': [2, 3, 6]} 
#3-
{2020-01-01 00:00:00: [0, 1, 2], 2020-02-01 00:00:00: [], 2020-03-01 00:00:00: [], 2020-04-01 00:00:00: [], 2020-05-01 00:00:00: [], 2020-06-01 00:00:00: [], 2020-07-01 00:00:00: [], 2020-08-01 00:00:00: [], 2020-09-01 00:00:00: [3, 4, 5], 2020-10-01 00:00:00: [], 2020-11-01 00:00:00: [], 2020-12-01 00:00:00: [6, 7, 8]} 
#4-
{(2020-01-01 00:00:00, 'mango'): [0], (2020-09-01 00:00:00, 'mango'): [1], (2020-12-01 00:00:00, 'orange'): [2]} 

#5- result:
                   price
date       fruits       
2020-01-01 mango      22
           orange      7
2020-09-01 banana      3
           mango       1
           orange      9
2020-12-01 banana     24
           orange      2

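You have found a bug, and it has been reported: BUG: pd.Grouper with a datetime key in conjunction with another key produces an incorrect number of group keys #51158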
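Until that is fixed, one possible workaround (my own suggestion, not something taken from the issue) is to avoid combining pd.Grouper with a second key: compute the month start as an ordinary Series first and group by that together with fruits. The sketch below assumes the same data as in the question; the names month, groups and result are only illustrative, and I have not checked which pandas versions are affected by the bug.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    'fruits': ['mango', 'mango', 'orange', 'orange', 'banana',
               'mango', 'orange', 'banana', 'banana'],
    'price': [10, 12, 7, 9, 3, 1, 2, 11, 13],
    'date': pd.to_datetime([
        '2020-01-01', '2020-01-03', '2020-01-20',
        '2020-09-01', '2020-09-03', '2020-09-20',
        '2020-12-01', '2020-12-03', '2020-12-20',
    ]),
})

# Month start for every row, as a plain datetime Series (no pd.Grouper involved)
month = df['date'].dt.to_period('M').dt.to_timestamp()

# Group by two ordinary keys: the month Series and the 'fruits' column
g = df.groupby([month, 'fruits'])
groups = g.groups            # maps (month start, fruit) -> row labels
result = g['price'].sum()    # same totals as #5 in the question

for key, rows in groups.items():
    print(key, list(rows))
print(result)
```

Note that, unlike #3 in the question, this only yields months that actually occur in the data, so there are no empty groups for February through August.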
