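In df.groupby(), using pd.Grouper() together with the fruits column reports fewer groups than expected, as shown in #4. There should be groups for the other fruits and date values as well, because they appear in the final output in #5.
For example, #4 contains the group (2020-01-01 00:00:00, 'mango'), but not (2020-01-01 00:00:00, 'orange'), and so on. Maybe I am missing something. Thanks for your help.
The code is as follows: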
# Library
import pandas as pd

# Data
date = [pd.Timestamp('01/01/2020'),
        pd.Timestamp('01/03/2020'),
        pd.Timestamp('01/20/2020'),
        pd.Timestamp('09/01/2020'),
        pd.Timestamp('09/03/2020'),
        pd.Timestamp('09/20/2020'),
        pd.Timestamp('12/01/2020'),
        pd.Timestamp('12/03/2020'),
        pd.Timestamp('12/20/2020')
        ]
df = pd.DataFrame({
    'fruits': ['mango', 'mango', 'orange', 'orange', 'banana', 'mango', 'orange', 'banana', 'banana'],
    'price': [10, 12, 7, 9, 3, 1, 2, 11, 13],
    'date': date
})

# Grouper
# 1MS: month start frequency
p = pd.Grouper(freq='1MS', key='date')
print("#1-\n", p, '\n')

# Group by fruit only
g = df.groupby(['fruits'])
print("#2-\n", g.groups, '\n')

# Group by month start only
g = df.groupby([p])
print("#3-\n", g.groups, '\n')

# Group by month start and fruit together
g = df.groupby([p, 'fruits'])
print("#4-\n", g.groups, '\n')

result = g.sum()
print("\n\n#5- result:\n", result)
Output:
#1-
TimeGrouper(key='date', freq=<MonthBegin>, axis=0, sort=True, dropna=True, closed='left', label='left', how='mean', convention='e', origin='start_day')
#2-
{'banana': [4, 7, 8], 'mango': [0, 1, 5], 'orange': [2, 3, 6]}
#3-
{2020-01-01 00:00:00: [0, 1, 2], 2020-02-01 00:00:00: [], 2020-03-01 00:00:00: [], 2020-04-01 00:00:00: [], 2020-05-01 00:00:00: [], 2020-06-01 00:00:00: [], 2020-07-01 00:00:00: [], 2020-08-01 00:00:00: [], 2020-09-01 00:00:00: [3, 4, 5], 2020-10-01 00:00:00: [], 2020-11-01 00:00:00: [], 2020-12-01 00:00:00: [6, 7, 8]}
#4-
{(2020-01-01 00:00:00, 'mango'): [0], (2020-09-01 00:00:00, 'mango'): [1], (2020-12-01 00:00:00, 'orange'): [2]}
#5- result:
                   price
date       fruits
2020-01-01 mango      22
           orange      7
2020-09-01 banana      3
           mango       1
           orange      9
2020-12-01 banana     24
           orange      2
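You have found a bug, and it has already been reported: BUG: pd.Grouper with datetime key in conjunction with another key produces incorrect number of group keys. #51158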
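Until a fix is available in your pandas version, one possible workaround is to avoid pd.Grouper when you need .groups and to group by a precomputed month-start column instead. The following is only a sketch under that assumption: the names month_start and groups are introduced here for illustration, and flooring through to_period('M') is assumed to match the '1MS' bins for this data.

```python
import pandas as pd

df = pd.DataFrame({
    'fruits': ['mango', 'mango', 'orange', 'orange', 'banana',
               'mango', 'orange', 'banana', 'banana'],
    'price': [10, 12, 7, 9, 3, 1, 2, 11, 13],
    'date': pd.to_datetime([
        '2020-01-01', '2020-01-03', '2020-01-20',
        '2020-09-01', '2020-09-03', '2020-09-20',
        '2020-12-01', '2020-12-03', '2020-12-20',
    ]),
})

# Floor every date to the first day of its month (hypothetical helper
# Series), so the grouping uses two ordinary keys and no pd.Grouper.
month_start = df['date'].dt.to_period('M').dt.to_timestamp()
g = df.groupby([month_start, 'fruits'])

# Convert the row labels of each group to plain lists for easy inspection.
groups = {key: list(idx) for key, idx in g.groups.items()}
for key, rows in groups.items():
    print(key, rows)

# The aggregation matches the #5 result in the question.
result = g['price'].sum()
print(result)
```

With this approach the keys in groups should line up one-to-one with the rows of the aggregated result, so (2020-01-01, 'orange') shows up alongside (2020-01-01, 'mango'). Unlike pd.Grouper, this does not create empty bins for months with no data.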