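Hello, I have a pandas DataFrame with the columns month (int), year (int), product_id (string), city_id (string), customer_id (string) and sales (int). I want to group by each customer/product/city/month/year combination and, for every combination, add the average of the sales column over the previous 3 months, 6 months, 1 year and 2 years as new columns in the DataFrame.
I would appreciate any feedback. Thanks.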
Prepare some sample data:
import pandas as pd
import numpy as np
# sample data
dti = pd.date_range("2017-1-1", "2020-12-1", freq="MS")
products = ["products{}".format(c) for c in range(1, 4)]
cities = ["cities{}".format(c) for c in range(1, 3)]
customers = ["customer{}".format(c) for c in range(1, 5)]
# from_product crosses every entry of dti.month with every entry of dti.year,
# so each (month, year, product, city, customer) combination appears many times
idx = pd.MultiIndex.from_product([dti.month, dti.year, products, cities, customers],
                                 names=["month", "year", "product_id", "city_id", "customer_id"])
df = pd.DataFrame({"sales": np.random.randint(0, 30, size=len(idx))}, index=idx).reset_index()
>>> df
month year product_id city_id customer_id sales
0 1 2017 products1 cities1 customer1 20
1 1 2017 products1 cities1 customer2 28
2 1 2017 products1 cities1 customer3 5
3 1 2017 products1 cities1 customer4 17
4 1 2017 products1 cities2 customer1 16
... ... ... ... ... ... ...
55291 12 2020 products3 cities1 customer4 6
55292 12 2020 products3 cities2 customer1 1
55293 12 2020 products3 cities2 customer2 21
55294 12 2020 products3 cities2 customer3 16
55295 12 2020 products3 cities2 customer4 3
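Convert (year, month) to a datetime; date operations are easier to work with afterwards:
# zero-pad the month so that the "%Y%m" format always sees two month digits
df2 = df.assign(datetime=pd.to_datetime(df["year"].astype(str) + df["month"].astype(str).str.zfill(2), format="%Y%m"))
df2 = df2.drop(columns=["month", "year"]).set_index("datetime").sort_index()
cols = ["product_id", "city_id", "customer_id"]
out = []
for month in [3, 6, 12, 24]:
    # the slice includes both endpoints, so the window also contains the month that lies `month` months back
    start = df2.index.max() - pd.tseries.offsets.DateOffset(months=month)
    out.append(df2.loc[start:].groupby(cols).mean().squeeze().rename("{}M".format(month)))
Output: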
>>> pd.concat(out, axis="columns")
3M 6M 12M 24M
product_id city_id customer_id
products1 cities1 customer1 15.161458 14.779762 14.788462 14.433333
customer2 14.479167 14.580357 14.706731 14.761667
customer3 13.666667 13.738095 14.229167 14.680000
customer4 14.828125 14.875000 14.641026 14.539167
cities2 customer1 14.723958 14.211310 14.161859 14.440000
customer2 14.791667 15.083333 14.907051 14.781667
customer3 15.015625 14.273810 14.605769 14.664167
customer4 14.036458 14.505952 14.358974 14.267500
products2 cities1 customer1 13.666667 13.755952 14.166667 14.179167
customer2 13.484375 14.375000 14.025641 14.277500
customer3 13.645833 14.119048 14.307692 14.296667
customer4 13.244792 13.657738 14.112179 14.137500
cities2 customer1 13.682292 14.541667 14.434295 14.366667
customer2 13.979167 14.116071 14.304487 14.592500
customer3 16.489583 15.744048 14.828526 14.671667
customer4 14.625000 14.505952 14.099359 14.409167
products3 cities1 customer1 13.562500 14.125000 14.149038 14.221667
customer2 14.713542 14.717262 14.520833 14.060833
customer3 14.562500 14.696429 14.560897 14.596667
customer4 14.536458 14.651786 14.314103 14.075833
cities2 customer1 14.130208 14.559524 14.089744 14.365000
customer2 14.276042 14.279762 14.352564 14.273333
customer3 13.901042 14.657738 14.641026 14.254167
customer4 14.411458 14.303571 14.258013 14.265000
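The table above holds one trailing average per combination, measured back from the latest date in the data. If the goal is instead a value on every row (new columns with the average of the months before that row), a per-group rolling window is one possible sketch. It assumes exactly one row per combination per month with no gaps, which is not how the sample data above is built; the `monthly` frame, its reduced set of ids and the random seed are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row per (product, city, customer, month), no missing months.
dti = pd.date_range("2017-01-01", "2020-12-01", freq="MS")
idx = pd.MultiIndex.from_product(
    [["products1", "products2"], ["cities1"], ["customer1", "customer2"], dti],
    names=["product_id", "city_id", "customer_id", "datetime"],
)
rng = np.random.default_rng(0)
monthly = pd.DataFrame({"sales": rng.integers(0, 30, size=len(idx))}, index=idx).reset_index()

cols = ["product_id", "city_id", "customer_id"]
# Rolling windows count rows, so each group has to be in chronological order.
monthly = monthly.sort_values(cols + ["datetime"]).reset_index(drop=True)

for n in [3, 6, 12, 24]:
    # shift(1) drops the current month, so the window covers the n months before it;
    # rows with fewer than n earlier months stay NaN.
    monthly["{}M".format(n)] = monthly.groupby(cols)["sales"].transform(
        lambda s: s.shift(1).rolling(n).mean()
    )
```

Because the window is defined in rows rather than calendar time, missing months would silently stretch it; reindexing each group to a full monthly range first would be needed in that case.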
I get an error because I also have a week variable, so my DataFrame is a bit different:
month year product_id city_id customer_id sales
1 2017 products1 cities1 customer1 1
1 2017 products1 cities1 customer1 2
1 2017 products1 cities1 customer1 3
1 2017 products1 cities2 customer2 4
1 2017 products1 cities3 customer3 5
The error I get is TypeError: Index(...) must be called with a collection of some kind, '3M' was passed. Any feedback would be much appreciated.
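A likely cause, assuming the extra column is numeric: with a second value column, groupby(cols).mean() returns a DataFrame with two columns, squeeze() has nothing to collapse, and rename("3M") is then called on a DataFrame, where a bare string is not a valid mapper. Selecting the sales column before taking the mean keeps the result a Series, so rename only sets its name. The sketch below uses a made-up frame; the column name `week` and the smaller date range are assumptions, not taken from the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with an extra numeric "week" column next to "sales".
dti = pd.date_range("2020-01-01", "2020-12-01", freq="MS")
cols = ["product_id", "city_id", "customer_id"]
idx = pd.MultiIndex.from_product(
    [dti, ["products1", "products2"], ["cities1"], ["customer1", "customer2"]],
    names=["datetime"] + cols,
)
rng = np.random.default_rng(0)
df2 = pd.DataFrame(
    {"week": rng.integers(1, 5, size=len(idx)), "sales": rng.integers(0, 30, size=len(idx))},
    index=idx,
)
# Keep only the datetime level as the index, as in the answer's df2.
df2 = df2.reset_index(level=cols).sort_index()

out = []
for month in [3, 6]:
    start = df2.index.max() - pd.DateOffset(months=month)
    # Pick "sales" explicitly: the result is a Series no matter how many other columns exist.
    means = df2.loc[start:].groupby(cols)["sales"].mean()
    out.append(means.rename("{}M".format(month)))

result = pd.concat(out, axis="columns")
```

If the week should also define the groups, adding it to `cols` would be the natural extension, though that changes what each average means.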