我有一个像这样的数据帧,它有不同的日期、频道、产品和评级。
Date Channel Product Rating
2015-01 Ch1 Pr1 4.0
2015-04 Ch1 Pr1 5.0
2016-02 Ch1 Pr1 4.5
2016-12 Ch1 Pr1 3.4
...
2015-02 Ch1 Pr2 3.0
2015-04 Ch1 Pr2 4.0
2016-07 Ch1 Pr2 3.5
2016-11 Ch1 Pr2 4.3
...
2015-01 Ch2 Pr1 4.0
2015-04 Ch2 Pr1 5.0
2016-02 Ch2 Pr1 4.5
2016-12 Ch2 Pr1 3.4
...
2015-02 Ch2 Pr2 3.0
2015-04 Ch2 Pr2 4.0
2016-07 Ch2 Pr2 3.5
2016-11 Ch2 Pr2 4.3
...
2015-02 Chn Prm 3.0
2015-04 Chn Prm 4.0
2016-07 Chn Prm 3.5
2016-11 Chn Prm 4.3
我必须使用Rating
列上的.resample('Y').mean()
,按产品和渠道查找Rating
上的相关矩阵
所以在那之后,我希望输出如下:
Pr1:
Ch1 Ch2 ... Chn
Ch1 1.0 0.8 ... -0.3
Ch2 0.8 1.0 ... 0.6
...
Chn -0.3 0.6 ... 1.0
Pr2:
Ch1 Ch2 ... Chn
Ch1 1.0 0.4 ... 0.5
Ch2 0.4 1.0 ... -0.1
...
Chn 0.5 -0.1 ... 1.0
...
Prm:
Ch1 Ch2 ... Chn
Ch1 1.0 0.4 ... 0.5
Ch2 0.4 1.0 ... -0.1
...
Chn 0.5 -0.1 ... 1.0
假设您有这样的数据集
import io
import pandas as pd
tsv = io.StringIO("""Date Channel Product Rating
2015-01 Ch1 Pr1 4.0
2015-04 Ch1 Pr1 5.0
2016-02 Ch1 Pr1 4.5
2016-12 Ch1 Pr1 3.4
2015-02 Ch1 Pr2 3.0
2015-04 Ch1 Pr2 4.0
2016-07 Ch1 Pr2 3.5
2016-11 Ch1 Pr2 4.3
2015-01 Ch2 Pr1 4.0
2015-04 Ch2 Pr1 5.0
2016-02 Ch2 Pr1 4.5
2016-12 Ch2 Pr1 3.4
2015-02 Ch2 Pr2 3.0
2015-04 Ch2 Pr2 4.0
2016-07 Ch2 Pr2 3.5
2016-11 Ch2 Pr2 4.3
2015-02 Chn Prm 3.0
2015-04 Chn Prm 4.0
2016-07 Chn Prm 3.5
2016-11 Chn Prm 4.3""")
df = pd.read_csv(tsv, sep="s+", parse_dates=["Date"])
- 计算每个渠道产品对的年度平均值
df = df.set_index("Date").groupby(["Channel", "Product"]).resample('Y').mean(numeric_only=True)["Rating"]
这会给你类似的东西
Channel Product Date
Ch1 Pr1 2015-12-31 4.50
2016-12-31 3.95
Pr2 2015-12-31 3.50
2016-12-31 3.90
Ch2 Pr1 2015-12-31 4.50
2016-12-31 3.95
Pr2 2015-12-31 3.50
2016-12-31 3.90
Chn Prm 2015-12-31 3.50
2016-12-31 3.90
- 创建通道值列
df = df.unstack(level=0)
df
结果将是
Channel Ch1 Ch2 Chn
Product Pr1 Pr2 Prm Pr1 Pr2 Prm Pr1 Pr2 Prm
Date
2015-12-31 4.50 3.5 NaN 4.50 3.5 NaN NaN NaN 3.5
2016-12-31 3.95 3.9 NaN 3.95 3.9 NaN NaN NaN 3.9
- 最后,计算每个产品的渠道之间的相关性
df.groupby(level=0).corr()
结果:
Channel Ch1 Ch2 Chn
Product Channel
Pr1 Ch1 1.0 1.0 NaN
Ch2 1.0 1.0 NaN
Chn NaN NaN NaN
Pr2 Ch1 1.0 1.0 NaN
Ch2 1.0 1.0 NaN
Chn NaN NaN NaN
Prm Ch1 NaN NaN NaN
Ch2 NaN NaN NaN
Chn NaN NaN 1.0