这是我数据集的一部分。
Country Australia Belgium
gdp wage gdp wage
2006-01-01 00:00:00 745,522,000,000.00 23,826.64 409,813,000,000.00 20,228.74
2007-01-01 00:00:00 851,963,000,000.00 24,616.84 471,821,000,000.00 20,486.16
2008-01-01 00:00:00 1,052,580,000,000.00 24,185.70 518,626,000,000.00 20,588.93
2009-01-01 00:00:00 926,448,000,000.00 24,496.84 484,553,000,000.00 21,284.21
2010-01-01 00:00:00 1,144,260,000,000.00 24,373.76 483,548,000,000.00 20,967.05
我想找到两国"GDP"列和"工资"列的相关性。
我试过使用,
df.corr()
但输出结果是空的。
预期输出可以是这样的:
Country Correlation
Australia 1.0
Belgium 0.98
(相关性的值不准确。显示此内容仅用于说明。
我可以运行哪些代码来实现此结果?
编辑:执行行
打印(df.columns)
导致这样的输出
MultiIndex(levels=[['Australia', 'Belgium', 'Brazil', 'Canada', 'Chile', 'Colombia', 'Costa Rica', 'Czech Republic', 'Estonia', 'France', 'Germany', 'Greece', 'Hungary', 'Ireland', 'Israel', 'Japan', 'Korea', 'Latvia', 'Lithuania', 'Luxembourg', 'Mexico', 'Netherlands', 'New Zealand', 'Poland', 'Portugal', 'Russian Federation', 'Slovak Republic', 'Slovenia', 'Spain', 'Turkey', 'United Kingdom', 'United States'], ['gdp', 'wage']],
labels=[[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16, 17, 17, 18, 18, 19, 19, 20, 20, 21, 21, 22, 22, 23, 23, 24, 24, 25, 25, 26, 26, 27, 27, 28, 28, 29, 29, 30, 30, 31, 31], [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]],
names=['Country', None])
首先将
列转换为带replace
的数字并转换为float
,然后通过DataFrame.xs
DataFrame.corrwith
选择级别以进行关联:
#if create DataFrame from file
#df = pd.read_csv(file, header=[0,1], thousands=',')
df = df.replace(',','', regex=True).astype(float)
s = df.xs('gdp', axis=1, level=1).corrwith(df.xs('wage', axis=1, level=1))
print (s)
Australia 0.325915
Belgium 0.521564
dtype: float64
最后为数据帧添加reset_index
:
df1 = s.reset_index()
df1.columns = ['Country','Correlation']
print (df1)
Country Correlation
0 Australia 0.325915
1 Belgium 0.521564
详情:
print (df.xs('gdp', axis=1, level=1))
Australia Belgium
2006-01-01 00:00:00 7.455220e+11 4.098130e+11
2007-01-01 00:00:00 8.519630e+11 4.718210e+11
2008-01-01 00:00:00 1.052580e+12 5.186260e+11
2009-01-01 00:00:00 9.264480e+11 4.845530e+11
2010-01-01 00:00:00 1.144260e+12 4.835480e+11