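I want to normalize my data and compute the Pearson correlation. Without normalization it works. With normalization I get this error message: AttributeError: 'numpy.ndarray' object has no attribute 'corr'. What can I do to fix this?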
import numpy as np
import pandas as pd
filename_train = 'C:Usersxxx.xxxworkspaceDataset!train_data.csv'
names = ['a', 'b', 'c', 'd', 'e', ...]
df_train = pd.read_csv(filename_train, names=names)
from sklearn.preprocessing import Normalizer
normalizeddf_train = Normalizer().fit_transform(df_train)
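# Pearson correlation
pd.set_option('display.width', 100)
pd.set_option('display.precision', 2)
# The next line raises AttributeError: fit_transform returned a numpy.ndarray, not a DataFrame
print(normalizeddf_train.corr(method='pearson'))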
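You need the DataFrame constructor, because the output of fit_transform is a numpy array, and corr is a DataFrame method, so the array has to be wrapped in a DataFrame first: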
df_train = pd.DataFrame({'A':[1,2,3],
'B':[4,5,6],
'C':[7,8,9],
'D':[1,3,5],
'E':[5,3,6],
'F':[7,4,3]})
print (df_train)
   A  B  C  D  E  F
0  1  4  7  1  5  7
1  2  5  8  3  3  4
2  3  6  9  5  6  3
from sklearn.preprocessing import Normalizer
normalizeddf_train = Normalizer().fit_transform(df_train)
print (normalizeddf_train)
[[ 0.08421519 0.33686077 0.58950634 0.08421519 0.42107596 0.58950634]
[ 0.1774713 0.44367825 0.70988521 0.26620695 0.26620695 0.3549426 ]
[ 0.21428571 0.42857143 0.64285714 0.35714286 0.42857143 0.21428571]]
print(pd.DataFrame(normalizeddf_train).corr(method='pearson'))
          0         1         2         3         4         5
0  1.000000  0.917454  0.646946  0.998477 -0.203152 -0.994805
1  0.917454  1.000000  0.896913  0.894111 -0.575930 -0.872187
2  0.646946  0.896913  1.000000  0.603899 -0.878063 -0.565959
3  0.998477  0.894111  0.603899  1.000000 -0.148832 -0.998906
4 -0.203152 -0.575930 -0.878063 -0.148832  1.000000  0.102420
5 -0.994805 -0.872187 -0.565959 -0.998906  0.102420  1.000000
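The correlation matrix above is labelled 0 to 5 because the numpy array carries no column names. As a minimal sketch, assuming you want to keep the original labels, you can pass the columns and index of the source DataFrame to the constructor. The names normalized, normalized_df and corr are only illustrative. Note also that Normalizer rescales each row (sample) to unit norm rather than each column, which may or may not be the normalization you intend.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import Normalizer

# Same sample data as in the answer above
df_train = pd.DataFrame({'A': [1, 2, 3],
                         'B': [4, 5, 6],
                         'C': [7, 8, 9],
                         'D': [1, 3, 5],
                         'E': [5, 3, 6],
                         'F': [7, 4, 3]})

# fit_transform returns a plain numpy.ndarray, so the labels are lost here
normalized = Normalizer().fit_transform(df_train)

# Rebuild a DataFrame and reuse the original column names and index
normalized_df = pd.DataFrame(normalized,
                             columns=df_train.columns,
                             index=df_train.index)

# corr is now available, and the result is labelled A..F instead of 0..5
corr = normalized_df.corr(method='pearson')
print(corr.round(2))
```

The values are the same as in the matrix above; only the row and column labels differ.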