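I have a correlation matrix describing how certain factors affect rainfall in a particular region. I want to write a conditional for loop that extracts the correlations greater than 0.6 and less than -0.7 and prints a message such as: The variables MaxTemp and Temp9am are strongly correlated (correlation coefficient = 0.89). I wrote this code: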
c1 = corelation.abs().unstack()
my_try=pd.DataFrame(data=c1)
for i in my_try:
    value=my_try[i>0.66]
    print('{values} is strongly positively correlated(correlation coefficient = {amount}'.format(amount=i , values=value))
But it returns this error:
TypeError                                 Traceback (most recent call last)
<ipython-input-70-24d6b566ed34> in <module>
      1 for i in my_try:
----> 2     value=my_try[i>0.66]
      3     print('{values} is strongly positively correlated(correlation coefficient = {amount}'.format(amount=i , values=value))
TypeError: '>' not supported between instances of 'str' and 'float'
How can I fix this? I would be grateful if someone could help me with this code.
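Try this custom function; it may work. It extracts the pairs of features whose correlation has an absolute value greater than 0.3 (adjustable), i.e. > 0.3 or < -0.3.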
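import numpy as np

def good_correlation(df1, threshold=0.3):
    # Correlation matrix (on pandas >= 2.0, pass numeric_only=True if df1 has non-numeric columns)
    cm = df1.corr()
    # Strict lower triangle (k=-1): skips the diagonal and the mirrored duplicate of each pair,
    # without writing into cm.values, which can be read-only in recent pandas versions
    lower = np.tril(cm.to_numpy(), k=-1)
    # Build (feature1, feature2, value) tuples for every pair with |value| above the threshold
    corr = [(cm.index[x], cm.columns[y], cm.iloc[x, y]) for x, y in zip(*np.where(np.abs(lower) > threshold))]
    for feature1, feature2, value in corr:
        print(f'{feature1} and {feature2} are strongly correlated (threshold = {threshold}) (value = {value:.2f})')
    return cm, corr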
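It returns cm and corr in case you need them for a next step. If you need two different thresholds, you can add a parameter to the def and change the condition in the list comprehension. If you need more information about this function, just ask.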
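For the two thresholds mentioned in the question (above 0.6 and below -0.7), here is a minimal sketch of that variant. The names strong_correlations, describe, pos_threshold and neg_threshold are made up for illustration, and the sample DataFrame is random data that only stands in for the real weather table.

```python
import numpy as np
import pandas as pd


def strong_correlations(df, pos_threshold=0.6, neg_threshold=-0.7):
    """Return (feature1, feature2, r) for pairs with r > pos_threshold or r < neg_threshold.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    cm = df.corr()
    values = cm.to_numpy()
    # Strict lower triangle: skips the diagonal and the mirrored duplicate of each pair.
    rows, cols = np.tril_indices_from(values, k=-1)
    pairs = []
    for x, y in zip(rows, cols):
        r = values[x, y]
        # NaN correlations fail both comparisons and are skipped.
        if r > pos_threshold or r < neg_threshold:
            pairs.append((cm.index[x], cm.columns[y], float(r)))
    return pairs


def describe(pairs):
    """Format each (feature1, feature2, r) tuple as a readable message."""
    messages = []
    for feature1, feature2, r in pairs:
        direction = "positively" if r > 0 else "negatively"
        messages.append(
            f"The variables {feature1} and {feature2} are strongly {direction} "
            f"correlated (correlation coefficient = {r:.2f})"
        )
    return messages


# Made-up sample data, only to show the call; replace it with your own DataFrame.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
demo = pd.DataFrame({
    "MaxTemp": base,
    "Temp9am": base + rng.normal(scale=0.3, size=200),
    "Humidity": -base + rng.normal(scale=0.3, size=200),
    "Noise": rng.normal(size=200),
})
for message in describe(strong_correlations(demo)):
    print(message)
```

Keeping the two thresholds as separate parameters lets the positive and negative cut-offs differ, which a single absolute-value threshold cannot do.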
Have a nice day/afternoon/evening :)