Presenting p-values from pairwise chi-square tests between columns as a cross table in Python



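My DataFrame has 10 features. I ran a chi-square test and generated a p-value for every pair of columns in the DataFrame. I would like to present these p-values as a cross grid of the features.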

Example: A, B, and C are my features, and the p-values are (A, B) = 0.0001, (A, C) = 0.5, and (B, C) = 0.0.

So I would like to see the result as:

    A       B       C
A   1       0.0001  0.5
B   0.0001  1       0.0
C   0.5     0.0     1

Please let me know if any other details are needed.

Suppose the feature list is features = ['A','B','C',...] and the p-values are stored in a dictionary keyed by feature pair:
p_values = {('A','B'):0.0001,('A','C'):0.5,...}
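If that dictionary still has to be computed, the following is a minimal sketch of one possible way to do it. It assumes the columns are categorical and that SciPy is installed; the toy DataFrame named data is made up for illustration and is not from the question. Each pair of columns is cross-tabulated with pd.crosstab and passed to scipy.stats.chi2_contingency.

```python
from itertools import combinations

import pandas as pd
from scipy.stats import chi2_contingency

# Toy categorical data, invented purely for illustration.
data = pd.DataFrame({
    "A": ["x", "x", "y", "y", "x", "y", "x", "y"],
    "B": ["u", "u", "v", "v", "u", "v", "u", "v"],
    "C": ["p", "q", "p", "q", "p", "q", "q", "p"],
})
features = list(data.columns)

p_values = {}
for a, b in combinations(features, 2):
    # Contingency table of observed counts for this pair of columns.
    observed = pd.crosstab(data[a], data[b])
    # chi2_contingency returns the statistic, p-value, dof and expected counts.
    chi2, p, dof, expected = chi2_contingency(observed)
    p_values[(a, b)] = p

print(p_values)
```

Each unordered pair appears once as a key, which is the form the table-building code expects.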

import pandas as pd
p_values = {('A','B'):0.0001,('A','C'):0.5}
features = ['A','B','C']
df = pd.DataFrame(columns=features)
for row in features:
    rowdf = [] # prepare a row for df
    for col in features:
        if row == col:
            rowdf.append(1) # (A,A) taken as 1
            continue
        try:
            rowdf.append(p_values[(row,col)]) # add the value from dictionary
        except KeyError:
            try:
                rowdf.append(p_values[(col, row)]) # look for pair like (B,A) if (A,B) not found
            except KeyError: # still not found, append None
                rowdf.append(None)
    df.loc[len(df)] = rowdf # write row in df

df.index = features # to make row names as A,B,C ...
print(df)
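A more compact variant of the same idea is sketched below. It is an alternative, not the method above: it starts from an all-NaN square DataFrame labelled by feature on both axes and writes each p-value into both mirrored cells. The name table is illustrative, and missing pairs show up as NaN here rather than None.

```python
import numpy as np
import pandas as pd

features = ['A', 'B', 'C']
p_values = {('A', 'B'): 0.0001, ('A', 'C'): 0.5}  # (B, C) is deliberately missing

# Square table of NaN, with features as both row and column labels.
table = pd.DataFrame(np.nan, index=features, columns=features)

# Write each p-value into (a, b) and (b, a) so the table is symmetric.
for (a, b), p in p_values.items():
    table.loc[a, b] = p
    table.loc[b, a] = p

# A feature paired with itself is taken as 1, as in the loop version.
for f in features:
    table.loc[f, f] = 1.0

print(table)
```

Because the dictionary is traversed only once, the order of the features inside each key does not matter, and no nested try/except is needed.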
