Python pandas and spearmanr: running faster



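I am trying to take two DataFrames and run a row-wise Spearman correlation between them. My code runs, but it takes a very long time, and I am hoping there is a better/faster way to do it. The tables look like this: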

DF1 (50x14492):

TYPE GENRE1 GENRE2
Name1 .0945 .0845
Name2 .9074 Nan
Name3 1     0

and DF2 (50x14492):

TYPE GENRE1 GENRE2
Name1 .9045 .895
Name2 .074  1
Name3 .5    .045

I want the correlation between each row of df1 and the row of df2 with the same TYPE, collected in a dictionary like {Name1: corr, Name2: corr2}. This is the code I have:

import matplotlib.pyplot as plt
import scipy.stats

correlation_dict = {}
for df1_row in df1.itertuples():
    for df2_row in df2.itertuples():
        if df2_row.TYPE.upper() == df1_row.TYPE.upper():
            # Drop the index and TYPE fields, keeping only the numeric values
            df1_val_array = list(df1_row)[2:]
            df2_val_array = list(df2_row)[2:]
            df1_df2_gene_corr, df1_df2_gene_p_val = scipy.stats.spearmanr(
                df1_val_array, df2_val_array, axis=0, nan_policy='omit')
            correlation_dict[df2_row.TYPE.upper()] = df1_df2_gene_corr

# plot correlations
plt.plot(list(correlation_dict.keys()), list(correlation_dict.values()))
plt.show()

Given the size of the data, is there a way to do this that doesn't take forever?

If I understand correctly, try the following:

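import pandas as pd
from scipy.stats import spearmanr as spr

# Index both frames by TYPE and align them so matching rows line up
df1 = df1.set_index('TYPE')
df2 = df2.set_index('TYPE')
df1, df2 = df1.align(df2)

def scorr(t):
    return spr(*t, axis=0, nan_policy='omit').correlation

# One correlation per row pair, labelled by TYPE
pd.Series([*map(scorr, zip(df1.values, df2.values))], df1.index)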
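
If one spearmanr call per row is still too slow, a possible alternative is to skip the Python-level loop entirely. Spearman's coefficient is the Pearson correlation of the ranks, so it can be computed for all rows at once with pandas. What follows is only a sketch under some assumptions: TYPE has already been set as the index of both frames, every remaining column is numeric, and a column is dropped from a row's calculation when either frame has NaN there (intended to mirror nan_policy='omit'). The function name rowwise_spearman and the small example frames are made up for illustration.

```python
import numpy as np
import pandas as pd


def rowwise_spearman(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.Series:
    """Spearman correlation between rows of df1 and df2 sharing an index label."""
    # Keep only the rows and columns present in both frames, in the same order.
    a, b = df1.align(df2, join="inner")

    # Pairwise deletion: blank out a cell in both frames if either one is NaN.
    valid = a.notna() & b.notna()
    a, b = a.where(valid), b.where(valid)

    # Rank within each row (ties get the average rank, NaN stays NaN).
    ra, rb = a.rank(axis=1), b.rank(axis=1)

    # Pearson correlation of the ranks, computed row by row.
    da = ra.sub(ra.mean(axis=1), axis=0)
    db = rb.sub(rb.mean(axis=1), axis=0)
    num = (da * db).sum(axis=1)
    den = np.sqrt((da ** 2).sum(axis=1) * (db ** 2).sum(axis=1))
    return num / den


# Hypothetical example data, already indexed by TYPE.
names = pd.Index(["Name1", "Name2", "Name3"], name="TYPE")
df1 = pd.DataFrame(
    {"G1": [1.0, 0.1, 0.5], "G2": [2.0, np.nan, 0.4],
     "G3": [3.0, 0.3, 0.9], "G4": [4.0, 0.2, 0.1]},
    index=names,
)
df2 = pd.DataFrame(
    {"G1": [10.0, 3.0, 1.0], "G2": [20.0, 5.0, 2.0],
     "G3": [30.0, 1.0, 3.0], "G4": [40.0, 2.0, 4.0]},
    index=names,
)

result = rowwise_spearman(df1, df2)
print(result)
```

The result is a Series indexed by TYPE, so result.to_dict() gives the {Name1: corr, ...} dictionary from the question. Rows that are constant after ranking come out as NaN. It is worth checking a few rows against scipy.stats.spearmanr on your own data before relying on it.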
