r - Correlation of one variable with multiple variables



My input data df is:

Action          Difficulty  strings characters  POS NEG NEU
Field           0.635       7       59          0   0   7
Field or Catch  0.768       28      193         0   0   28
Field or Ball   -0.591      108     713         6   0   101
Ball            -0.717      61      382         3   0   57
Catch           -0.145      89      521         1   0   88
Field           0.28        208     1214        2   3   178
Field and run   1.237       18      138         1   0   17

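I am interested in the group-wise correlation of Difficulty with each of the remaining variables: strings, characters, POS, NEG and NEU. The grouping variable is Action. If I am only interested in the Field groups, I can run filter(str_detect(Action, 'Field')).

I could compute the correlation between Difficulty and each remaining variable one at a time. But is there a quicker way to do this for multiple variables in a single command? My partial solution is: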

df %>%
filter(str_detect(Action, 'Field')) %>%
na.omit %>%   # Original data had multiple NA
group_by(Action) %>%
summarise_all(funs(cor))

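But this results in an error.

Some related SO posts I have looked at: one on finding the correlation coefficient of two columns in a data frame by group, which is closely related to generating a correlation matrix but does not solve my problem; and one on checking the correlation of two columns in a data frame (R), which helps with computing different types of correlation and introduces a different way of ignoring NAs.

Any help or guidance would be greatly appreciated!

For reference, here is the sample dput():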

structure(list(
Action = c("Field", "Field or Catch", "Field or Ball", "Ball", "Catch", "Field", "Field and run"), Difficulty = c(0.635, 0.768, -0.591, -0.717, -0.145, 0.28, 1.237), 
strings = c(7L, 28L, 108L, 61L, 89L, 208L, 18L), 
characters = c(59L, 193L, 713L, 382L, 521L, 1214L, 138L), 
POS = c(0L, 0L, 6L, 3L, 1L, 2L, 1L), 
NEG = c(0L, 0L, 0L, 0L, 0L, 3L, 0L), 
NEU = c(7L, 28L, 101L, 57L, 88L, 178L, 17L)), 
class = "data.frame", row.names = c(NA, 
-7L))

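You can try -

library(dplyr)
library(stringr)
df %>%
  # keep only the rows whose Action contains 'Field'
  filter(str_detect(Action, 'Field')) %>%
  na.omit() %>%   # Original data had multiple NA
  group_by(Action) %>%
  # correlate every non-grouping column except Difficulty with Difficulty
  summarise(across(-Difficulty, ~cor(.x, Difficulty)))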
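The attempt in the question fails because `cor()` is called with a single vector per column, and `cor()` needs either a second vector `y` or a matrix-like `x`. The lambda inside `across()` supplies `Difficulty` as that second argument. As a minimal sketch on the sample data from the question (the name `by_action` and the extra `n` column are illustrative additions, not part of the answer above), it can help to also report the group size, since a group with a single row has no defined correlation and a group with two rows can only give a correlation of exactly 1 or -1:

```r
library(dplyr)
library(stringr)

# Sample data rebuilt from the dput() in the question
df <- data.frame(
  Action = c("Field", "Field or Catch", "Field or Ball", "Ball",
             "Catch", "Field", "Field and run"),
  Difficulty = c(0.635, 0.768, -0.591, -0.717, -0.145, 0.28, 1.237),
  strings = c(7L, 28L, 108L, 61L, 89L, 208L, 18L),
  characters = c(59L, 193L, 713L, 382L, 521L, 1214L, 138L),
  POS = c(0L, 0L, 6L, 3L, 1L, 2L, 1L),
  NEG = c(0L, 0L, 0L, 0L, 0L, 3L, 0L),
  NEU = c(7L, 28L, 101L, 57L, 88L, 178L, 17L)
)

by_action <- df %>%
  filter(str_detect(Action, "Field")) %>%
  na.omit() %>%
  group_by(Action) %>%
  summarise(
    # correlation of each remaining column with Difficulty, per group
    across(-Difficulty, ~ cor(.x, Difficulty)),
    # group size; placed after across() so across() does not pick it up
    n = n()
  )

by_action
```

With this sample, only the "Field" group has more than one row, so the other groups come back as NA. On the full data set, groups with only a handful of rows deserve the same caution.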

If you don't want to group_by Action -

df %>%
  filter(str_detect(Action, 'Field')) %>%
  na.omit() %>%
  # Action is no longer a grouping column, so exclude it explicitly
  summarise(across(-c(Difficulty, Action), ~cor(.x, Difficulty)))
#    strings characters        POS        NEG        NEU
#1 -0.557039 -0.5983826 -0.8733465 -0.1520684 -0.5899733
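For the ungrouped case, the same numbers can also be obtained without dplyr, because `cor()` accepts a matrix-like `x` and a vector `y` and correlates each column of `x` with `y`. The following is a sketch in base R under that assumption; the names `sub`, `vars` and `r` are illustrative, and `grepl()` stands in for `str_detect()`:

```r
# Sample data rebuilt from the dput() in the question
df <- data.frame(
  Action = c("Field", "Field or Catch", "Field or Ball", "Ball",
             "Catch", "Field", "Field and run"),
  Difficulty = c(0.635, 0.768, -0.591, -0.717, -0.145, 0.28, 1.237),
  strings = c(7L, 28L, 108L, 61L, 89L, 208L, 18L),
  characters = c(59L, 193L, 713L, 382L, 521L, 1214L, 138L),
  POS = c(0L, 0L, 6L, 3L, 1L, 2L, 1L),
  NEG = c(0L, 0L, 0L, 0L, 0L, 3L, 0L),
  NEU = c(7L, 28L, 101L, 57L, 88L, 178L, 17L)
)

# Keep rows whose Action contains "Field", then drop rows with NA
sub <- na.omit(df[grepl("Field", df$Action), ])

# Columns to correlate with Difficulty
vars <- c("strings", "characters", "POS", "NEG", "NEU")

# One call: each column of sub[vars] against sub$Difficulty
r <- cor(sub[vars], sub$Difficulty)
r
```

The result is a one-column matrix with one row per variable, which may be easier to sort or filter than the one-row data frame returned by `summarise()`.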
