My input data df is:
Action          Difficulty  strings  characters  POS  NEG  NEU
Field                0.635        7          59    0    0    7
Field or Catch       0.768       28         193    0    0   28
Field or Ball       -0.591      108         713    6    0  101
Ball                -0.717       61         382    3    0   57
Catch               -0.145       89         521    1    0   88
Field                0.28       208        1214    2    3  178
Field and run        1.237       18         138    1    0   17
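I am interested in the group-wise correlation of Difficulty with the remaining variables strings, characters, POS, NEG, NEU. The grouping variable is Action. If I am only interested in the Field groups, I can use filter(str_detect(Action, 'Field')).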
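I could compute the correlation between Difficulty and each of the remaining variables one at a time. But is there a faster way to do this for several variables in a single command? My partial solution is: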
df %>%
filter(str_detect(Action, 'Field')) %>%
na.omit %>% # Original data had multiple NA
group_by(Action) %>%
summarise_all(funs(cor))
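But this results in an error.
Some related SO posts I have looked at: "Find correlation coefficient of two columns in a dataframe by group" is very relevant for generating a correlation matrix, but it does not solve my problem. "Check the correlation of two columns in a dataframe (R)" helps with computing different types of correlation and introduces a different way of ignoring NAs.
Any help or guidance would be greatly appreciated!
For reference, here is the sample dput():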
structure(list(
Action = c("Field", "Field or Catch", "Field or Ball", "Ball", "Catch", "Field", "Field and run"), Difficulty = c(0.635, 0.768, -0.591, -0.717, -0.145, 0.28, 1.237),
strings = c(7L, 28L, 108L, 61L, 89L, 208L, 18L),
characters = c(59L, 193L, 713L, 382L, 521L, 1214L, 138L),
POS = c(0L, 0L, 6L, 3L, 1L, 2L, 1L),
NEG = c(0L, 0L, 0L, 0L, 0L, 3L, 0L),
NEU = c(7L, 28L, 101L, 57L, 88L, 178L, 17L)),
class = "data.frame", row.names = c(NA,
-7L))
You can try:
library(dplyr)
library(stringr)
df %>%
filter(str_detect(Action, 'Field')) %>%
na.omit %>% # Original data had multiple NA
group_by(Action) %>%
summarise(across(-Difficulty, ~cor(.x, Difficulty)))
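As for why the attempt in the question errors: summarise_all(funs(cor)) ends up calling cor() on each column by itself, and cor() needs either two vectors or a matrix-like x. Here is a minimal base R sketch of that difference, using the strings and Difficulty values of the rows whose Action contains 'Field'. The names x, y, single_arg and two_arg are only illustrative.

```r
# Values taken from the sample rows whose Action contains "Field"
x <- c(7, 28, 108, 208, 18)                 # strings
y <- c(0.635, 0.768, -0.591, 0.28, 1.237)   # Difficulty

# cor() with a single vector stops with an error, because it needs
# both x and y (or a matrix-like x); capture the message instead of failing
single_arg <- tryCatch(cor(x), error = function(e) conditionMessage(e))
print(single_arg)

# Passing the second vector explicitly works, which is what the
# lambda ~cor(.x, Difficulty) does for every selected column
two_arg <- cor(x, y)
print(two_arg)
```

Note also that in the grouped version any Action group with a single row gives NA, since a correlation needs at least two observations.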
If you don't want to group_by Action:
df %>%
filter(str_detect(Action, 'Field')) %>%
na.omit %>%
summarise(across(-c(Difficulty, Action), ~cor(.x, Difficulty)))
# strings characters POS NEG NEU
#1 -0.557039 -0.5983826 -0.8733465 -0.1520684 -0.5899733
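If you would rather not depend on dplyr for the ungrouped case, a rough base R sketch is below. It relies on cor() accepting a data frame of columns as x and a single vector as y. The names field, vars and res are made up for illustration, and use = "pairwise.complete.obs" is an assumption about how you want NAs handled: it drops NAs per pair of columns, whereas na.omit drops the whole row.

```r
# Sample data from the question's dput()
df <- data.frame(
  Action = c("Field", "Field or Catch", "Field or Ball", "Ball",
             "Catch", "Field", "Field and run"),
  Difficulty = c(0.635, 0.768, -0.591, -0.717, -0.145, 0.28, 1.237),
  strings = c(7L, 28L, 108L, 61L, 89L, 208L, 18L),
  characters = c(59L, 193L, 713L, 382L, 521L, 1214L, 138L),
  POS = c(0L, 0L, 6L, 3L, 1L, 2L, 1L),
  NEG = c(0L, 0L, 0L, 0L, 0L, 3L, 0L),
  NEU = c(7L, 28L, 101L, 57L, 88L, 178L, 17L)
)

# Keep only the rows whose Action contains "Field"
field <- df[grepl("Field", df$Action), ]

# Columns to correlate against Difficulty
vars <- c("strings", "characters", "POS", "NEG", "NEU")

# One call: a column matrix with one row per variable
res <- cor(field[, vars], field$Difficulty, use = "pairwise.complete.obs")
print(res)
```

The result is a one-column matrix with the variable names as row names, so res["strings", 1] picks out a single coefficient.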