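I often receive data from REDCap surveys in which respondents are allowed to "check" more than one response to a survey question. Each potential response is stored in its own column. I would like to summarize how often each response option (column) was checked. For example: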
library(tidyverse)
set.seed(1234)
responses<-c("Checked", "Unchecked")
numobs<-10
my_example<-data.frame(id=1:10,
Response_Option_A=sample(responses, numobs, replace=TRUE),
Response_Option_B=sample(responses, numobs, replace=TRUE),
Response_Option_C=sample(responses, numobs, replace=TRUE),
Response_Option_D=sample(responses, numobs, replace=TRUE),
stringsAsFactors = FALSE)
my_example
#>    id Response_Option_A Response_Option_B Response_Option_C Response_Option_D
#> 1   1         Unchecked         Unchecked         Unchecked           Checked
#> 2   2         Unchecked         Unchecked         Unchecked         Unchecked
#> 3   3         Unchecked         Unchecked         Unchecked           Checked
#> 4   4         Unchecked           Checked         Unchecked           Checked
#> 5   5           Checked         Unchecked         Unchecked           Checked
#> 6   6         Unchecked         Unchecked         Unchecked         Unchecked
#> 7   7           Checked         Unchecked           Checked           Checked
#> 8   8           Checked           Checked         Unchecked         Unchecked
#> 9   9           Checked         Unchecked         Unchecked         Unchecked
#> 10 10         Unchecked         Unchecked         Unchecked           Checked
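My first inclination was to try the following, but it returns the total number of checked responses across all columns rather than the count within each column.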
my_example %>%
select(starts_with("Response_Option_")) %>%
summarise(checked=sum(.=="Checked"))
#> checked
#> 1 13
Created on 2020-08-10 by the reprex package (v0.3.0)
Thank you for any help summarizing these responses efficiently.
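Here is a tidyverse approach that shows the response totals by column rather than by row. From the wording of your question, I think that is what you want. It also uses the starts_with() function, which appears in your question's tags.
We can use pivot_longer() to reshape the response columns from wide to long, then use group_by() to define the grouping variables. That turns the table into a grouped table, so summarise() creates a new data frame with one row for each combination of the grouping variables.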
library(tidyverse)
set.seed(1234)
responses<-c("Checked", "Unchecked")
numobs<-10
my_example<-data.frame(id=1:10,
Response_Option_A=sample(responses, numobs, replace=TRUE),
Response_Option_B=sample(responses, numobs, replace=TRUE),
Response_Option_C=sample(responses, numobs, replace=TRUE),
Response_Option_D=sample(responses, numobs, replace=TRUE),
stringsAsFactors = FALSE)
my_example %>%
pivot_longer(starts_with("Response_"), names_to = "Responses",
values_to = "value") %>%
group_by(Responses, value) %>%
summarise(total_responses = n())
#> # A tibble: 8 x 3
#> # Groups:   Responses [4]
#>   Responses         value     total_responses
#>   <chr>             <chr>               <int>
#> 1 Response_Option_A Checked                 4
#> 2 Response_Option_A Unchecked               6
#> 3 Response_Option_B Checked                 2
#> 4 Response_Option_B Unchecked               8
#> 5 Response_Option_C Checked                 1
#> 6 Response_Option_C Unchecked               9
#> 7 Response_Option_D Checked                 6
#> 8 Response_Option_D Unchecked               4
Created on 2020-08-10 by the reprex package (v0.3.0)
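As a possible shorthand for the group_by() and summarise(n()) pair above, dplyr's count() groups and tallies in one call. The sketch below is an illustration, not part of the original answer. It rebuilds the question's printed data literally with a hypothetical helper, flag(), so it does not depend on the random seed, and it stores the result in a hypothetical object named checked_counts.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: mark the given row numbers as "Checked", the rest as "Unchecked"
flag <- function(rows) ifelse(1:10 %in% rows, "Checked", "Unchecked")

# Literal copy of the data printed in the question
my_example <- data.frame(
  id = 1:10,
  Response_Option_A = flag(c(5, 7, 8, 9)),
  Response_Option_B = flag(c(4, 8)),
  Response_Option_C = flag(7),
  Response_Option_D = flag(c(1, 3, 4, 5, 7, 10)),
  stringsAsFactors = FALSE
)

checked_counts <- my_example %>%
  # One row per respondent and response option
  pivot_longer(starts_with("Response_"), names_to = "Responses",
               values_to = "value") %>%
  # Tally each option/value pair; the result is an ungrouped tibble
  count(Responses, value, name = "total_responses")

checked_counts
```

Unlike the grouped tibble shown above, count() returns an ungrouped result, which can be convenient if further steps should not inherit the grouping.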
If you only want the Checked responses, you can add the following line of code after the summarise() operation:
filter(value == "Checked")
#> # A tibble: 4 x 3
#> # Groups:   Responses [4]
#>   Responses         value   total_responses
#>   <chr>             <chr>             <int>
#> 1 Response_Option_A Checked               4
#> 2 Response_Option_B Checked               2
#> 3 Response_Option_C Checked               1
#> 4 Response_Option_D Checked               6
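For reference, here is a sketch of how the filter() step might sit in the full pipeline. It assumes the same data as the question, rebuilt literally with a hypothetical helper, flag(), so the result does not depend on the random seed. The object name checked_only is also just an illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: mark the given row numbers as "Checked", the rest as "Unchecked"
flag <- function(rows) ifelse(1:10 %in% rows, "Checked", "Unchecked")

# Literal copy of the data printed in the question
my_example <- data.frame(
  id = 1:10,
  Response_Option_A = flag(c(5, 7, 8, 9)),
  Response_Option_B = flag(c(4, 8)),
  Response_Option_C = flag(7),
  Response_Option_D = flag(c(1, 3, 4, 5, 7, 10)),
  stringsAsFactors = FALSE
)

checked_only <- my_example %>%
  pivot_longer(starts_with("Response_"), names_to = "Responses",
               values_to = "value") %>%
  group_by(Responses, value) %>%
  summarise(total_responses = n()) %>%
  # Keep only the rows that count "Checked" answers
  filter(value == "Checked")

checked_only
```

One caveat with this layout: an option that nobody checked has no "Checked" row at all, so it would be missing from the result rather than shown with a count of 0.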
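Check out the tidyREDCap package. It has a set of functions to help with the check-all-that-apply variables that come out of REDCap. The package is on CRAN, and its website on github.io lists the vignettes under Articles at the top of the page.
You can use summarise together with across: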
library(dplyr)
my_example %>%
summarise(across(starts_with("Response_Option_"), ~sum(. == 'Checked')))
#  Response_Option_A Response_Option_B Response_Option_C Response_Option_D
#1                 4                 2                 1                 6
In older versions of dplyr, you can use summarise_at:
my_example %>%
summarise_at(vars(starts_with("Response_Option_")), ~sum(. == 'Checked'))
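The same across() pattern can also report the share of respondents who checked each option, by swapping sum() for mean(). This is a small extension beyond the original answer, sketched under the assumption of dplyr 1.0.0 or later. The data is the question's table rebuilt literally with a hypothetical helper, flag(), and checked_share is a hypothetical object name.

```r
library(dplyr)

# Hypothetical helper: mark the given row numbers as "Checked", the rest as "Unchecked"
flag <- function(rows) ifelse(1:10 %in% rows, "Checked", "Unchecked")

# Literal copy of the data printed in the question
my_example <- data.frame(
  id = 1:10,
  Response_Option_A = flag(c(5, 7, 8, 9)),
  Response_Option_B = flag(c(4, 8)),
  Response_Option_C = flag(7),
  Response_Option_D = flag(c(1, 3, 4, 5, 7, 10)),
  stringsAsFactors = FALSE
)

checked_share <- my_example %>%
  # The mean of a logical vector is the proportion of TRUE values
  summarise(across(starts_with("Response_Option_"), ~ mean(.x == "Checked")))

checked_share
```

The result keeps the wide, one-row layout of the count version, with each value between 0 and 1.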
A very base R solution, which adds a count of checked options for each row (respondent), is:
my_example$checked <- apply(my_example[,which(grepl('Response_Option_',names(my_example)))],1,
function(x) length(which(x=="Checked")))
Output:
   id Response_Option_A Response_Option_B Response_Option_C Response_Option_D checked
1   1         Unchecked         Unchecked         Unchecked           Checked       1
2   2         Unchecked         Unchecked         Unchecked         Unchecked       0
3   3         Unchecked         Unchecked         Unchecked           Checked       1
4   4         Unchecked           Checked         Unchecked           Checked       2
5   5           Checked         Unchecked         Unchecked           Checked       2
6   6         Unchecked         Unchecked         Unchecked         Unchecked       0
7   7           Checked         Unchecked           Checked           Checked       3
8   8           Checked           Checked         Unchecked         Unchecked       2
9   9           Checked         Unchecked         Unchecked         Unchecked       1
10 10         Unchecked         Unchecked         Unchecked           Checked       1
Also, here is a better way (credit to @r2evans):
my_example$checked <- rowSums(my_example[, grep("^Response_", colnames(my_example))] == "Checked")
It produces the same output as before and is more readable.
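The two base R snippets above count checked options per row. If the per-column totals from the question are what is needed, swapping rowSums() for colSums() is one way to get them. The sketch below is an illustration rather than part of the original answer. It rebuilds the question's printed data literally with a hypothetical helper, flag(), and the names option_cols and checked_per_option are hypothetical.

```r
# Hypothetical helper: mark the given row numbers as "Checked", the rest as "Unchecked"
flag <- function(rows) ifelse(1:10 %in% rows, "Checked", "Unchecked")

# Literal copy of the data printed in the question
my_example <- data.frame(
  id = 1:10,
  Response_Option_A = flag(c(5, 7, 8, 9)),
  Response_Option_B = flag(c(4, 8)),
  Response_Option_C = flag(7),
  Response_Option_D = flag(c(1, 3, 4, 5, 7, 10)),
  stringsAsFactors = FALSE
)

# Positions of the response columns
option_cols <- grep("^Response_Option_", names(my_example))

# Comparing a data frame to a string gives a logical matrix;
# colSums() then counts the TRUE values in each column
checked_per_option <- colSums(my_example[, option_cols] == "Checked")

checked_per_option
```

The result is a named numeric vector with one entry per response column, so no extra packages are required.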