I have what is more of a statistics question... I have a data frame like this:
      ID diagnosis   Q1   Q2   Q3   Q4
1      x       yes    A    D    B    B
2      y        no    B    D    B    A
3      z       yes    A    D    C    C
4     ad       yes <NA>    C    A    C
5   tgfg       yes    C    E <NA>    C
6   gfgh        no    C <NA>    A    C
7    asj       yes    D    A    B    D
8     gh        no    A    A    D    B
9    sdf        no    B    A    E <NA>
10 asdgz        no    D    A    B    A
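Here Q1 to Q4 correspond to the questions I asked participants in my test (in the real data I have 30 questions). The letters below them are the options the participants chose. My questions do have "right" answers, but I also want to analyze whether the diagnosed and healthy groups differ in which options they choose, and whether the questions in my test differ within each group. So I want to analyze this as categorical data.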
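I first wanted to run multiple chi-square tests, one per question, comparing the diagnosed and undiagnosed groups, but it gives an error: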
mydf %>%
group_by(diagnosis, Q1) %>%
summarise(count = count(Q1)) %>%
summarise(pvalue= chisq.test(count)$p.value)
Error in `summarise()`:
! Problem while computing `count =
count(Q1)`.
i The error occurred in group 1: diagnosis =
"no", Q1 = "A".
Caused by error in `UseMethod()`:
! no applicable method for 'count' applied to an object of class "character"
Run `rlang::last_error()` to see where the error occurred.
Sorry if I am not being clear... In short, how can I compare the choice of test options within and between groups?
For the differences between groups, the code could be:
library(tidyverse)

# Example data; diagnosis is logical (TRUE = diagnosed, FALSE = not diagnosed)
mydf <- tribble(
  ~ID,     ~diagnosis, ~Q1, ~Q2, ~Q3, ~Q4,
  "x",     TRUE,       "A", "D", "B", "B",
  "y",     FALSE,      "B", "D", "B", "A",
  "z",     TRUE,       "A", "D", "C", "C",
  "ad",    TRUE,       NA,  "C", "A", "C",
  "tgfg",  TRUE,       "C", "E", NA,  "C",
  "gfgh",  FALSE,      "C", NA,  "A", "C",
  "asj",   TRUE,       "D", "A", "B", "D",
  "gh",    FALSE,      "A", "A", "D", "B",
  "sdf",   FALSE,      "B", "A", "E", NA,
  "asdgz", FALSE,      "D", "A", "B", "A"
)

# Helper column of 1s (summed into cell counts below); answers as factors
mydf <- mydf %>%
  mutate(count = 1, across(matches("^Q\\d+$"), as.factor))

# Loop over the question columns only (mydf also holds ID, diagnosis and count)
questions <- str_subset(colnames(mydf), "^Q\\d+$")

for (question in questions) {
  mydf %>%
    select(diagnosis, all_of(question), count) %>%
    drop_na() %>%
    # one row per chosen option, one count column per diagnosis group
    pivot_wider(names_from = diagnosis, values_from = count, values_fn = sum, values_fill = 0) %>%
    select(2:3) %>%
    chisq.test() %>%
    .$p.value %>%
    sprintf(. , fmt = '%#.5f') %>%
    paste(question, . ) %>%
    print()
}
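As a side note, the error in the question arises because dplyr's `count()` is a verb for data frames, not for a single column inside `summarise()`. The same per-question comparison can also be sketched in base R with `table()`, which builds the diagnosis-by-option contingency table directly and drops `NA` answers by default. The sketch below is only an illustration under assumptions that are not in the original post: it swaps in Fisher's exact test (`fisher.test()`) because the expected cell counts in this small example are very low, and it adds a Holm correction via `p.adjust()` since one test is run per question. The names `questions`, `p_values` and `p_adjusted` are made up for the example.

```r
# Example data from the question (diagnosis as "yes"/"no"; NA = no answer given).
mydf <- data.frame(
  ID = c("x", "y", "z", "ad", "tgfg", "gfgh", "asj", "gh", "sdf", "asdgz"),
  diagnosis = c("yes", "no", "yes", "yes", "yes", "no", "yes", "no", "no", "no"),
  Q1 = c("A", "B", "A", NA, "C", "C", "D", "A", "B", "D"),
  Q2 = c("D", "D", "D", "C", "E", NA, "A", "A", "A", "A"),
  Q3 = c("B", "B", "C", "A", NA, "A", "B", "D", "E", "B"),
  Q4 = c("B", "A", "C", "C", "C", "C", "D", "B", NA, "A"),
  stringsAsFactors = FALSE
)

# Question columns are assumed to be named Q1, Q2, ... (hypothetical naming rule).
questions <- grep("^Q[0-9]+$", names(mydf), value = TRUE)

# One diagnosis x option contingency table per question.
# table() excludes NA values by default, so unanswered items are dropped.
p_values <- sapply(questions, function(q) {
  tab <- table(mydf$diagnosis, mydf[[q]])
  # Assumption: Fisher's exact test instead of chisq.test() for sparse tables.
  fisher.test(tab)$p.value
})

# Assumption: Holm correction for running one test per question.
p_adjusted <- p.adjust(p_values, method = "holm")

print(data.frame(question = questions, p = p_values, p_holm = p_adjusted))
```

With 30 questions in the real data, the same loop applies unchanged as long as the question columns follow the assumed naming pattern; whether a correction for multiple testing is wanted, and which one, is a choice for the analyst.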
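As for potential differences within groups, I do not see a second categorical dimension besides the question (Q1, Q2, ...), which is the first. In theory, the option (A, B, ...) could be the second dimension, but then option A, for example, would have to mean more or less the same thing in every question, which is unlikely in a clinical study or survey, so I do not think that applies here.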
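To make the caveat above concrete, here is a minimal sketch of what the question-by-option table would look like within one group. It only builds the table; testing it (for example with `chisq.test()`) would be interpretable only under the assumption, doubted above, that an option letter means the same thing in every question. The names `yes_group`, `long` and `tab` are hypothetical and chosen for illustration.

```r
# Example data from the question (diagnosis as "yes"/"no"; NA = no answer given).
mydf <- data.frame(
  ID = c("x", "y", "z", "ad", "tgfg", "gfgh", "asj", "gh", "sdf", "asdgz"),
  diagnosis = c("yes", "no", "yes", "yes", "yes", "no", "yes", "no", "no", "no"),
  Q1 = c("A", "B", "A", NA, "C", "C", "D", "A", "B", "D"),
  Q2 = c("D", "D", "D", "C", "E", NA, "A", "A", "A", "A"),
  Q3 = c("B", "B", "C", "A", NA, "A", "B", "D", "E", "B"),
  Q4 = c("B", "A", "C", "C", "C", "C", "D", "B", NA, "A"),
  stringsAsFactors = FALSE
)

questions <- c("Q1", "Q2", "Q3", "Q4")

# Keep only the diagnosed participants and their answers.
yes_group <- mydf[mydf$diagnosis == "yes", questions]

# Long format: one row per participant-question pair.
# unlist() concatenates the columns in order, matching rep(..., each = nrow).
long <- data.frame(
  question = rep(questions, each = nrow(yes_group)),
  option   = unlist(yes_group, use.names = FALSE)
)

# Question x option table within the group; NA answers are dropped by table().
tab <- table(long$question, long$option)
print(tab)
```

The same table for the other group follows by filtering on `diagnosis == "no"` instead.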