Between-subject and within-subject comparisons of categorical data in R



This is more of a statistics question. I have a data frame like this:

      ID diagnosis   Q1   Q2   Q3   Q4
1      x       yes    A    D    B    B
2      y        no    B    D    B    A
3      z       yes    A    D    C    C
4     ad       yes <NA>    C    A    C
5   tgfg       yes    C    E <NA>    C
6   gfgh        no    C <NA>    A    C
7    asj       yes    D    A    B    D
8     gh        no    A    A    D    B
9    sdf        no    B    A    E <NA>
10 asdgz        no    D    A    B    A

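Here Q1 to Q4 correspond to the questions I asked participants in my test (in the actual data I have 30 questions). The letters below them are the options the participants chose. My questions do have "right" answers, but I also want to analyze whether the diagnosed and healthy groups differ in which options they choose, and whether the questions in my test differ within each group. So I want to analyze this as categorical data.

I first wanted to run multiple chi-squared tests, one per question, comparing the diagnosed and undiagnosed groups, but this gave an error: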

mydf %>%
  group_by(diagnosis, Q1) %>%
  summarise(count = count(Q1)) %>%
  summarise(pvalue = chisq.test(count)$p.value)
Error in `summarise()`:
! Problem while computing `count = count(Q1)`.
i The error occurred in group 1: diagnosis = "no", Q1 = "A".
Caused by error in `UseMethod()`:
! no applicable method for 'count' applied to an object of class "character"
Run `rlang::last_error()` to see where the error occurred.

Sorry if I am not being clear. In short, how can I compare the choice of test options within groups and between groups?
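The error message says that `count()` was applied to a character vector: `dplyr::count()` is a verb for data frames, while inside `summarise()` the number of rows in the current group is given by `n()`. As a minimal, dependency-free sketch of the tally the pipeline was aiming for, base R's `table()` can cross-tabulate diagnosis against Q1 directly. The two vectors below are copied from the example data; their names are chosen for illustration only.

```r
# Diagnosis and Q1 answers copied from the example data frame in the question
diagnosis <- c("yes", "no", "yes", "yes", "yes", "no", "yes", "no", "no", "no")
Q1        <- c("A",   "B",  "A",   NA,    "C",   "C",  "D",   "A",  "B",  "D")

# Cross-tabulate group against chosen option; NA answers are dropped by default
counts <- table(diagnosis, Q1)
print(counts)
```

A chi-squared test needs this whole two-way table at once, so it cannot be computed from one count per `diagnosis`/`Q1` group as the second `summarise()` call attempted.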

For differences between the groups, the code could be:

library(tidyverse)

# Example data from the question; diagnosis is TRUE for diagnosed participants
mydf <- tribble(
  ~ID,     ~diagnosis, ~Q1, ~Q2, ~Q3, ~Q4,
  "x",     TRUE,       "A", "D", "B", "B",
  "y",     FALSE,      "B", "D", "B", "A",
  "z",     TRUE,       "A", "D", "C", "C",
  "ad",    TRUE,       NA,  "C", "A", "C",
  "tgfg",  TRUE,       "C", "E", NA,  "C",
  "gfgh",  FALSE,      "C", NA,  "A", "C",
  "asj",   TRUE,       "D", "A", "B", "D",
  "gh",    FALSE,      "A", "A", "D", "B",
  "sdf",   FALSE,      "B", "A", "E", NA,
  "asdgz", FALSE,      "D", "A", "B", "A"
)

# Add a unit weight per row (summed later) and turn the answers into factors
mydf <- mydf %>%
  mutate(count = 1, across(Q1:Q4, as.factor))
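# Loop over the question columns only (Q1, Q2, ...), skipping ID, diagnosis and count
for (question in str_subset(colnames(mydf), "^Q")) {
  mydf %>%
    select(diagnosis, all_of(question), count) %>%
    drop_na() %>%
    # One row per answer option, one column of counts per diagnosis group
    pivot_wider(names_from = diagnosis, values_from = count, values_fn = sum, values_fill = 0) %>%
    # Keep only the two count columns and test them as a contingency table
    select(2:3) %>%
    chisq.test() %>%
    .$p.value %>%
    sprintf(. , fmt = '%#.5f') %>%
    paste(question, . ) %>%
    print()
}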
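With only ten participants the expected cell counts are very small, so `chisq.test()` will warn that the chi-squared approximation may be incorrect. Below is a minimal base-R sketch of the same per-question comparison. It swaps in Fisher's exact test and adds a Holm correction for testing several questions at once; both choices are my assumptions and not part of the code above. The data frame is copied from the question, with `diagnosis` kept as "yes"/"no".

```r
# Example data copied from the question
mydf <- data.frame(
  diagnosis = c("yes", "no", "yes", "yes", "yes", "no", "yes", "no", "no", "no"),
  Q1 = c("A", "B", "A", NA,  "C", "C", "D", "A", "B", "D"),
  Q2 = c("D", "D", "D", "C", "E", NA,  "A", "A", "A", "A"),
  Q3 = c("B", "B", "C", "A", NA,  "A", "B", "D", "E", "B"),
  Q4 = c("B", "A", "C", "C", "C", "C", "D", "B", NA,  "A")
)

# Question columns are assumed to be the ones whose names start with "Q"
questions <- grep("^Q", names(mydf), value = TRUE)

# One group-by-option contingency table and one exact test per question
p_values <- sapply(questions, function(q) {
  tab <- table(mydf$diagnosis, mydf[[q]])  # NA answers are dropped by default
  fisher.test(tab)$p.value
})

# Adjust for running one test per question (Holm's method, as an assumption)
p_adjusted <- p.adjust(p_values, method = "holm")

print(data.frame(question = questions, p = p_values, p_holm = p_adjusted))
```

With 30 questions in the real data, some form of multiple-testing adjustment is worth considering, since otherwise a few small p-values are expected by chance alone.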

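As for potential within-group differences, I do not see a second categorical dimension apart from the question (Q1, Q2, ...), which is the first one. In theory the option (A, B, ...) could be the second dimension, but then option A, for example, would have to mean more or less the same thing in every question. That is unlikely in a clinical study or survey, so I do not think it applies here.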
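To make that second dimension concrete, here is a purely illustrative base-R sketch of the question-by-option table for the diagnosed group alone. It would only be meaningful if the options were comparable across questions, which is doubted above. The answers are copied from the five "yes" rows of the example data, and the object names are hypothetical.

```r
# Answers of the five diagnosed ("yes") participants, copied from the example data
yes_group <- data.frame(
  Q1 = c("A", "A", NA,  "C", "D"),
  Q2 = c("D", "D", "C", "E", "A"),
  Q3 = c("B", "C", "A", NA,  "B"),
  Q4 = c("B", "C", "C", "C", "D")
)

# Reshape to long format: one row per participant-question pair
long <- data.frame(
  question = rep(names(yes_group), each = nrow(yes_group)),
  option   = unlist(yes_group, use.names = FALSE)
)

# Question-by-option counts within the group; NA answers are dropped by default
within_tab <- table(long$question, long$option)
print(within_tab)
```

Even if such a table were interpretable, each participant contributes one answer to every row, so the cells are not independent observations and an ordinary chi-squared test on it would not be appropriate.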
