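Forgive me, I'm new to this. I'd be grateful if anyone could help or point me to resources that would:
I have a data table containing 150,000 observations of 300 variables, some of which are outcomes/symptoms (dependent variables) and some of which are inputs (independent variables). For each symptom, I want descriptive statistics and the result of a chi-squared test of association with each input.
For the descriptive statistics, I managed this by building a matrix of the outcome variables called "symptom.matrix" and using "apply".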
Desc.stats<-matrix(c(apply(symptom.matrix,2,sum),
apply(symptom.matrix,2,mean),
apply(symptom.matrix,2,function(x)
{return(sqrt((mean(x)*(1-mean(x)))/length(x)))})),
ncol=3,
dimnames=list(c(...),
c("N","prev","s.e."))); Desc.stats
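For comparison, the same three columns can probably be built without repeated apply calls. This is only a sketch: it assumes symptom.matrix is a 0/1 matrix with one named column per symptom, and the small random matrix below is a hypothetical stand-in for the real data.

```r
# Hypothetical 0/1 symptom matrix standing in for the real data
set.seed(1)
symptom.matrix <- matrix(rbinom(40, 1, 0.3), ncol = 4,
                         dimnames = list(NULL, paste0("symptom", 1:4)))

prev <- colMeans(symptom.matrix)   # prevalence of each symptom

# cbind() takes row names from the named vectors and column names from the arguments
Desc.stats <- cbind(
  N      = colSums(symptom.matrix),                         # count of positive observations
  prev   = prev,
  "s.e." = sqrt(prev * (1 - prev) / nrow(symptom.matrix))   # binomial standard error
)
Desc.stats
```

colSums and colMeans are vectorised, so on 150,000 rows this should be noticeably faster than apply, although the numbers are the same.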
To get the chi-squared results, I use chisq.test on a single outcome and input pair as follows, but I can't see how to apply this to symptom.matrix:
result1<-(chisq.test(symptom1,input1));
print (c(result1$statistic, result1$p.value))
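How do I extend this to work on the symptom matrix? Can chisq.test be used, or would I be better off going back to basics and writing my own function for the statistic?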
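Consider nested lapply calls that iterate over each symptom against every input column and return a nested list. The objects passed to lapply are the two subsets of the original data frame: all symptom columns and all input columns.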
Since the OP did not provide a sample of the actual data, the following demonstrates with random data:
set.seed(788)
symptoms <- sapply(1:7, function(i,s) LETTERS[sample(26, 26, replace=TRUE)[s]], 1:26)
colnames(symptoms) <- c("Vision.Symptom","Voice.Symptom","Delofreference.Symptom","Paranoia.Symptom",
"VisionorVoice.Symptom","Delusion.Symptom","UEAny.Symptom")
set.seed(992)
inputs <- sapply(1:7, function(i,s) LETTERS[sample(26, 26, replace=TRUE)[s]], 1:26)
colnames(inputs) <- c("Vision.Input","Voice.Input","Delofreference.Input","Paranoia.Input",
"VisionorVoice.Input","Delusion.Input","UEAny.Input")
df <- data.frame(symptoms, inputs)
# LIST OF 7 ITEMS, EACH NESTED WITH THE 7 INPUTS
# CHANGE grep() to c() OF ACTUAL COLUMN NAMES
chi_sq_list <- lapply(df[grep("\\.Symptom", names(df))], function(s)
  lapply(df[grep("\\.Input", names(df))], function(i) chisq.test(s, i)))
Output (first list item)
chi_sq_list$Vision.Symptom
$Vision.Input
Pearson's Chi-squared test
data: s and i
X-squared = 241.22, df = 240, p-value = 0.4657
$Voice.Input
Pearson's Chi-squared test
data: s and i
X-squared = 247, df = 240, p-value = 0.3644
$Delofreference.Input
Pearson's Chi-squared test
data: s and i
X-squared = 289.25, df = 256, p-value = 0.07502
$Paranoia.Input
Pearson's Chi-squared test
data: s and i
X-squared = 322.11, df = 288, p-value = 0.08131
$VisionorVoice.Input
Pearson's Chi-squared test
data: s and i
X-squared = 215.22, df = 208, p-value = 0.351
$Delusion.Input
Pearson's Chi-squared test
data: s and i
X-squared = 218.47, df = 224, p-value = 0.5916
$UEAny.Input
Pearson's Chi-squared test
data: s and i
X-squared = 254.22, df = 256, p-value = 0.5196
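If only the statistic and p-value are needed, as in the OP's print() call, the nested list can be flattened into one data frame with a row per symptom/input pair. The following is a sketch rather than a definitive solution: the binary columns (A.Symptom, X.Input and so on) are hypothetical stand-in data, and chi_sq_table is a name introduced here for illustration.

```r
# Hypothetical stand-in data: 3 binary symptoms and 2 binary inputs
set.seed(42)
n <- 200
df <- data.frame(
  A.Symptom = rbinom(n, 1, 0.3),
  B.Symptom = rbinom(n, 1, 0.5),
  C.Symptom = rbinom(n, 1, 0.4),
  X.Input   = rbinom(n, 1, 0.5),
  Y.Input   = rbinom(n, 1, 0.6)
)

# Same nested lapply as above: one chisq.test per symptom/input pair
chi_sq_list <- lapply(df[grep("\\.Symptom", names(df))], function(s)
  lapply(df[grep("\\.Input", names(df))], function(i) chisq.test(s, i)))

# Flatten to one row per pair, keeping only the statistic and p-value
chi_sq_table <- do.call(rbind, lapply(names(chi_sq_list), function(s)
  do.call(rbind, lapply(names(chi_sq_list[[s]]), function(i) {
    res <- chi_sq_list[[s]][[i]]
    data.frame(symptom   = s,
               input     = i,
               statistic = unname(res$statistic),
               p.value   = res$p.value)
  }))))
chi_sq_table
```

The resulting data frame can be sorted or filtered by p.value like any other, which is usually easier than inspecting each list element in turn.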