Automating a function across many columns in R



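Forgive me, I'm new to this. I'd be grateful if anyone could help or point me to resources that would:

I have a data table of 150,000 observations on 300 variables, some of which are outcomes/symptoms (dependent variables) and some of which are inputs (independent variables). For each symptom, I want descriptive statistics and the results of chi-square tests of association with each input.

For the descriptive statistics, I managed this by building a matrix of the outcome variables called "symptom.matrix" and using "apply":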

Desc.stats<-matrix(c(apply(symptom.matrix,2,sum),
                     apply(symptom.matrix,2,mean),
                     apply(symptom.matrix,2,function(x)
                           {return(sqrt((mean(x)*(1-mean(x)))/length(x)))})),
                  ncol=3,                                 
                  dimnames=list(c(...),
                  c("N","prev","s.e."))); Desc.stats

To get the chi-square results, I use chisq.test on a single outcome/input pair as follows, but I can't see how to apply it across symptom.matrix:

 result1<-(chisq.test(symptom1,input1));
print (c(result1$statistic, result1$p.value))

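How can I extend this to work over the symptom matrix? Can chisq.test be used for this, or would I be better off going back to basics and writing my own function for the statistic?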

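Consider nested lapply calls that iterate over every symptom and, for each one, over every input column, returning a nested list. The objects passed to lapply are the symptom columns and the input columns, each subset from the original data frame.

Since the OP did not provide a sample of the actual data, the following demonstrates the approach with random data: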

set.seed(788)
symptoms <- sapply(1:7, function(i,s) LETTERS[sample(26, 26, replace=TRUE)[s]], 1:26)
colnames(symptoms) <- c("Vision.Symptom","Voice.Symptom","Delofreference.Symptom","Paranoia.Symptom", 
                        "VisionorVoice.Symptom","Delusion.Symptom","UEAny.Symptom")
set.seed(992)
inputs <- sapply(1:7, function(i,s) LETTERS[sample(26, 26, replace=TRUE)[s]], 1:26)
colnames(inputs) <- c("Vision.Input","Voice.Input","Delofreference.Input","Paranoia.Input", 
                      "VisionorVoice.Input","Delusion.Input","UEAny.Input")
df <- data.frame(symptoms, inputs)
# LIST OF 7 ITEMS, EACH NESTED WITH THE 7 INPUTS
# CHANGE grep() to c() OF ACTUAL COLUMN NAMES
# NOTE: THE DOT MUST BE ESCAPED AS "\\." INSIDE AN R STRING
chi_sq_list <- lapply(df[grep("\\.Symptom", names(df))], function(s)
                      lapply(df[grep("\\.Input", names(df))], function(i) chisq.test(s,i)))

Output (first list item)

chi_sq_list$Vision.Symptom
$Vision.Input
    Pearson's Chi-squared test
data:  s and i
X-squared = 241.22, df = 240, p-value = 0.4657

$Voice.Input
    Pearson's Chi-squared test
data:  s and i
X-squared = 247, df = 240, p-value = 0.3644

$Delofreference.Input
    Pearson's Chi-squared test
data:  s and i
X-squared = 289.25, df = 256, p-value = 0.07502

$Paranoia.Input
    Pearson's Chi-squared test
data:  s and i
X-squared = 322.11, df = 288, p-value = 0.08131

$VisionorVoice.Input
    Pearson's Chi-squared test
data:  s and i
X-squared = 215.22, df = 208, p-value = 0.351

$Delusion.Input
    Pearson's Chi-squared test
data:  s and i
X-squared = 218.47, df = 224, p-value = 0.5916

$UEAny.Input
    Pearson's Chi-squared test
data:  s and i
X-squared = 254.22, df = 256, p-value = 0.5196
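
The question asks only for the statistic and p-value of each test, so the list of test objects can be reduced to a flat table. The following is a minimal sketch under stated assumptions rather than a definitive implementation: the data are made-up 0/1 indicators (as the question's prevalence calculation suggests), `symptom.matrix` follows the question's naming, and `input.matrix`, `pairs`, and `tests` are hypothetical names introduced here for illustration.

```r
# Hypothetical toy data: 0/1 indicators, 3 symptoms and 2 inputs (made up)
set.seed(1)
n <- 200
symptom.matrix <- matrix(rbinom(n * 3, 1, 0.3), ncol = 3,
                         dimnames = list(NULL, paste0("symptom", 1:3)))
input.matrix <- matrix(rbinom(n * 2, 1, 0.5), ncol = 2,
                       dimnames = list(NULL, paste0("input", 1:2)))

# One row per symptom/input combination
pairs <- expand.grid(symptom = colnames(symptom.matrix),
                     input = colnames(input.matrix),
                     stringsAsFactors = FALSE)

# Run chisq.test() once per row of 'pairs'
tests <- lapply(seq_len(nrow(pairs)), function(k)
  chisq.test(symptom.matrix[, pairs$symptom[k]],
             input.matrix[, pairs$input[k]]))

# Keep only the test statistic and the p-value from each result
pairs$statistic <- vapply(tests, function(r) unname(r$statistic), numeric(1))
pairs$p.value <- vapply(tests, function(r) r$p.value, numeric(1))

print(pairs)
```

The same two `vapply` extractions should work on each inner list of `chi_sq_list` above, since every element is an ordinary `chisq.test` result with `statistic` and `p.value` components.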
