R - Running a series of t-tests between two data frames using covariates



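I have two data frames: one holds covariates for the patient samples, the other holds methylation data for the same samples. I need to run t-tests that compare the methylation data by sex.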

My data frames look roughly like this. Covariates:

"patient"   "sex"   "ethnicity"
sample1    p1         0      caucasian
sample2    p2         1      caucasian
sample3    p3         1      caucasian
sample4    p4         0      caucasian
sample5    p5         0      caucasian
sample6    p6         1      caucasian

and so on, up to sample46.

Methylation:

sample1  sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10
probe1  0.1111  0.2222  0.3333  0.4444  0.5555  0.6666  0.7777  0.8888  0.9999  1.111
probe2  0.1111  0.2222  0.3333  0.4444  0.5555  0.6666  0.7777  0.8888  0.9999  1.111
probe3  0.1111  0.2222  0.3333  0.4444  0.5555  0.6666  0.7777  0.8888  0.9999  1.111
probe4  0.1111  0.2222  0.3333  0.4444  0.5555  0.6666  0.7777  0.8888  0.9999  1.111

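and so on, for 80,000 different probes and 46 different samples. So if I want to run a series of t-tests comparing the methylation data of the first 8 samples by sex, can I write t.test(t(methylation[,1:8]) ~ covariates$sex)? Or is there a way to link the sample names (sample1, sample2, ...)? (Sorry in advance, I am new to both R and statistics.)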

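A simple approach is to create a combined data.frame, methyl_cov_df, and then use the formula interface of t.test.

Below is an example t.test of the probe1 values by sex for the first 6 samples (adjust the slice to the number of samples you need):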

# combined data frame
methyl_cov_df <- cbind(t(methylation[,1:6]),covariates)

methyl_cov_df:

probe1 probe2 probe3 probe4 patient sex ethnicity
sample1 0.1111 0.1111 0.1111 0.1111      p1   0 caucasian
sample2 0.2222 0.2222 0.2222 0.2222      p2   1 caucasian
sample3 0.3333 0.3333 0.3333 0.3333      p3   1 caucasian
sample4 0.4444 0.4444 0.4444 0.4444      p4   0 caucasian
sample5 0.5555 0.5555 0.5555 0.5555      p5   0 caucasian
sample6 0.6666 0.6666 0.6666 0.6666      p6   1 caucasian
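On the question of linking the sample names: instead of relying on column positions, the methylation columns can be matched to the covariate rows by name. The following is a minimal sketch, assuming the row names of covariates are the sample IDs and that they match the column names of methylation, as in the example data. The name shared is introduced here for illustration only, and the toy data is rebuilt inline so the block runs on its own.

```r
# Toy data mirroring the example (6 covariate rows, 10 methylation columns)
covariates <- data.frame(
  patient   = paste0("p", 1:6),
  sex       = c(0, 1, 1, 0, 0, 1),
  ethnicity = "caucasian",
  row.names = paste0("sample", 1:6)
)
vals <- c(0.1111, 0.2222, 0.3333, 0.4444, 0.5555,
          0.6666, 0.7777, 0.8888, 0.9999, 1.111)
methylation <- as.data.frame(matrix(
  rep(vals, each = 4), nrow = 4,
  dimnames = list(paste0("probe", 1:4), paste0("sample", 1:10))
))

# Samples present in both objects (hypothetical helper name)
shared <- intersect(colnames(methylation), rownames(covariates))

# Select and order both objects by sample name before combining,
# so each methylation row is paired with its own covariates
methyl_cov_df <- cbind(t(methylation[, shared]), covariates[shared, ])
```

Selecting by name means the combined data frame stays correct even if the two objects list the samples in a different order, or if one of them contains samples the other lacks.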

# t.test by formula: slice the data.frame to use the number of samples: done for 6 below
t.test(formula = probe1~sex, data= methyl_cov_df[1:6,]) 

Welch Two Sample t-test

data:  probe1 by sex
t = -0.19612, df = 4, p-value = 0.8541
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.5613197  0.4872530
sample estimates:
mean in group 0 mean in group 1 
0.3703333       0.4073667    
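Since the question asks for a series of t-tests rather than one, the single-probe call above could be repeated over every probe. Here is a rough sketch of one way to do that with apply, assuming the same toy data and the same name-based matching of samples. The names shared, probe_mat and p_values are illustrative only, and the default (Welch) form of t.test is called with the two groups passed as separate vectors.

```r
# Toy data mirroring the example (6 covariate rows, 10 methylation columns)
covariates <- data.frame(
  patient   = paste0("p", 1:6),
  sex       = c(0, 1, 1, 0, 0, 1),
  ethnicity = "caucasian",
  row.names = paste0("sample", 1:6)
)
vals <- c(0.1111, 0.2222, 0.3333, 0.4444, 0.5555,
          0.6666, 0.7777, 0.8888, 0.9999, 1.111)
methylation <- as.data.frame(matrix(
  rep(vals, each = 4), nrow = 4,
  dimnames = list(paste0("probe", 1:4), paste0("sample", 1:10))
))

# Keep only samples that have covariates, in the covariates' order
shared    <- intersect(rownames(covariates), colnames(methylation))
sex       <- covariates[shared, "sex"]
probe_mat <- as.matrix(methylation[, shared])

# One Welch t-test per probe (row); collect the p-values
p_values <- apply(probe_mat, 1, function(x) {
  t.test(x[sex == 0], x[sex == 1])$p.value
})
```

p_values is a numeric vector named by probe, so a single result can be looked up with p_values["probe1"]. With the real data, probe_mat would have 80,000 rows and the same code would apply unchanged.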

Data:

covariates <- read.table(text = '        "patient"   "sex"   "ethnicity"
sample1    p1         0      caucasian
sample2    p2         1      caucasian
sample3    p3         1      caucasian
sample4    p4         0      caucasian
sample5    p5         0      caucasian
sample6    p6         1      caucasian', header = TRUE)
methylation <- read.table(text = "       sample1  sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10
probe1  0.1111  0.2222  0.3333  0.4444  0.5555  0.6666  0.7777  0.8888  0.9999  1.111
probe2  0.1111  0.2222  0.3333  0.4444  0.5555  0.6666  0.7777  0.8888  0.9999  1.111
probe3  0.1111  0.2222  0.3333  0.4444  0.5555  0.6666  0.7777  0.8888  0.9999  1.111
probe4  0.1111  0.2222  0.3333  0.4444  0.5555  0.6666  0.7777  0.8888  0.9999  1.111", header = TRUE)
