I have data like this:
is_severe encoding sn_id
6 1 1 chr1 17689
7 0 2 chr1 17689
8 1 1 chr1 17689
9 1 2 chr1 69511
10 1 2 chr1 69511
11 1 1 chr1 69511
12 0 1 chr1 69511
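To make the example reproducible, the sample rows above can be entered as a data frame. This is a sketch that assumes two things: sn_id is a single character column whose values contain a space (such as "chr1 17689"), which is how the answer below treats it, and the leading numbers 6 to 12 are row names rather than data. The name df follows the answer.

```r
# Sample data from the question.
# Assumptions: sn_id is one character column, and 6:12 are row names.
df <- data.frame(
  is_severe = c(1, 0, 1, 1, 1, 1, 0),
  encoding  = c(1, 2, 1, 2, 2, 1, 1),
  sn_id     = c(rep("chr1 17689", 3), rep("chr1 69511", 4)),
  row.names = 6:12,
  stringsAsFactors = FALSE
)
df
```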
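I want to run a statistical test on each "group" defined by the values of the sn_id column.
This is the function for the statistical test: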
catt <- function(y, x, score = c(0, 1, 2)) {
  # Drop observations where either y or x is missing
  miss <- unique(c(which(is.na(y)), which(is.na(x))))
  n.miss <- length(miss)
  if (n.miss > 0) {
    y <- y[-miss]
    x <- x[-miss]
  }
  if (!all((y == 0) | (y == 1)))
    stop("y should be only 0 or 1.")
  if (!all((x == 0) | (x == 1) | (x == 2)))
    stop("x should be only 0, 1 or 2.")
  # Encoding counts among cases (y == 1) and controls (y == 0)
  ca <- x[y == 1]
  co <- x[y == 0]
  htca <- table(ca)
  htco <- table(co)
  # 2 x 3 table: row 1 = cases, row 2 = controls; columns = encoding 0, 1, 2
  A <- matrix(0, 2, 3)
  colnames(A) <- c(0, 1, 2)
  rownames(A) <- c(1, 0)
  A[1, names(htca)] <- htca
  A[2, names(htco)] <- htco
  # Trend test: cases out of all subjects at each encoding level
  ptt <- prop.trend.test(A[1, ], colSums(A), score = score)
  p.value <- as.numeric(ptt$p.value)
  return(p.value)
}
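The core of catt() is the 2 x 3 table of case and control counts by encoding, which is then passed to prop.trend.test(). As a sketch (an alternative construction, not the question's code), the same table can be built in one step with fixed factor() levels, which keeps a column for every encoding even when its count is zero. The y and x vectors below are the is_severe and encoding values of the "chr1 69511" rows in the sample data, the group for which the answer reports a p-value of 0.2482131.

```r
# is_severe and encoding for the "chr1 69511" rows of the sample data
y <- c(1, 1, 1, 0)
x <- c(2, 2, 1, 1)

# Fixed levels guarantee every row and column exists, even with zero counts:
# row "1" = cases, row "0" = controls; columns = encoding 0, 1, 2
A <- table(factor(y, levels = c(1, 0)), factor(x, levels = 0:2))

# Trend test of cases out of all subjects at each encoding level.
# With this tiny group, R may warn that the underlying fit is "essentially perfect".
ptt <- prop.trend.test(A[1, ], colSums(A), score = c(0, 1, 2))
ptt$p.value
```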
I apply it to each snp_id group with the by() function:
send=by(merged_df_normal,merged_df_normal$snp_id, function (merged_df_normal) {catt(merged_df_normal$is_sever_int,merged_df_normal$encoding)})
This gives results such as:
merged_df_normal$snp_id: chr11441806
[1] 0.6274769
---------------------------------------------------------------------
merged_df_normal$snp_id: chr1144192891
[1] NA
I want to convert this into a data frame that looks like this:
snp_id pvalue
chr11441806 0.6274769
chr1144192891 NA
I tried this:
do.call(rbind, list(send))
It returned a matrix that looks like this:
chr11441806 chr1144192891
0.6274769 NA
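For the original single-value version, one base R way to get that data frame is to take the group labels from names() of the by object and the values from as.numeric(). The sketch below uses hypothetical toy data and a stand-in function (a plain mean) in place of merged_df_normal and catt(), so it runs on its own.

```r
# Hypothetical toy data standing in for merged_df_normal
toy <- data.frame(
  snp_id   = c("a", "a", "b", "b"),
  is_sever = c(1, 0, 1, 1),
  encoding = c(0, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Stand-in for catt(): any function returning one number per group
res <- by(toy, toy$snp_id, function(d) mean(d$encoding))

# For a one-dimensional by object, names() returns the group labels
out <- data.frame(
  snp_id = names(res),
  pvalue = as.numeric(res),
  stringsAsFactors = FALSE
)
out
```

With the real objects this would be data.frame(snp_id = names(send), pvalue = as.numeric(send)); groups that returned NA keep NA in the pvalue column.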
After accepting the answer, I had to edit the function so that it also returns the chi-squared statistic:
catt_2 <- function(y, x, score = c(0, 1, 2)) {
  # Drop observations where either y or x is missing
  miss <- unique(c(which(is.na(y)), which(is.na(x))))
  n.miss <- length(miss)
  if (n.miss > 0) {
    y <- y[-miss]
    x <- x[-miss]
  }
  if (!all((y == 0) | (y == 1)))
    stop("y should be only 0 or 1.")
  if (!all((x == 0) | (x == 1) | (x == 2)))
    stop("x should be only 0, 1 or 2.")
  ca <- x[y == 1]
  co <- x[y == 0]
  htca <- table(ca)
  htco <- table(co)
  # 2 x 3 table: row 1 = cases (y == 1), row 2 = controls (y == 0)
  A <- matrix(0, 2, 3)
  colnames(A) <- c(0, 1, 2)
  rownames(A) <- c(1, 0)
  A[1, names(htca)] <- htca
  A[2, names(htco)] <- htco
  ptt <- prop.trend.test(A[1, ], colSums(A), score = score)
  # Return both the test statistic and the p-value
  res <- list(
    chisq = as.numeric(ptt$statistic),
    p.value = as.numeric(ptt$p.value)
  )
  return(res)
}
With catt_2(), the call and its output now look like this:
send=by(merged_df_normal,merged_df_normal$snp_id, function (merged_df_normal) {catt_2(merged_df_normal$is_sever,merged_df_normal$encoding)})
merged_df_normal$snp_id: chr11007252
$chisq
[1] NA
$p.value
[1] NA
------------------------------------------------------------------------
merged_df_normal$snp_id: chr1100731820
$chisq
[1] 0.9111779
$p.value
[1] 0.3398021
I would like it to be:
snp_id pvalue chisq
chr11441806 0.6274769 0.9111779
chr1144192891 NA NA
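With catt_2() each group returns a named list, so by() can no longer simplify the result to a plain vector. A base R sketch is to pull each component out with vapply(). As before, the toy data and the stand-in function (which returns a sum and a mean rather than a real test) are hypothetical placeholders for merged_df_normal and catt_2().

```r
# Hypothetical toy data standing in for merged_df_normal
toy <- data.frame(
  snp_id   = c("a", "a", "b", "b"),
  is_sever = c(1, 0, 1, 1),
  encoding = c(0, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Stand-in for catt_2(): returns a named list with chisq and p.value
stat_fn <- function(y, x) list(chisq = sum(x), p.value = mean(y))

res <- by(toy, toy$snp_id, function(d) stat_fn(d$is_sever, d$encoding))

# Extract one numeric component per group; NA results stay NA
out <- data.frame(
  snp_id = names(res),
  pvalue = vapply(res, function(r) r$p.value, numeric(1)),
  chisq  = vapply(res, function(r) r$chisq, numeric(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)
out
```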
The accepted answer was:
library(data.table)
setDT(merged_df_normal)
merged_df_normal[,.(p.value=catt(is_sever,encoding)),snp_id]
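This works very well for getting only the p-value, but is there a way to edit the answer above so that it also adds a chisq column? Thanks for your help with the previous answer.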
I believe you can apply catt() to each group of sn_id. Suppose your original data is called df. Then you can do the following:
library(data.table)
setDT(df)
df[,.(p.value=catt(is_severe,encoding)),sn_id]
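The same pattern extends to several output columns, which is what the edit to the question asks for: in data.table, when the expression in j returns a list, each list element becomes a column. Assuming catt_2() and the question's column names, that would be merged_df_normal[, catt_2(is_sever, encoding), by = snp_id]. The runnable sketch below uses hypothetical toy data and a stand-in function, and renames the columns to match the desired output.

```r
library(data.table)

# Hypothetical toy data standing in for merged_df_normal
toy <- data.table(
  snp_id   = c("a", "a", "b", "b"),
  is_sever = c(1, 0, 1, 1),
  encoding = c(0, 2, 2, 2)
)

# Stand-in for catt_2(): returns a named list with chisq and p.value
stat_fn <- function(y, x) list(chisq = sum(x), p.value = mean(y))

# The list returned from j becomes one column per element, one row per group
out <- toy[, {
  r <- stat_fn(is_sever, encoding)
  list(pvalue = r$p.value, chisq = r$chisq)
}, by = snp_id]
out
```

Because catt_2() wraps both values in as.numeric(), every group returns the same column types, which data.table needs in order to stack the per-group results.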
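You need to adjust your function so that it handles sn_id groups that do not have enough data; in your example data frame, catt() runs without error only for sn_id == "chr1 69511".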
In general, though, the output looks like this, with one row per sn_id value in the frame:
sn_id p.value
<char> <num>
1: chr1 69511 0.2482131
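For groups that stop with an error, one possible adjustment is to wrap the test so that a failing group yields NA instead of aborting the whole grouped call. This is a sketch with a hypothetical helper name (na_on_error) and a stand-in test; whether NA is the right result for a sparse group is a judgment call for your analysis.

```r
# Hypothetical helper: wrap a per-group test so that an error in one group
# produces NA for that group instead of aborting the whole computation
na_on_error <- function(f) {
  function(...) tryCatch(f(...), error = function(e) NA_real_)
}

# Stand-in test that refuses groups with too few observations
picky_test <- function(y, x) {
  if (length(y) < 3) stop("not enough data in this group")
  mean(y)
}

safe_test <- na_on_error(picky_test)
safe_test(c(1, 0), c(1, 2))
safe_test(c(1, 0, 1, 1), c(1, 2, 2, 1))
```

With the real function this would be, for example, safe_catt <- na_on_error(catt) followed by df[, .(p.value = safe_catt(is_severe, encoding)), sn_id]. For catt_2(), which returns a list, the error handler would need to return list(chisq = NA_real_, p.value = NA_real_) instead, so that every group yields the same columns.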