I have the following data:
Method1 100x 0.9736842 0.9736842 0.9473684 0.9473684
Method2 100x 0 0.5 0.917 0.667
Method1 50x 0.5 0.4210526 0.3421053 0.6315789
Method2 50x 0 0.417 0.750 0.883
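What I want to do is use the sapply function to extract the rows grouped by coverage (100x, 50x) and then form a matrix from each group. The desired result is: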
#100x
[,1] [,2] [,3] [,4]
[1,] 0.9736842 0.9736842 0.9473684 0.9473684
[2,] 0.0000000 0.5000000 0.9170000 0.6670000
#50x
[,1] [,2] [,3] [,4]
[1,] 0.5000000 0.4210526 0.3421053 0.6315789
[2,] 0.0000000 0.4170000 0.7500000 0.8830000
Here is the code I have, but it does not produce the result I want:
dat <- read.table("http://dpaste.com/1586262/plain/")
colnames(dat) <- c("Method", "Coverage", "error 0%", "error 1%", "error 2%", "error 4%")
sapply(3:6,
function(x) {
tmp <- matrix(dat[,x],nrow=2,byrow=TRUE)
print(tmp);
}
)
What is the right way to do it?
This seems like a logical case for split:
lapply(split(dat[3:6], dat$Coverage),function(x) unname(as.matrix(x)) )
#$`100x`
# [,1] [,2] [,3] [,4]
#[1,] 0.9736842 0.9736842 0.9473684 0.9473684
#[2,] 0.0000000 0.5000000 0.9170000 0.6670000
#
#$`50x`
# [,1] [,2] [,3] [,4]
#[1,] 0.5 0.4210526 0.3421053 0.6315789
#[2,] 0.0 0.4170000 0.7500000 0.8830000
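Since the question asked for sapply specifically, here is a minimal sketch of the same split approach written with sapply. It rebuilds dat from the data in the question; the names groups, res_list and res_arr are only illustrative. With its default simplification, sapply would flatten the two matrices into the columns of a single matrix, so the sketch passes simplify = FALSE (a named list) or simplify = "array" (a 3-d array).

```r
# Rebuild the example data from the question
dat <- read.table(text = "Method1 100x 0.9736842 0.9736842 0.9473684 0.9473684
Method2 100x 0 0.5 0.917 0.667
Method1 50x 0.5 0.4210526 0.3421053 0.6315789
Method2 50x 0 0.417 0.750 0.883")
colnames(dat) <- c("Method", "Coverage", "error 0%", "error 1%", "error 2%", "error 4%")

# One data frame of error columns per coverage value
groups <- split(dat[3:6], dat$Coverage)

# simplify = FALSE keeps one matrix per coverage in a named list
res_list <- sapply(groups, function(x) unname(as.matrix(x)), simplify = FALSE)

# simplify = "array" stacks the matrices along a third dimension instead
res_arr <- sapply(groups, function(x) unname(as.matrix(x)), simplify = "array")

res_list[["50x"]]   # matrix for the 50x rows
res_arr[, , "100x"] # same idea, indexing the array by coverage
```

The list form is probably the easier one to work with if the groups could ever have different numbers of rows, because the array form needs every group to have the same shape.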
Here is one possibility:
> dat<-read.table(text="Method1 100x 0.9736842 0.9736842 0.9473684 0.9473684
+ Method2 100x 0 0.5 0.917 0.667
+ Method1 50x 0.5 0.4210526 0.3421053 0.6315789
+ Method2 50x 0 0.417 0.750 0.883")
> colnames(dat) <- c("Method", "Coverage", "error 0%", "error 1%", "error 2%", "error 4%")
> lapply(unique(dat$Coverage),function(x)dat[dat$Coverage==x,])
[[1]]
Method Coverage error 0% error 1% error 2% error 4%
1 Method1 100x 0.9736842 0.9736842 0.9473684 0.9473684
2 Method2 100x 0.0000000 0.5000000 0.9170000 0.6670000
[[2]]
Method Coverage error 0% error 1% error 2% error 4%
3 Method1 50x 0.5 0.4210526 0.3421053 0.6315789
4 Method2 50x 0.0 0.4170000 0.7500000 0.8830000
Edit: to get matrices without the first two columns and without row or column names:
> lapply(unique(dat$Coverage),function(x){
z<-as.matrix(dat[dat$Coverage==x,-(1:2)])
colnames(z)=NULL
rownames(z)=NULL
z})
[[1]]
[,1] [,2] [,3] [,4]
[1,] 0.9736842 0.9736842 0.9473684 0.9473684
[2,] 0.0000000 0.5000000 0.9170000 0.6670000
[[2]]
[,1] [,2] [,3] [,4]
[1,] 0.5 0.4210526 0.3421053 0.6315789
[2,] 0.0 0.4170000 0.7500000 0.8830000
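The list returned above is unnamed, so it is not obvious which element belongs to which coverage. As a small sketch (the names covs and res are hypothetical, and dat is rebuilt from the question's data), the elements can be labeled with setNames:

```r
# Rebuild the example data from the question
dat <- read.table(text = "Method1 100x 0.9736842 0.9736842 0.9473684 0.9473684
Method2 100x 0 0.5 0.917 0.667
Method1 50x 0.5 0.4210526 0.3421053 0.6315789
Method2 50x 0 0.417 0.750 0.883")
colnames(dat) <- c("Method", "Coverage", "error 0%", "error 1%", "error 2%", "error 4%")

# as.character() so this works whether Coverage is a factor or a character column
covs <- as.character(unique(dat$Coverage))

# Same extraction as above, with each list element named after its coverage
res <- setNames(
  lapply(covs, function(x) unname(as.matrix(dat[dat$Coverage == x, -(1:2)]))),
  covs
)

res[["100x"]]
```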
It looks like you just want to extract the rows for each coverage? For example:
# extract the '100x' rows, columns 3 to 6
subset(dat, Coverage=='100x', 3:6)
# error 0% error 1% error 2% error 4%
#1 0.9736842 0.9736842 0.9473684 0.9473684
#2 0.0000000 0.5000000 0.9170000 0.6670000
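You can convert the result to a matrix with as.matrix (it keeps the column names, but you can strip them with unname). The workhorse here is the subset function (you could also use dat[dat$Coverage=='100x', 3:6] for this; there are many other ways to extract that subset).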
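Putting those pieces together for a single coverage might look like the following sketch, where m is just an illustrative name and dat is rebuilt from the question's data:

```r
# Rebuild the example data from the question
dat <- read.table(text = "Method1 100x 0.9736842 0.9736842 0.9473684 0.9473684
Method2 100x 0 0.5 0.917 0.667
Method1 50x 0.5 0.4210526 0.3421053 0.6315789
Method2 50x 0 0.417 0.750 0.883")
colnames(dat) <- c("Method", "Coverage", "error 0%", "error 1%", "error 2%", "error 4%")

# subset() picks the rows and columns, as.matrix() converts the data frame,
# and unname() drops the row and column names
m <- unname(as.matrix(subset(dat, Coverage == "100x", 3:6)))
m
```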
If you want to do this for each coverage level, you can use a loop:
for (c in levels(factor(dat$Coverage))) { # loops through values of Coverage; factor() covers R >= 4.0, where read.table reads strings as character
ss <- subset(dat, Coverage==c, 3:6)
# do something with ss
}
For example, if you want a list with one element per coverage level, you can use lapply (which has a built-in for loop):
lapply(levels(factor(dat$Coverage)), function (c) subset(dat, Coverage==c, 3:6))
# [[1]]
# error 0% error 1% error 2% error 4%
# 1 0.9736842 0.9736842 0.9473684 0.9473684
# 2 0.0000000 0.5000000 0.9170000 0.6670000
#
# [[2]]
# error 0% error 1% error 2% error 4%
# 3 0.5 0.4210526 0.3421053 0.6315789
# 4 0.0 0.4170000 0.7500000 0.8830000
In your code you appear to be looping over columns 3 to 6, whereas in your question you seem to want to loop over the coverage levels.
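To make that difference concrete, here is a small sketch of what one iteration of the original loop builds (for x = 3), using the data from the question. The matrix is filled from a single error column across all four rows, so it mixes both coverages rather than holding one coverage group:

```r
# Rebuild the example data from the question
dat <- read.table(text = "Method1 100x 0.9736842 0.9736842 0.9473684 0.9473684
Method2 100x 0 0.5 0.917 0.667
Method1 50x 0.5 0.4210526 0.3421053 0.6315789
Method2 50x 0 0.417 0.750 0.883")
colnames(dat) <- c("Method", "Coverage", "error 0%", "error 1%", "error 2%", "error 4%")

# dat[, 3] is the whole "error 0%" column: 0.9736842, 0, 0.5, 0
# Filling by row gives a 2 x 2 matrix with one row per coverage and one column per method
tmp <- matrix(dat[, 3], nrow = 2, byrow = TRUE)
tmp
```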