R: How to use sapply to access alternate rows and form matrices from them



I have the following data:

Method1 100x   0.9736842   0.9736842   0.9473684   0.9473684
Method2  100x  0   0.5 0.917   0.667
Method1 50x     0.5 0.4210526   0.3421053   0.6315789
Method2  50x   0   0.417   0.750   0.883

What I want to do is use sapply to group the rows that share the same coverage (100x, 50x) and form a matrix from each group.

Desired result:

#100x
     [,1]  [,2]  [,3]  [,4]
[1,] 0.9736842 0.9736842 0.9473684 0.9473684
[2,] 0.0000000 0.5000000 0.9170000 0.6670000
#50x
     [,1]  [,2]  [,3]  [,4]
[1,] 0.5000000 0.4210526 0.3421053 0.6315789
[2,] 0.0000000 0.4170000 0.7500000 0.8830000

This is the code I have, but it does not produce the result I want:

dat <- read.table("http://dpaste.com/1586262/plain/")
colnames(dat) <- c("Method", "Coverage", "error 0%", "error 1%", "error 2%", "error 4%")
sapply(3:6,
  function(x) {
    tmp <- matrix(dat[, x], nrow = 2, byrow = TRUE)
    print(tmp)
  }
)

What is the right way to do this?

Logically, this looks like a good case for split:

lapply(split(dat[3:6], dat$Coverage),function(x) unname(as.matrix(x)) )
#$`100x`
#          [,1]      [,2]      [,3]      [,4]
#[1,] 0.9736842 0.9736842 0.9473684 0.9473684
#[2,] 0.0000000 0.5000000 0.9170000 0.6670000
#
#$`50x`
#     [,1]      [,2]      [,3]      [,4]
#[1,]  0.5 0.4210526 0.3421053 0.6315789
#[2,]  0.0 0.4170000 0.7500000 0.8830000
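
Since the title asks about sapply specifically, here is a minimal sketch of the same split-based idea using sapply. It assumes the data is rebuilt inline with read.table(text = ...) rather than read from the dpaste URL, and the names to_mat and arr are only illustrative. With simplify = "array", sapply stacks the per-coverage matrices into a 2 x 4 x 2 array instead of returning a list.

```r
# Rebuild the example data inline (assumption: same values as in the question)
dat <- read.table(text = "Method1 100x 0.9736842 0.9736842 0.9473684 0.9473684
Method2 100x 0 0.5 0.917 0.667
Method1 50x 0.5 0.4210526 0.3421053 0.6315789
Method2 50x 0 0.417 0.750 0.883")
colnames(dat) <- c("Method", "Coverage", "error 0%", "error 1%", "error 2%", "error 4%")

# Hypothetical helper: data frame -> bare numeric matrix without dimnames
to_mat <- function(x) unname(as.matrix(x))

# One 2 x 4 slice per coverage level, stacked along the third dimension
arr <- sapply(split(dat[3:6], dat$Coverage), to_mat, simplify = "array")

dim(arr)         # rows x columns x coverage levels
arr[, , "100x"]  # the matrix for the 100x rows
```

Whether a list (lapply) or an array (sapply with simplify = "array") is more convenient depends on what is done with the matrices afterwards; the array form only works here because every coverage level has the same number of rows.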

Here is one possibility:

> dat<-read.table(text="Method1 100x   0.9736842   0.9736842   0.9473684   0.9473684
+ Method2  100x  0   0.5 0.917   0.667
+ Method1 50x     0.5 0.4210526   0.3421053   0.6315789
+ Method2  50x   0   0.417   0.750   0.883")
> colnames(dat) <- c("Method", "Coverage",  "error 0%", "error 1%", "error 2%", "error 4%")
> lapply(unique(dat$Coverage),function(x)dat[dat$Coverage==x,])
[[1]]
   Method Coverage  error 0%  error 1%  error 2%  error 4%
1 Method1     100x 0.9736842 0.9736842 0.9473684 0.9473684
2 Method2     100x 0.0000000 0.5000000 0.9170000 0.6670000
[[2]]
   Method Coverage error 0%  error 1%  error 2%  error 4%
3 Method1      50x      0.5 0.4210526 0.3421053 0.6315789
4 Method2      50x      0.0 0.4170000 0.7500000 0.8830000

Edit: to get matrices without the first two columns and without attributes:

> lapply(unique(dat$Coverage),function(x){
  z<-as.matrix(dat[dat$Coverage==x,-(1:2)])
  colnames(z)=NULL
  rownames(z)=NULL
  z})
[[1]]
          [,1]      [,2]      [,3]      [,4]
[1,] 0.9736842 0.9736842 0.9473684 0.9473684
[2,] 0.0000000 0.5000000 0.9170000 0.6670000
[[2]]
     [,1]      [,2]      [,3]      [,4]
[1,]  0.5 0.4210526 0.3421053 0.6315789
[2,]  0.0 0.4170000 0.7500000 0.8830000

It looks like you just want to extract the rows for each coverage level? For example:

# extract the '100x' rows, columns 3 to 6
subset(dat, Coverage=='100x', 3:6)
#   error 0%  error 1%  error 2%  error 4%
#1 0.9736842 0.9736842 0.9473684 0.9473684
#2 0.0000000 0.5000000 0.9170000 0.6670000

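You can convert the result to a matrix with as.matrix (it keeps the column names, but you can strip them with unname). The workhorse here is the subset function (you could also use dat[dat$Coverage=='100x', 3:6]; there are many other ways to extract that subset).

If you want to do this for each coverage level, you can use a loop: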
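
As a small illustration of that conversion, the sketch below chains subset, as.matrix and unname for the 100x rows. The data is rebuilt inline (an assumption, so the example does not depend on the dpaste URL), and ss and m100 are just illustrative names.

```r
# Rebuild the example data inline (assumption: same values as in the question)
dat <- read.table(text = "Method1 100x 0.9736842 0.9736842 0.9473684 0.9473684
Method2 100x 0 0.5 0.917 0.667
Method1 50x 0.5 0.4210526 0.3421053 0.6315789
Method2 50x 0 0.417 0.750 0.883")
colnames(dat) <- c("Method", "Coverage", "error 0%", "error 1%", "error 2%", "error 4%")

ss <- subset(dat, Coverage == "100x", 3:6)  # still a data frame
m100 <- unname(as.matrix(ss))               # numeric matrix, row and column names stripped

is.matrix(m100)
dimnames(m100)
```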

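# unique() works whether Coverage is a factor or a character column
# (levels() returns NULL for a character column, the read.table default since R 4.0)
for (c in unique(dat$Coverage)) { # loops through the values of Coverage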
    ss <- subset(dat, Coverage==c, 3:6)
    # do something with ss
}

For example, if you want a list with one element per coverage level, you can use lapply (which has a built-in for loop):

# unique() is used instead of levels() so this also works when Coverage is a character column
lapply(unique(dat$Coverage), function (c) subset(dat, Coverage==c, 3:6))
# [[1]]
#    error 0%  error 1%  error 2%  error 4%
# 1 0.9736842 0.9736842 0.9473684 0.9473684
# 2 0.0000000 0.5000000 0.9170000 0.6670000
# 
# [[2]]
#   error 0%  error 1%  error 2%  error 4%
# 3      0.5 0.4210526 0.3421053 0.6315789
# 4      0.0 0.4170000 0.7500000 0.8830000
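
If the list elements should be labelled by coverage, one option (a sketch, not part of the original answer) is sapply with simplify = FALSE. It keeps the list structure but names the elements after the character vector it iterates over. The data is rebuilt inline as an assumption, and covs and by_cov are illustrative names.

```r
# Rebuild the example data inline (assumption: same values as in the question)
dat <- read.table(text = "Method1 100x 0.9736842 0.9736842 0.9473684 0.9473684
Method2 100x 0 0.5 0.917 0.667
Method1 50x 0.5 0.4210526 0.3421053 0.6315789
Method2 50x 0 0.417 0.750 0.883")
colnames(dat) <- c("Method", "Coverage", "error 0%", "error 1%", "error 2%", "error 4%")

# as.character() makes this work for factor and character columns alike
covs <- as.character(unique(dat$Coverage))

# simplify = FALSE returns a list; USE.NAMES (the default) names it after covs
by_cov <- sapply(covs,
                 function(cv) unname(as.matrix(dat[dat$Coverage == cv, 3:6])),
                 simplify = FALSE)

names(by_cov)
by_cov[["50x"]]
```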

In your code you appear to be looping over columns 3-6, whereas in your question you seem to want to loop over the coverage levels.
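
To make that difference concrete, here is a sketch of what the question's column-wise call produces for a single column. The data is rebuilt inline as an assumption, and per_column and per_coverage are illustrative names. For the "error 0%" column, matrix(..., nrow = 2, byrow = TRUE) pairs up the two methods within each coverage level, so the result is a 2 x 2 matrix for one error rate rather than a 2 x 4 matrix for one coverage level.

```r
# Rebuild the example data inline (assumption: same values as in the question)
dat <- read.table(text = "Method1 100x 0.9736842 0.9736842 0.9473684 0.9473684
Method2 100x 0 0.5 0.917 0.667
Method1 50x 0.5 0.4210526 0.3421053 0.6315789
Method2 50x 0 0.417 0.750 0.883")
colnames(dat) <- c("Method", "Coverage", "error 0%", "error 1%", "error 2%", "error 4%")

# What the question's function does for x = 3: one column, reshaped row by row
per_column <- matrix(dat[, 3], nrow = 2, byrow = TRUE)
per_column  # rows: 100x, 50x; columns: Method1, Method2

# Grouping by coverage instead gives one 2 x 4 matrix per level
per_coverage <- lapply(split(dat[3:6], dat$Coverage),
                       function(x) unname(as.matrix(x)))
dim(per_coverage[["100x"]])
```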
