R: How to convert percentage text to numbers using a dplyr pipe



I have the following tibble:

library(tidyverse)
dat <- structure(list(V1 = c("Number of input reads", "Uniquely mapped reads number", 
"Uniquely mapped reads %", "Average mapped length"), V2 = c("26265603", 
"13330431", "50.75%", "47.37")), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4L))

It looks like this:

  V1                           V2      
  <chr>                        <chr>   
1 Number of input reads        26265603
2 Uniquely mapped reads number 13330431
3 Uniquely mapped reads %      50.75%  
4 Average mapped length        47.37   

What I want to do is convert column V2 to numeric. The expected final result is:

  V1                           V2      
  <chr>                        <dbl>   
1 Number of input reads        26265603
2 Uniquely mapped reads number 13330431
3 Uniquely mapped reads %      0.5075  
4 Average mapped length        47.37   

I tried this:

dat %>%
  mutate(V2 = case_when(
    V1 == "Uniquely mapped reads %" ~ as.numeric(sub("%", "", V2)) / 100,
    TRUE ~ as.numeric(V2)
  ))

But it gives me a warning:

Warning message:
In eval_tidy(pair$rhs, env = default_env) : NAs introduced by coercion

What is the right way to do it?

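Using pipes can be a bit complicated here, since we only want to update a few rows. In base R, we can first find the rows whose V1 matches the specific string and then update only those V2 values.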

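inds <- dat$V1 == "Uniquely mapped reads %"
dat$V2[inds] <- as.numeric(sub("%", "", dat$V2[inds])) / 100
dat
# A tibble: 4 x 2
#  V1                           V2      
#  <chr>                        <chr>   
#1 Number of input reads        26265603
#2 Uniquely mapped reads number 13330431
#3 Uniquely mapped reads %      0.5075  
#4 Average mapped length        47.37   

# V2 is still <chr> at this point, because assigning into a character
# column coerces the new value back to text. Convert the whole column:
dat$V2 <- as.numeric(dat$V2)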
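If more than one row could carry a percent sign, the same base R idea can key on the value itself rather than on the label in V1. The following is only a sketch under that assumption; the names `is_pct` and `num` are made up for illustration, and the vector is copied from the question so the snippet runs on its own.

```r
# Values copied from the question's V2 column
V2 <- c("26265603", "13330431", "50.75%", "47.37")

# Flag the entries that end in a percent sign
is_pct <- grepl("%$", V2)

# Strip the trailing "%" everywhere, then convert to numeric in one go
num <- as.numeric(sub("%$", "", V2))

# Rescale only the flagged entries from percent to a proportion
num[is_pct] <- num[is_pct] / 100
```

Because the "%" is removed before `as.numeric()` is called, no value should be coerced to NA along the way.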

An approach using pipes could be:

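library(dplyr)
# (V1 == "Uniquely mapped reads %") + 1 is 2 for the percent row and 1 otherwise,
# so the divisor picked from c(1, 100) is 100 only for that row.
dat %>%
  mutate(V2 = as.numeric(sub("%", "", V2)) /
           (c(1, 100)[(V1 == "Uniquely mapped reads %") + 1]))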
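As for the warning in the question: `case_when()` evaluates every right-hand side over the whole column before it picks values row by row, so `as.numeric(V2)` in the `TRUE` branch is also applied to "50.75%" and produces an NA there. That NA is never selected, so the values returned by the original attempt should still be the expected ones; only the warning is spurious. A minimal sketch of a fix that stays close to the original code is to strip the "%" in both branches. The tibble is rebuilt here just so the snippet is self-contained, and `raw` and `res` are names chosen for illustration.

```r
library(dplyr)

dat <- tibble(
  V1 = c("Number of input reads", "Uniquely mapped reads number",
         "Uniquely mapped reads %", "Average mapped length"),
  V2 = c("26265603", "13330431", "50.75%", "47.37")
)

# Converting the untouched column turns "50.75%" into NA (this is the
# source of the warning, suppressed here to keep the output quiet)
raw <- suppressWarnings(as.numeric(dat$V2))

# Removing "%" in both branches means neither branch sees a
# non-numeric string, so nothing is coerced to NA
res <- dat %>%
  mutate(V2 = case_when(
    V1 == "Uniquely mapped reads %" ~ as.numeric(sub("%", "", V2)) / 100,
    TRUE ~ as.numeric(sub("%", "", V2))
  ))
```

With this change `res$V2` is a double column and the percent row holds the proportion rather than the percentage.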
