r - How to exclude rows based on grepl special character using dplyr piping

I have the following data frame:

library(tidyverse)
ndf <- structure(list(experiment_status = c("Negative？", "Negative？", 
"Negative", "Negative？", "Negative？", "Negative？"), id = 1:6), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -6L))
ndf
#> # A tibble: 6 x 2
#>   experiment_status    id
#>   <chr>             <int>
#> 1 Negative？            1
#> 2 Negative？            2
#> 3 Negative              3
#> 4 Negative？            4
#> 5 Negative？            5
#> 6 Negative？            6

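What I want to do is filter so that only the rows without the question mark ? are kept, i.e. only row 3 should remain after the pipe.

Why does this fail?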

ndf %>% 
filter(!grepl("[?]", experiment_status))

What is the right way to do it?

ndf %>% 
filter(!grepl(intToUtf8(65311), experiment_status))
# A tibble: 1 x 2
experiment_status    id
<chr>             <int>
1 Negative              3
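
To see why the [?] pattern misses these rows, it can help to look at the code points involved. The sketch below is a base R check, assuming a UTF-8 session; the small vector x is a made-up stand-in for ndf$experiment_status, written with the \uFF1F escape so the example does not depend on the file's encoding.

```r
# Hypothetical stand-in for ndf$experiment_status
x <- c("Negative\uFF1F", "Negative")

# Code point of the last character of the first value
last_char <- substring(x[1], nchar(x[1]))
utf8ToInt(last_char)                      # 65311
sprintf("U+%04X", utf8ToInt(last_char))   # "U+FF1F"

# The ASCII question mark is a different character
utf8ToInt("?")                            # 63

# So an ASCII pattern matches nothing, while the fullwidth one matches
grepl("[?]", x)                           # FALSE FALSE
grepl(intToUtf8(65311), x)                # TRUE FALSE
```

The data contains the fullwidth question mark (U+FF1F), not the ASCII one (U+003F), so any pattern built from the ASCII character cannot match it.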

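One more thing you may notice: if you coerce the tibble to a data frame, it shows you the character's hexadecimal Unicode code point, which is <U+FF1F>. You can use that to filter as well.

That is: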

ndf %>% 
filter(!grepl(intToUtf8(0xFF1F), experiment_status))
# A tibble: 1 x 2
experiment_status    id
<chr>             <int>
1 Negative              3
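
The same code point can also be written directly as a string escape, which avoids the intToUtf8() call. A minimal sketch, assuming dplyr is installed; toy is a hypothetical two-row version of ndf.

```r
library(dplyr)

# Hypothetical two-row stand-in for ndf
toy <- tibble(
  experiment_status = c("Negative\uFF1F", "Negative"),
  id = 1:2
)

# "\uFF1F" is the same one-character string as intToUtf8(0xFF1F)
"\uFF1F" == intToUtf8(0xFF1F)

# Drop every row whose status contains the fullwidth question mark
kept <- toy %>%
  filter(!grepl("\uFF1F", experiment_status))
kept$id
```

Writing the character as an escape keeps the script pure ASCII, which may be convenient when the source file's encoding is not under your control.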

This kind of problem can arise when importing a CSV file that was written on a non-English operating system.

> '？' == '?'
[1] FALSE
# Filter on the fullwidth question mark itself
ndf %>% filter(!grepl('？', experiment_status))
# Try removing white space, but it makes no difference
> trimws(ndf$experiment_status, 'both')
[1] "Negative？" "Negative？" "Negative"   "Negative？" "Negative？" "Negative？"
# Change '？' to '?' using gsub
> gsub('？', '?', ndf$experiment_status)
[1] "Negative?" "Negative?" "Negative"  "Negative?" "Negative?" "Negative?"

ndf %>% mutate(experiment_status_clean = gsub('？', '?', experiment_status))
# Now you are searching for a literal ?, so you need to escape it as \\? (a doubled backslash inside an R string)
ndf %>% mutate(experiment_status_clean = gsub('？', '?', experiment_status)) %>% 
  filter(!grepl('\\?', experiment_status_clean))
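
Since such characters tend to arrive with imported files, it may be worth checking a column for non-ASCII content before filtering on it. One possible base R check is sketched below; the vector status is a made-up example, and the approach assumes the strings are valid UTF-8.

```r
# Made-up example values; \uFF1F is the fullwidth question mark
status <- c("Negative\uFF1F", "Negative", "Positive")

# TRUE for every element containing a code point outside the ASCII range
has_non_ascii <- vapply(
  status,
  function(s) any(utf8ToInt(s) > 127L),
  logical(1),
  USE.NAMES = FALSE
)
has_non_ascii

# Show only the offending values
unique(status[has_non_ascii])
```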

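# fixed = TRUE matches the pattern literally; it has to be the fullwidth ？ from the data, not the ASCII ?
ndf %>% 
  filter(!grepl("？", experiment_status, fixed = TRUE))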

But for your example, I think filter(experiment_status == "Negative") would work as well.

Edit: or, since we could also have "Positive":

ndf %>% 
filter(experiment_status %in% c("Negative", "Positive"))

To clean up your question marks, you can use stringi::stri_trans_general. I suggest applying it to your data as early as possible to avoid nasty surprises.

library(stringi)
ndf %>%
  mutate_at("experiment_status", stri_trans_general, "latin-ascii") %>%
  filter(!grepl("[?]", experiment_status)) # or filter(!grepl("\\?$", experiment_status))
# A tibble: 1 x 2
#     experiment_status    id
#                 <chr> <int>
#   1          Negative     3

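Here you don't need to know anything about the problematic character, and by the same token you clean up other unfortunate punctuation or look-alike characters.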
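
In current dplyr, mutate_at() is superseded by across(). The following is a sketch of the same clean-then-filter pipeline in that style, assuming dplyr 1.0 or later and stringi are installed; toy is a hypothetical stand-in for ndf.

```r
library(dplyr)
library(stringi)

# Hypothetical two-row stand-in for ndf
toy <- tibble(
  experiment_status = c("Negative\uFF1F", "Negative"),
  id = 1:2
)

cleaned <- toy %>%
  # Transliterate to ASCII first (fullwidth U+FF1F becomes ?)
  mutate(across(experiment_status, ~ stri_trans_general(.x, "latin-ascii"))) %>%
  # fixed = TRUE treats "?" as a literal character, not regex syntax
  filter(!grepl("?", experiment_status, fixed = TRUE))

cleaned$id
```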
