How to match/remove numbers at the start of comments in R



I have a list of comments imported into R. Here are some examples of how the comments come in:

9. This is some string number 1
9This is some string number 2
9 This is some string number 3
9-This is some string number 4
67-68 This is some string number 5

Note that I have saved the comments in a variable named some_str.

My goal is to print each line without the number at the start of the line, like this:

This is some string number 1
This is some string number 2
This is some string number 3
This is some string number 4
This is some string number 5

I have used the code below to handle the first line above (9. This is some string number 1):

pattern = "([0-9][.][ ])"
str_replace(some_str, pattern, "")

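Output: This is some string number 1

However, I am having trouble matching and removing the numbers on the other lines. For example, if I create the pattern "9T" for the second line, how do I remove only the 9?

One final note: I am trying to remove only the numbers at the start of a comment. For example, suppose line 3 contained the following comment: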

"9 This is some string number 2. 2 dogs came to town"

I only want to remove the 9 at the start of the comment. I do not want to remove the 2 after the period.

Another solution:

library(tidyverse)
dat <- data.frame(x = c("67,68 This is my test",
"67-68 This is my test",
"8 This is my test"))
dat %>%
mutate(x2 = str_replace(x, pattern = "^[^A-Z]*", ""))

It gives:

                      x              x2
1 67,68 This is my test This is my test
2 67-68 This is my test This is my test
3     8 This is my test This is my test
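One thing to keep in mind with "^[^A-Z]*" is that it removes everything before the first uppercase letter, so it relies on each comment starting with a capital. The sketch below illustrates this in base R with sub(); the comments vector is hypothetical (not from the question), and the narrower character class is only one possible alternative.

```r
# Hypothetical inputs (not from the original post): two of the comments
# start with a lowercase letter after the number.
comments <- c("67-68 This is my test", "8 this is my test", "12 iPhone notes")

# perl = TRUE keeps [A-Z] restricted to ASCII uppercase regardless of locale.
upper_anchor <- sub("^[^A-Z]*", "", comments, perl = TRUE)

# Alternative sketch: strip only leading digits, commas, hyphens,
# periods, and spaces, whatever letter comes next.
prefix_only <- sub("^[-0-9,. ]+", "", comments, perl = TRUE)

print(upper_anchor)
# [1] "This is my test" ""                "Phone notes"
print(prefix_only)
# [1] "This is my test" "this is my test" "iPhone notes"
```

If every comment is known to start with a capital letter, the shorter pattern is fine as posted.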

Here is a base R solution.
The pattern used is

pattern <- "^[-[:digit:][:punct:][:space:]]*"

It works for all of the posted test cases.

sub(pattern, "", x)
#[1] "This is some string number 1" "This is some string number 2"
#[3] "This is some string number 3" "This is some string number 4"
#[5] "This is some string number 5"

The same regular expression works for the last string:

sub(pattern, "", y)
#[1] "This is some string number 2. 2 dogs came to town"

A stringr solution could be

library(stringr)
str_remove(x, pattern)
str_remove(y, pattern)

Data

x <- scan(what = character(), text = "
9. This is some string number 1
9This is some string number 2
9 This is some string number 3
9-This is some string number 4
67-68 This is some string number 5
", sep = "\n")
y <- "9 This is some string number 2. 2 dogs came to town"
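Because the pattern above includes [:punct:], it also strips leading punctuation from comments that do not start with a number. If that matters for your data, a stricter variant that requires a leading digit might look like the sketch below. The names strict and extra and the two extra strings are made up for illustration; the comma in [-,] is an assumption carried over from the "67,68" example in the other answer.

```r
x <- c("9. This is some string number 1",
       "9This is some string number 2",
       "9 This is some string number 3",
       "9-This is some string number 4",
       "67-68 This is some string number 5")
y <- "9 This is some string number 2. 2 dogs came to town"

# Hypothetical comments that start with punctuation rather than a number.
extra <- c("(see above) 3 items", "\"Quoted\" comment")

broad  <- "^[-[:digit:][:punct:][:space:]]*"
# One or more digits, optional "-NN" or ",NN" continuations,
# then any trailing hyphens, periods, or spaces.
strict <- "^[0-9]+([-,][0-9]+)*[-. ]*"

broad_extra  <- sub(broad, "", extra)   # loses the "(" and the opening quote
strict_extra <- sub(strict, "", extra)  # left unchanged: no leading digit
strict_x     <- sub(strict, "", x)
strict_y     <- sub(strict, "", y)

print(broad_extra)
print(strict_extra)
print(strict_x)
print(strict_y)
```

The trade-off is that the stricter pattern will miss prefixes it does not describe, such as "(9) ...", so which version is better depends on what the real comments look like.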

We can use sub

sub("^[-0-9. ]+", "", v1)
#[1] "This is some string number 1" "This is some string number 2" "This is some string number 3" "This is some string number 4"
#[5] "This is some string number 5"

Data

v1 <- c("9. This is some string number 1", "9This is some string number 2", 
"9 This is some string number 3", "9-This is some string number 4", 
"67-68 This is some string number 5")
stringr::str_extract("9. This is some string number 1 2. 2 dogs came to town", "^([0-9][.][ ])")

This should work.
Just change your pattern to:
^([0-9][.][ ])
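Note that this anchored pattern still describes only the "digit, period, space" format, so it covers the first sample line but not the others. A quick base R check, using sub() in place of str_replace and the sample comments from the question (the variable names anchored, cleaned, and cleaned_long are made up here):

```r
some_str <- c("9. This is some string number 1",
              "9This is some string number 2",
              "9 This is some string number 3",
              "9-This is some string number 4",
              "67-68 This is some string number 5")

anchored <- "^([0-9][.][ ])"

# Only the first element matches; the rest come back unchanged.
cleaned <- sub(anchored, "", some_str)
print(cleaned)

# The ^ anchor leaves a later "2. " in the same comment alone.
cleaned_long <- sub(anchored, "",
                    "9. This is some string number 1 2. 2 dogs came to town")
print(cleaned_long)
# [1] "This is some string number 1 2. 2 dogs came to town"
```

To cover all five formats in one pass, a broader character class such as the one used with sub in the answers above is needed.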
