How to manipulate numbers within a string in R?



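I feel like I have a super simple question, but for the life of me I can't find the answer on Google or by searching here (or I don't know the right terms to find the solution), so here goes.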

I have a large amount of text in R in which I want to identify all the numbers/digits and add a specific number to them, for example 5.

As a small example, if this is my text:

text <- c("Hi. It is 6am. I want to leave at 7am")

I want the output to be:

> text
[1] "Hi. It is 11am. I want to leave at 12am"

But I also need the addition applied to each digit separately, so if this is the text:

text <- c("Hi. It is 2017. I am 35 years old.")

…I want the output to be:

> text
[1] "Hi. It is 75612. I am 810 years old."

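I've tried "grabbing" the numbers from the string and adding 5, but I can't figure out how to put them back into the original string so that I get the full text.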
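The missing step here is writing the changed numbers back. In base R, the replacement form `regmatches<-` does exactly that: it puts new values into the positions that `gregexpr()` found. A minimal sketch, assuming only unsigned whole numbers need to change (the variable names are illustrative):

```r
text <- "Hi. It is 6am. I want to leave at 7am"

m    <- gregexpr("[0-9]+", text)   # positions of every run of digits
nums <- regmatches(text, m)        # list(c("6", "7"))

# Add 5 to each extracted number and write the results back in place
regmatches(text, m) <- lapply(nums, function(d) as.character(as.numeric(d) + 5))
text
#> [1] "Hi. It is 11am. I want to leave at 12am"
```

Using the pattern "[0-9]" instead of "[0-9]+" applies the addition to each digit on its own, which turns 2017 into 75612.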

How can I do this? Thanks in advance!

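This is how I would do it. I search for a number followed by am or pm, then use gsubfn to evaluate it as a math expression. This is fairly flexible, but as currently implemented it only works in whole hours. I added an am/pm argument in case you want to switch between them, but I did not try to code detection of whether a time crosses from am to pm. Also note that I did not code the wrap-around from 12 to 1: if you add enough to go past 12, you will get a number greater than 12.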

text1 <- c("Hi. It is 6am. I want to leave at 7am")
text2 <- c("It is 9am. I want to leave at 10am, but the cab comes at 11am. Can I push my flight to 12am?")
change_time <- function(text, hours, sign, am_pm){
  # build a replacement such as "`(\\1+5)`am": the captured hour wrapped in
  # an R expression that gsubfn::fn$c() evaluates in the next step
  string_change <- glue::glue("`(\\1{sign}{hours})`{am_pm}")

  gsub("(\\d+)(am|pm)", string_change, text, perl = TRUE) |>
    gsubfn::fn$c()
}
change_time(text = text1, hours = 5, sign = "+", am_pm = "am")
#> [1] "Hi. It is 11am. I want to leave at 12am"
change_time(text = text2, hours = 3, sign = "-", am_pm = "pm")
#> [1] "It is 6pm. I want to leave at 7pm, but the cab comes at 8pm. Can I push my flight to 9pm?"
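The wrap-around left out above can be handled by converting each time to a 24-hour value, shifting it, and converting back. A minimal base-R sketch, assuming times are written as an hour from 1 to 12 immediately followed by am or pm; `shift_hours()` is a hypothetical helper, not part of gsubfn. Note that with real clock arithmetic 7am plus 5 hours is 12pm, not the 12am shown in the question.

```r
# Hypothetical helper: shift every "<h>am"/"<h>pm" time in a string by n hours,
# wrapping around the 12-hour clock (assumes hours are written as 1 to 12).
shift_hours <- function(x, n) {
  m <- gregexpr("\\b(1[0-2]|[1-9])(am|pm)\\b", x, perl = TRUE)
  times <- regmatches(x, m)
  regmatches(x, m) <- lapply(times, function(tt) {
    if (length(tt) == 0) return(character(0))           # no times in this element
    h     <- as.integer(sub("(am|pm)$", "", tt))         # hour part
    is_pm <- endsWith(tt, "pm")
    h24   <- (h %% 12 + ifelse(is_pm, 12, 0) + n) %% 24  # to 24-hour, shift, wrap
    h12   <- ifelse(h24 %% 12 == 0, 12, h24 %% 12)       # back to 1 to 12
    paste0(h12, ifelse(h24 < 12, "am", "pm"))
  })
  x
}

shift_hours("Hi. It is 6am. I want to leave at 7am", 5)
#> [1] "Hi. It is 11am. I want to leave at 12pm"
shift_hours("Land at 11pm", 3)
#> [1] "Land at 2am"
```

Negative shifts work too, because R's `%%` always returns a non-negative result for a positive modulus.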
And for adding to each digit separately:
text1 <- c("Hi. It is 2017. I am 35 years old.")
text2 <- c("Hi. It is 6am. I want to leave at 7am")
change_number <- function(text, change, sign){
  # "(\\d)" matches one digit at a time, so each digit is changed on its own
  string_change <- glue::glue("`(\\1{sign}{change})`")
  gsub("(\\d)", string_change, text, perl = TRUE) |>
    gsubfn::fn$c()
}
change_number(text = text1, change = 5, sign = "+")
#> [1] "Hi. It is 75612. I am 810 years old."
change_number(text = text2, change = 5, sign = "+")
#> [1] "Hi. It is 11am. I want to leave at 12am"
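Both functions work in two stages: `gsub()` first rewrites every match into an R expression wrapped in backticks, and `gsubfn::fn$c()` then evaluates those backticked pieces and splices the results back into the string. The sketch below makes that mechanism visible in base R; `eval_backticks()` is a hypothetical helper and only a rough approximation of what `fn$` does, and because it calls `eval(parse())` it should only be used on text you control.

```r
# Hypothetical helper: evaluate every `...` segment of a string as R code
# and replace it with the result (a rough stand-in for gsubfn::fn$).
eval_backticks <- function(s) {
  m <- gregexpr("`[^`]*`", s)
  regmatches(s, m) <- lapply(regmatches(s, m), function(exprs) {
    vapply(exprs, function(e) {
      code <- substr(e, 2, nchar(e) - 1)   # strip the surrounding backticks
      format(eval(parse(text = code)))     # evaluate and turn back into text
    }, character(1))
  })
  s
}

# Stage 1: turn each digit into an expression, as change_number() does
template <- gsub("(\\d)", "`(\\1+5)`", "I am 35 years old.")
template
#> [1] "I am `(3+5)``(5+5)` years old."

# Stage 2: evaluate the expressions
eval_backticks(template)
#> [1] "I am 810 years old."
```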

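This works perfectly. Thanks a lot @AndS., I adapted (or rather, simplified) your code to better fit my needs. I decided to figure out the other text on my own haha, so thanks for showing me how!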

A base R solution:

add_n = function(x, n, by_digit = FALSE) {
  # match single digits, or whole runs of digits
  if (by_digit) ptrn = "[0-9]" else ptrn = "[0-9]+"
  tmp       = gregexpr(ptrn, x)
  raw       = regmatches(x, tmp)
  raw_plusn = lapply(raw, function(d) as.character(as.integer(d) + n))
  # write the shifted numbers back into the matched positions
  regmatches(x, tmp) = raw_plusn
  x
}
text = c(
"Hi. It is 6am. I want to leave at 7am", 
"wow it's 505 dollars and 19 cents",
"Hi. It is 2017. I am 35 years old."
)
> add_n(text, 5)
# [1] "Hi. It is 11am. I want to leave at 12am"
# [2] "wow it's 510 dollars and 24 cents"      
# [3] "Hi. It is 2022. I am 40 years old."     
> add_n(text, -2)
# [1] "Hi. It is 4am. I want to leave at 5am" "wow it's 503 dollars and 17 cents"    
# [3] "Hi. It is 2015. I am 33 years old."   
> add_n(text, 5, by_digit = TRUE)
# [1] "Hi. It is 11am. I want to leave at 12am"
# [2] "wow it's 10510 dollars and 614 cents"   
# [3] "Hi. It is 75612. I am 810 years old."  

A tidyverse solution:

library(tidyverse)

data.frame(text) %>%
  # separate `text` into individual characters:
  separate_rows(text, sep = "(?<!^)(?!$)") %>%
  # add `5` to any digit:
  mutate(
    # if you detect a digit...
    text = ifelse(str_detect(text, "\\d"),
                  # ... extract it, convert it to numeric, add `5`:
                  as.numeric(str_extract(text, "\\d")) + 5,
                  # ... else leave `text` as is:
                  text)
  ) %>%
  # string the characters back together:
  summarise(text = str_c(text, collapse = ""))
# A tibble: 1 × 1
text                                   
<chr>                                  
1 Hi. It is 11am. I want to leave at 12am

Data 1:

text <- c("Hi. It is 6am. I want to leave at 7am")

Note that the same code works for the second text without any changes:

# A tibble: 1 × 1
text                                
<chr>                               
1 Hi. It is 75612. I am 810 years old.

Data 2:

text <- c("Hi. It is 2017. I am 35 years old.")
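The same split, transform, rejoin idea can be written in base R without extra packages. A minimal sketch; `add_per_digit()` is a hypothetical name. Unlike the pipeline above, whose `summarise()` collapses all rows into a single string, it keeps each element of a character vector separate.

```r
# Hypothetical helper: add n to every single digit, character by character
add_per_digit <- function(x, n) {
  vapply(strsplit(x, ""), function(chars) {
    is_digit <- grepl("[0-9]", chars)                                   # which characters are digits
    chars[is_digit] <- as.character(as.numeric(chars[is_digit]) + n)   # shift only those
    paste(chars, collapse = "")                                         # glue the characters back
  }, character(1))
}

add_per_digit(c("Hi. It is 6am. I want to leave at 7am",
                "Hi. It is 2017. I am 35 years old."), 5)
#> [1] "Hi. It is 11am. I want to leave at 12am"
#> [2] "Hi. It is 75612. I am 810 years old."
```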
