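I think I have a super simple question, but for the life of me I can't find it on Google or by searching here (or I don't know the right terms to find the solution), so here goes.
I have a large amount of text in R in which I want to identify all the numbers/digits and add a specific number to them, for example 5.
As a small example, if this is my text: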
text <- c("Hi. It is 6am. I want to leave at 7am")
I want the output to be:
> text
[1] "Hi. It is 11am. I want to leave at 12am"
But I also need the addition applied to each digit separately, so if this is the text:
text <- c("Hi. It is 2017. I am 35 years old.")
...I want the output to be:
> text
[1] "Hi. It is 75612. I am 810 years old."
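I have tried "grabbing" the numbers from the string and adding 5, but I don't know how to put them back into the original string so that I get the full text.
How should I go about this? Thanks in advance!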
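Here is how I would do it. I would search for a number followed by am or pm, then paste in a math expression for gsubfn to evaluate. This is fairly flexible, but the current implementation only handles whole hours. I included am and pm as an argument in case you want to swap them, but I did not try to code a way to detect whether the number crosses from am to pm. Also note that I did not code the rollover from 12 to 1: if the sum goes past 12, you will get a number greater than 12.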
text1 <- c("Hi. It is 6am. I want to leave at 7am")
text2 <- c("It is 9am. I want to leave at 10am, but the cab comes at 11am. Can I push my flight to 12am?")
change_time <- function(text, hours, sign, am_pm){
  # build a replacement such as "`(\1+5)`am"; fn$c() evaluates the part in backticks
  string_change <- glue::glue("`(\\1{sign}{hours})`{am_pm}")
  gsub("(\\d+)(?=am|pm)(am|pm)", string_change, text, perl = TRUE) |>
    gsubfn::fn$c()
}
change_time(text = text1, hours = 5, sign = "+", am_pm = "am")
#> [1] "Hi. It is 11am. I want to leave at 12am"
change_time(text = text2, hours = 3, sign = "-", am_pm = "pm")
#> [1] "It is 6pm. I want to leave at 7pm, but the cab comes at 8pm. Can I push my flight to 9pm?"
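The answer above notes that it does not roll over from 12 to 1. A minimal base R sketch of one way to add that, assuming a plain 12-hour wrap is what is wanted: `shift_hours_wrapped` is a hypothetical helper name, not part of the answer's code, and it deliberately leaves the am/pm suffix untouched.

```r
# Hypothetical helper: shift every "<hour>am" / "<hour>pm" by `hours`,
# wrapping on a 12-hour clock. The am/pm suffix itself is NOT changed.
shift_hours_wrapped <- function(text, hours) {
  # digits that are immediately followed by "am" or "pm"
  pattern <- "[0-9]+(?=am|pm)"
  matches <- gregexpr(pattern, text, perl = TRUE)
  shifted <- lapply(regmatches(text, matches), function(h) {
    # map 1..12 onto 0..11, shift, wrap, then map back to 1..12
    as.character((as.integer(h) + hours - 1L) %% 12L + 1L)
  })
  regmatches(text, matches) <- shifted
  text
}

shift_hours_wrapped("It is 9am. I want to leave at 10am", 5)
# [1] "It is 2am. I want to leave at 3am"
```

Deciding when am should flip to pm would need an extra rule on top of this, which the sketch does not attempt.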
text1 <- c("Hi. It is 2017. I am 35 years old.")
text2 <- c("Hi. It is 6am. I want to leave at 7am")
change_number <- function(text, change, sign){
  # match one digit at a time so the change is applied to every digit
  string_change <- glue::glue("`(\\1{sign}{change})`")
  gsub("(\\d)", string_change, text, perl = TRUE) |>
    gsubfn::fn$c()
}
change_number(text = text1, change = 5, sign = "+")
#> [1] "Hi. It is 75612. I am 810 years old."
change_number(text = text2, change = 5, sign = "+")
#> [1] "Hi. It is 11am. I want to leave at 12am"
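This works perfectly. Thanks a lot, @AndS. I tweaked (or rather, simplified) your code to better fit my needs. I decided to figure out the other text on my own, haha, so thanks for showing me how!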
base R:
add_n = function(x, n, by_digit = FALSE) {
  # match single digits or whole numbers, depending on by_digit
  if (by_digit) ptrn = "[0-9]" else ptrn = "[0-9]+"
  tmp = gregexpr(ptrn, x)
  raw = regmatches(x, tmp)
  raw_plusn = lapply(raw, function(x) as.integer(x) + n)
  # write the shifted numbers back into each string
  for (i in seq_along(x)) regmatches(x[i], tmp[i]) = raw_plusn[i]
  x
}
text = c(
"Hi. It is 6am. I want to leave at 7am",
"wow it's 505 dollars and 19 cents",
"Hi. It is 2017. I am 35 years old."
)
> add_n(text, 5)
# [1] "Hi. It is 11am. I want to leave at 12am"
# [2] "wow it's 510 dollars and 24 cents"
# [3] "Hi. It is 2022. I am 40 years old."
> add_n(text, -2)
# [1] "Hi. It is 4am. I want to leave at 5am"
# [2] "wow it's 503 dollars and 17 cents"
# [3] "Hi. It is 2015. I am 33 years old."
> add_n(text, 5, by_digit = TRUE)
# [1] "Hi. It is 11am. I want to leave at 12am"
# [2] "wow it's 10510 dollars and 614 cents"
# [3] "Hi. It is 75612. I am 810 years old."
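As a possible variation on the function above, the `for` loop can probably be dropped, since `regmatches<-` accepts the whole vector together with a list of replacements. A minimal sketch, assuming that behaviour; `add_n_vec` is a hypothetical name, and the replacements are converted to character explicitly rather than relying on implicit coercion.

```r
# Hypothetical loop-free variant of add_n().
add_n_vec <- function(x, n, by_digit = FALSE) {
  ptrn <- if (by_digit) "[0-9]" else "[0-9]+"
  m <- gregexpr(ptrn, x)
  # one character vector of replacements per input string
  shifted <- lapply(regmatches(x, m), function(d) as.character(as.integer(d) + n))
  regmatches(x, m) <- shifted
  x
}

add_n_vec("wow it's 505 dollars and 19 cents", 5)
# [1] "wow it's 510 dollars and 24 cents"
add_n_vec("wow it's 505 dollars and 19 cents", 5, by_digit = TRUE)
# [1] "wow it's 10510 dollars and 614 cents"
```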
A tidyverse solution:
library(dplyr)
library(tidyr)
library(stringr)
data.frame(text) %>%
  # separate `text` into individual characters:
  separate_rows(text, sep = "(?<!^)(?!$)") %>%
  # add `5` to any digit:
  mutate(
    # if you detect a digit...
    text = ifelse(str_detect(text, "\\d"),
                  # ... extract it, convert it to numeric, add `5`:
                  as.numeric(str_extract(text, "\\d")) + 5,
                  # ... else leave `text` as is:
                  text)
  ) %>%
  # string the characters back together:
  summarise(text = str_c(text, collapse = ""))
# A tibble: 1 × 1
text
<chr>
1 Hi. It is 11am. I want to leave at 12am
Data 1:
text <- c("Hi. It is 6am. I want to leave at 7am")
Note that the same code also works for the second text, without any changes:
# A tibble: 1 × 1
text
<chr>
1 Hi. It is 75612. I am 810 years old.
Data 2:
text <- c("Hi. It is 2017. I am 35 years old.")
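For comparison, the same per-digit addition can probably be written more compactly by passing a function as the replacement to `str_replace_all()`, which stringr supports. A minimal sketch, assuming stringr is installed; `add_to_digits` is a hypothetical name, not part of the answer above.

```r
library(stringr)

# Hypothetical helper: add `n` to every single digit found in `text`.
add_to_digits <- function(text, n = 5) {
  # the replacement function receives the matched digits as strings
  # and must return character values
  str_replace_all(text, "\\d", function(d) as.character(as.numeric(d) + n))
}

add_to_digits("Hi. It is 2017. I am 35 years old.")
# [1] "Hi. It is 75612. I am 810 years old."
```

Changing the pattern to `"\\d+"` would add `n` to whole numbers instead of to each digit.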