r - Regex - separate multiple words and whitespace from decimal numbers at the end

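I have a string containing words, whitespace and numbers (integers and decimals). I want to split it into two columns of a data frame so that column A contains the text and column B contains the number. This seems like a super simple task, but I can't figure out how to capture the text. I did manage to capture the numbers, though.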

require(tidyr)
df <- data.frame(x = c("This is text0", "This is a bit more text 0.01", "Even more text12.231"))

This captures the numbers in column B, but I can't work out which regex to put in the first pair of parentheses to get the text into A:

df |>
  extract(x, c("A", "B"), "()(\\d+\\.*\\d*)")
#  A      B
#1        0
#2     0.01
#3   12.231

You can use

df |>
  extract(x, c("A", "B"), "^(.*?)\\s*(\\d+(?:\\.\\d+)?)$")

See the regex demo.

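Details:

^ - start of string
(.*?) - Group 1: any zero or more characters other than line break characters, as few as possible
\s* - zero or more whitespace characters
(\d+(?:\.\d+)?) - Group 2: one or more digits, then an optional sequence of . and one or more digits
$ - end of string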
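To see what each group captures without tidyr, the same pattern can be tried in base R. This is only a sketch: it assumes every string matches the pattern (a non-matching string would return an empty result and break the rbind step), and the variable names pattern, parts and result are made up for illustration. Note that in an R string literal each backslash has to be doubled.

```r
# Sample strings from the question
x <- c("This is text0", "This is a bit more text 0.01", "Even more text12.231")

# Same regex as above, written with doubled backslashes for R
pattern <- "^(.*?)\\s*(\\d+(?:\\.\\d+)?)$"

# regexec() returns the full match plus one entry per capture group;
# perl = TRUE selects the PCRE engine for the lazy quantifier and (?:...)
m <- regmatches(x, regexec(pattern, x, perl = TRUE))

# One row per input string: column 1 = full match, 2 = group 1, 3 = group 2
parts <- do.call(rbind, m)

result <- data.frame(
  A = parts[, 2],               # text, trailing whitespace already dropped
  B = as.numeric(parts[, 3]),   # number at the end of the string
  stringsAsFactors = FALSE
)
print(result)
```

Because group 1 is lazy and \s* sits between the two groups, the space before 0.01 ends up in neither column.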

We capture one or more letters/spaces (([A-Za-z ]+)), followed by any whitespace and then the digits with a dot (([0-9.]+)).

library(tidyr)
extract(df, x, into = c("A", "B"), "([A-Za-z ]+)\\s*([0-9.]+)", convert = TRUE)
A      B
1             This is text  0.000
2 This is a bit more text   0.010
3           Even more text 12.231
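One thing worth keeping in mind is that the character class [0-9.]+ is looser than \d+(?:\.\d+)?: it accepts any run of digits and dots. A small sketch comparing the two on a few made-up strings (the names strict, loose and candidates are only for illustration):

```r
# Anchored versions of the two number patterns from the answers
strict <- "^\\d+(?:\\.\\d+)?$"   # digits, optionally followed by . and digits
loose  <- "^[0-9.]+$"            # any run of digits and dots

# Made-up test strings, not taken from the question's data
candidates <- c("12.231", "0", "1.2.3", "...")

strict_ok <- grepl(strict, candidates, perl = TRUE)
loose_ok  <- grepl(loose, candidates)

print(data.frame(candidates, strict_ok, loose_ok))
```

For the data in the question both patterns give the same split, so this only matters if the input may contain things like version numbers.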

With {unglue} you can do:

df <- data.frame(x = c("This is text0", "This is a bit more text 0.01", "Even more text12.231"))
unglue::unglue_unnest(df, x, "{A}{B=[0-9.]+}")
#>                          A      B
#> 1             This is text      0
#> 2 This is a bit more text    0.01
#> 3           Even more text 12.231

Created on 2022-11-24 with reprex v2.0.2
