r - Using str_extract to extract dollar amounts

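I have a column of text and want to extract only the dollar amount contained in each string, using the dollar sign as the start of the match. I can match the dollar sign, but I'm not sure how to grab the digits that follow it (and remove the commas).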

I tried using the dollar sign as the anchor for str_extract, but that does not return the full dollar amount.

library(dplyr)
library(stringr)

input <- c("the sum of $175,000,000 and the sum", "the sum of $20,000,000 and the sum", "the sum of $100,000,000 and the sum")
df <- as.data.frame(input)
df %>% 
    mutate(amount = str_extract(input, "^\\$"))

Before the mutate, the data looks like this:

input
the sum of $175,000,000 and the sum
the sum of $20,000,000 and the sum
the sum of $100,000,000 and the sum

I would like it to look like this:

input                                         amount
the sum of $175,000,000 and the sum        175000000
the sum of $20,000,000 and the sum          20000000
the sum of $100,000,000 and the sum        100000000

The parse_number helper function from readr can do this:

library(readr)
df %>% 
  mutate(amount = parse_number(str_match(input, "\\$([0-9,.]+)")[, 2]))

Basically, we use str_match to strip off the "$" and then pass the rest to parse_number to turn it into a number. This also works for values like "$11.11".

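You could also use the base function as.numeric() instead of parse_number, but as.numeric() does not accept commas, so they would have to be removed first. If you are using other tidyverse packages anyway, parse_number is what I would suggest.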
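
To make the steps above concrete, here is a minimal sketch of what each call returns. It assumes stringr and readr are installed; the third input string and the variable names (m, captured, amount, amount_base) are illustrative and not part of the original question.

```r
library(stringr)
library(readr)

# Illustrative inputs; the last one has cents instead of grouping commas.
input <- c("the sum of $175,000,000 and the sum",
           "the sum of $20,000,000 and the sum",
           "a fee of $11.11 was charged")

# str_match() returns a character matrix: column 1 holds the full match,
# column 2 holds the first capture group (the amount without the "$").
m <- str_match(input, "\\$([0-9,.]+)")
captured <- m[, 2]

# parse_number() ignores the grouping commas and returns a numeric vector.
amount <- parse_number(captured)

# Base alternative: as.numeric() needs the commas stripped first.
amount_base <- as.numeric(gsub(",", "", captured, fixed = TRUE))
```

Both amount and amount_base should hold the same numeric values; the difference is only in who removes the commas.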

Here is one approach:

library(dplyr)
library(stringr)
input <- c("the sum of $175,000,000 and the sum", "the sum of $20,000,000 and the sum", "the sum of $100,000,000 and the sum")
df <- as.data.frame(input)
# extract the $, the digits and the commas,
# then remove the $ and the commas
df %>% mutate(amount = str_remove_all(str_extract(input, "\\$[0-9,]+"), "[\\$,]"))
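
Note that str_remove_all returns character strings, so the amount column produced this way is text rather than a number. A minimal sketch of adding the conversion, assuming dplyr and stringr are installed (the column names amount_chr and amount and the variable result are illustrative):

```r
library(dplyr)
library(stringr)

df <- data.frame(
  input = c("the sum of $175,000,000 and the sum",
            "the sum of $20,000,000 and the sum",
            "the sum of $100,000,000 and the sum")
)

result <- df %>%
  mutate(
    # Extract "$" plus digits and commas, then drop the "$" and the commas.
    amount_chr = str_remove_all(str_extract(input, "\\$[0-9,]+"), "[\\$,]"),
    # Convert the cleaned text to a numeric column.
    amount = as.numeric(amount_chr)
  )
```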

Using base R:

gsub(",", "", sub(".*[$]([0-9,]+)\\s*.*", "\\1", input))
#[1] "175000000" "20000000"  "100000000"
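
One caveat with the sub() approach: when a string contains no "$" at all, sub() returns it unchanged, so the whole sentence (minus commas) comes back instead of a missing value. A possible base R variant, sketched here with regexpr() and regmatches(), yields NA for such rows. The example inputs and variable names are illustrative.

```r
# Illustrative inputs, including one string without a dollar amount.
input <- c("the sum of $175,000,000 and the sum",
           "no amount here",
           "a fee of $11.11 was charged")

# regexpr() gives the position of the first match per string, or -1 if none.
m <- regexpr("[$][0-9,.]+", input)

# Start with NA everywhere, then fill in only the rows that matched.
amount <- rep(NA_real_, length(input))
amount[m != -1] <- as.numeric(gsub("[$,]", "", regmatches(input, m)))
```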
