My data:
dfx=structure(list(V1 = c("(Description and Operation, 100-00 General Information) <a data-searchnum=G2107576 data-procuid=G1620638>Acceleration Control - Overview",
"(Description and Operation, 310-02 Acceleration Control) <a data-searchnum=G2232632 data-procuid=G2210282>Acceleration Control - System Operation and Component Description",
"(Description and Operation, 310-02 Acceleration Control) <a data-searchnum=G2232633 data-procuid=G2210283>Acceleration Control",
"(Diagnosis and Testing, 310-02 Acceleration Control) <a data-searchnum=G2118147 data-procuid=G2118148>Accelerator Pedal ")), class = "data.frame", row.names = c(NA,
-4L))
I need to extract the data-searchnum values and store them in a new df:
G2107576
G2232632
G2232633
G2118147
G2110035
Use str_extract with a capture group ((...)) after the data-searchnum= substring:
library(stringr)
# the group argument requires stringr >= 1.5.0
str_extract(dfx$V1, 'data-searchnum=(\\S+)', group = 1)
[1] "G2107576" "G2232632" "G2232633" "G2118147"
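The group argument of str_extract() requires stringr 1.5.0 or later. On an older stringr, here is a minimal sketch of two alternatives, assuming the same data-searchnum=VALUE layout with the value ending at the next whitespace: a lookbehind, which stringr's ICU regex engine supports, or str_match(), whose second column holds the first capture group. The vector x is a shortened stand-in for dfx$V1.

```r
library(stringr)

# Shortened stand-in for dfx$V1 (illustrative only)
x <- c("(Description and Operation, 100-00 General Information) <a data-searchnum=G2107576 data-procuid=G1620638>Acceleration Control - Overview",
       "(Diagnosis and Testing, 310-02 Acceleration Control) <a data-searchnum=G2118147 data-procuid=G2118148>Accelerator Pedal ")

# (?<=...) is a lookbehind: "data-searchnum=" must precede the match
# but is not part of it, so no capture group (and no group =) is needed
str_extract(x, "(?<=data-searchnum=)\\S+")

# str_match() returns a matrix: column 1 is the full match,
# column 2 is the first capture group
str_match(x, "data-searchnum=(\\S+)")[, 2]
```

Both calls should return "G2107576" "G2118147" for this x.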
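Or use str_replace to capture the non-whitespace characters after data-searchnum= and replace the whole string with the backreference (\1, written "\\1" inside an R string):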
str_replace(dfx$V1, ".*data-searchnum=(\\S+)\\s+.*", "\\1")
[1] "G2107576" "G2232632" "G2232633" "G2118147"
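One behavioral difference worth knowing, sketched with a hypothetical vector whose second element has no data-searchnum attribute: str_replace() returns non-matching strings unchanged, while str_extract() returns NA for them.

```r
library(stringr)

# Hypothetical input: the second string has no data-searchnum attribute
x <- c("<a data-searchnum=G2107576 data-procuid=G1620638>Overview",
       "<a data-procuid=G1620638>No search number here")

# str_replace() leaves the non-matching string as it is
str_replace(x, ".*data-searchnum=(\\S+)\\s+.*", "\\1")
# [1] "G2107576"
# [2] "<a data-procuid=G1620638>No search number here"

# str_extract() gives NA where there is no match (stringr >= 1.5.0 for group =)
str_extract(x, "data-searchnum=(\\S+)", group = 1)
# [1] "G2107576" NA
```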
If we are creating a new data frame:
library(dplyr)
df2 <- dfx %>%
  mutate(V1 = str_extract(V1, 'data-searchnum=(\\S+)', group = 1))
> df2
V1
1 G2107576
2 G2232632
3 G2232633
4 G2118147
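If you want a separate data frame with a descriptive column name instead of overwriting V1, a sketch along these lines should work; the column name searchnum and the shortened dfx below are illustrative assumptions, not part of the question.

```r
library(dplyr)
library(stringr)

# Shortened stand-in for the question's dfx (illustrative only)
dfx <- data.frame(V1 = c(
  "(Description and Operation, 100-00 General Information) <a data-searchnum=G2107576 data-procuid=G1620638>Acceleration Control - Overview",
  "(Diagnosis and Testing, 310-02 Acceleration Control) <a data-searchnum=G2118147 data-procuid=G2118148>Accelerator Pedal "
))

# New data frame holding only the extracted ids, under a hypothetical column name
df_ids <- data.frame(searchnum = str_extract(dfx$V1, "data-searchnum=(\\S+)", group = 1))

# Alternatively, keep the original text and add the id as a new column
df_both <- dfx %>%
  mutate(searchnum = str_extract(V1, "data-searchnum=(\\S+)", group = 1))
```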
Or, in base R, use the same approach as str_replace with sub:
sub(".*data-searchnum=(\\S+)\\s+.*", "\\1", dfx$V1)
[1] "G2107576" "G2232632" "G2232633" "G2118147"
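Another base R option, sketched under the same assumption that the value ends at the first whitespace, combines regexpr() (with perl = TRUE so the lookbehind is supported) and regmatches(). Note that regmatches() silently drops strings with no match, so the result can be shorter than the input, whereas the sub() call above returns non-matching strings unchanged.

```r
# Shortened stand-in for dfx$V1 (illustrative only)
x <- c("<a data-searchnum=G2107576 data-procuid=G1620638>Acceleration Control - Overview",
       "<a data-searchnum=G2118147 data-procuid=G2118148>Accelerator Pedal ")

# regexpr() locates the first match in each string;
# perl = TRUE enables the (?<=...) lookbehind
m <- regexpr("(?<=data-searchnum=)\\S+", x, perl = TRUE)

# regmatches() extracts the matched substrings
regmatches(x, m)
```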