Hello, I'm trying to split a column.
Below is my df:
value = c("AB/cc/dd/id,1,3,33","CC/DD/EE/F,F/GG,22,33,4","AB/cc,22,2,34","KK/SS/G,G,3,22,41")
df = data.frame(value)
I am trying to split the column and keep the string up to the 3rd comma (",") counting from the end.
My output df should look like this:
value1 = c("AB/cc/dd/id","CC/DD/EE/F,F/GG","AB/cc","KK/SS/G,G")
df_out = data.frame(value1)
I tried to do it with the stringr package:
library(stringr)
df[c('col1', 'col2')] <- str_split_fixed(df$value, ',', 2)
Thanks in advance
Here is another way to get the string up to the 3rd comma from the end, without regex:
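library(stringr)
library(purrr)

df$value |>
  str_split(",") |>                        # split each string at every comma
  map(function(x) x[1:(length(x) - 3)] |>  # drop the last 3 pieces
        str_c(collapse = ",")) |>          # rejoin the remaining pieces
  map_df(as.data.frame) |>
  setNames("value1")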
# value1
#1 AB/cc/dd/id
#2 CC/DD/EE/F,F/GG
#3 AB/cc
#4 KK/SS/G,G
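For comparison, the same split, drop and rejoin idea can be sketched in base R, without stringr or purrr. This is only an illustration: `drop_last_n` is a hypothetical helper name, and it assumes every string has more than `n` comma-separated fields.

```r
value <- c("AB/cc/dd/id,1,3,33", "CC/DD/EE/F,F/GG,22,33,4",
           "AB/cc,22,2,34", "KK/SS/G,G,3,22,41")

# Hypothetical helper: drop the last n comma-separated fields of each string.
# Assumes each string has more than n fields.
drop_last_n <- function(x, n = 3) {
  parts <- strsplit(x, ",", fixed = TRUE)          # split at literal commas
  vapply(parts,
         function(p) paste(head(p, -n), collapse = ","),  # keep all but the last n
         character(1))
}

df_out <- data.frame(value1 = drop_last_n(value))
df_out$value1
```

`head(p, -n)` keeps everything except the last `n` elements, so no index arithmetic is needed.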
In case the last 3 comma-separated fields contain not only digits but possibly other alphanumeric characters (including /), you can use:
a <- "AB/cc/dd/id,1,/gg/,33"
stringr::str_extract(a, ".*(?=(,[/A-Za-z0-9]+){3}$)")
#> [1] "AB/cc/dd/id"
Or another base R solution:
gsub("(,[^,]*){3}$", "", a)
Using gsub:
gsub(",+$", "", gsub("[^[:alpha:],/]", "", value))
[1] "AB/cc/dd/id" "CC/DD/EE/F,F/GG" "AB/cc" "KK/SS/G,G"
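Regex explanation:
"[^[:alpha:],/]"
[] : defines a character list
^ : negates the list, so gsub matches anything that is not in the list
[:alpha:],/ : the list contents: letters, commas and "/"
",+$"
, : matches a comma
+ : occurring one or more times
$ : only at the end of the string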
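To see the two steps separately, the intermediate result can be printed. Note that this approach relies on the trailing fields consisting only of digits, and it removes every digit in the string, including any in the part you want to keep. The second input below is a made-up example to illustrate that.

```r
x <- c("AB/cc/dd/id,1,3,33", "AB/c1/dd,1,2,3")  # second element is a made-up example

# Step 1: delete everything that is not a letter, a comma or a slash
step1 <- gsub("[^[:alpha:],/]", "", x)
step1
# [1] "AB/cc/dd/id,,," "AB/c/dd,,,"

# Step 2: strip the commas left over at the end of each string
step2 <- gsub(",+$", "", step1)
step2
# [1] "AB/cc/dd/id" "AB/c/dd"
```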
You can try gsub in base R:
> gsub("(,[^,]+){3}$", "", value)
[1] "AB/cc/dd/id" "CC/DD/EE/F,F/GG" "AB/cc" "KK/SS/G,G"
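If the number of trailing fields may vary, the same pattern can be built with sprintf and wrapped in a small function. This is a sketch only: `strip_last_fields` is a hypothetical name, and strings with fewer than `n` commas come back unchanged because the pattern does not match them.

```r
value <- c("AB/cc/dd/id,1,3,33", "CC/DD/EE/F,F/GG,22,33,4",
           "AB/cc,22,2,34", "KK/SS/G,G,3,22,41")
df <- data.frame(value)

# Hypothetical helper: remove the last n ",field" groups from each string.
strip_last_fields <- function(x, n = 3) {
  pattern <- sprintf("(,[^,]+){%d}$", n)  # e.g. "(,[^,]+){3}$" for n = 3
  sub(pattern, "", x)                     # at most one match per string, anchored at the end
}

df$value1 <- strip_last_fields(df$value)
df$value1
# [1] "AB/cc/dd/id"     "CC/DD/EE/F,F/GG" "AB/cc"           "KK/SS/G,G"

# Too few commas: the pattern does not match, so the input is returned as is
strip_last_fields("a,b")
# [1] "a,b"
```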