r - Using separate() to split strings of different sizes

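I want to split a string variable into several parts, but the substrings I need have different lengths, and there is no delimiter such as ., | and so on. I start with a data frame like this: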

df <- data.frame(x=c("bigApe","smallApe","bigDog","smallDog"), y=c(1,2,5,3))
x         y
bigApe    1
smallApe  2
bigDog    5
smallDog  3

I would like it to end up like this:

size  anim  y
1 big   Ape   1
2 small Ape   2
3 big   Dog   5
4 small Dog   3

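I have looked at things like separate(), which seem like they should be able to do this, but they all seem to expect a predictable delimiter or whitespace, or a fixed substring length. I can use a regular expression to find the capital letter, but then that letter itself is not kept: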

df %>% separate(x,c("size","anim"),sep="[A-Z]")
size anim y
1   big   pe   1
2 small   pe   2
3   big   og   5
4 small   og   3

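That is not the data I am after. I suppose I could bring in something from stringr, but even there everything I have found seems to need a specified string length. I could of course write a horrible loop, but surely there is a faster way than that!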

Thanks!

You need this:

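library(tidyr)  # provides separate() and re-exports %>%

# Split at the empty position before an uppercase letter that is not at the start of the string
df %>% separate(x, c("size","anim"), sep = "(?!^)(?=[[:upper:]])")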

# A tibble: 4 x 3
size  anim      y
<chr> <chr> <dbl>
1 big   Ape       1
2 small Ape       2
3 big   Dog       5
4 small Dog       3
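
To see why this pattern keeps the capital letter while "[A-Z]" loses it, it can help to compare what each one matches. The following is only a base R sketch using regexpr() with perl = TRUE (my choice for illustration; separate() uses its own regex engine internally), and the variable names are made up for the example.

```r
x <- c("bigApe", "smallApe", "bigDog", "smallDog")

# Pattern from the question: matches the capital letter itself (width 1),
# so a splitter that discards the match also discards that letter.
consuming <- regexpr("[A-Z]", x, perl = TRUE)

# Pattern from the answer: matches the empty position in front of an
# uppercase letter that is not at the start of the string (width 0),
# so nothing is discarded when splitting there.
zero_width <- regexpr("(?!^)(?=[[:upper:]])", x, perl = TRUE)

comparison <- data.frame(
  x                = x,
  consuming_width  = attr(consuming, "match.length"),
  zero_width_start = as.integer(zero_width),
  zero_width_width = attr(zero_width, "match.length")
)
print(comparison)
```

Both patterns locate the same position in each string; the only difference is whether the match swallows a character.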

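I am not sure whether you can do this with separate()... However, you can use stringr::str_locate() to find the starting position of the capital letter, and then use substr() (plus a bit of dplyr magic):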

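library(dplyr)  # for %>%, rowwise() and mutate()

data.frame(x=c("bigApe","smallApe","bigDog","smallDog"),c(1,2,5,3), stringsAsFactors = FALSE) %>%
  rowwise() %>%  # evaluate str_locate() one row at a time
  mutate(size = substr(x, 1, stringr::str_locate(x, "[A-Z]")[1]-1),  # text before the capital
         animal = substr(x, stringr::str_locate(x, "[A-Z]")[1], nchar(x))  # capital to end of string
  )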
# A tibble: 4 x 4
# Rowwise: 
x        c.1..2..5..3. size  animal
<chr>            <dbl> <chr> <chr> 
1 bigApe               1 big   Ape   
2 smallApe             2 small Ape   
3 bigDog               5 big   Dog   
4 smallDog             3 small Dog
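
The same locate-then-substring idea can probably be written without rowwise(), because regexpr(), substr() and substring() are all vectorized. This is a base R sketch rather than the answer's own code; it assumes every value contains a capital letter, and the name cap is just for illustration.

```r
df <- data.frame(x = c("bigApe", "smallApe", "bigDog", "smallDog"),
                 y = c(1, 2, 5, 3),
                 stringsAsFactors = FALSE)

# 1-based position of the first capital letter in each string,
# computed for the whole column at once.
cap <- as.integer(regexpr("[A-Z]", df$x, perl = TRUE))

df$size   <- substr(df$x, 1, cap - 1)  # everything before the capital
df$animal <- substring(df$x, cap)      # the capital and everything after it
df$x <- NULL                           # drop the original combined column
print(df)
```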

You can also use the base R function gsub to parse the original column with regular-expression capture groups.

# Backreferences need a doubled backslash inside an R string literal
df$size <- gsub("([a-z]*)([A-Z]?[a-z]*)", "\\1", df$x)
df$animal <- gsub("([a-z]*)([A-Z]?[a-z]*)", "\\2", df$x)
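
If running the regex once per output column feels repetitive, the capture groups can also be pulled out in a single pass with utils::strcapture(). This is a sketch under the assumption that every value is a lowercase prefix followed by a capitalised word; the anchored pattern and the names parts and result are mine, not from the answer above.

```r
df <- data.frame(x = c("bigApe", "smallApe", "bigDog", "smallDog"),
                 y = c(1, 2, 5, 3),
                 stringsAsFactors = FALSE)

# Each capture group becomes one column; `proto` supplies the column
# names and types of the result.
parts <- utils::strcapture(
  pattern = "^([a-z]+)([A-Z][a-z]*)$",
  x       = df$x,
  proto   = data.frame(size = character(), animal = character(),
                       stringsAsFactors = FALSE),
  perl    = TRUE
)

result <- cbind(parts, y = df$y)
print(result)
```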
