R - Separate rows on a string and keep the original rows



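I am working on a project where I want all permutations of a particular string. I am using tidyr::separate_rows to split and duplicate that string, but I also want to keep the original rows.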

require(dplyr)
require(tidyr)
temp <- tibble(raw_name = c("happy bank dba american bank and trust", " sohappy bank dba american bank"), clean_name = c("american bank and trust", "american bank"))

What I am currently doing is:

final <- temp %>%
separate_rows(raw_name, sep = "dba") 

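This loses my original rows. I have looked at the documentation but cannot find a .keep_all = TRUE equivalent. Here is the result of the separate_rows call above: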

raw_name                               clean_name             
<chr>                                  <chr>                  
1 happy bank dba american bank and trust american bank and trust
2 " sohappy bank dba american bank"      american bank          

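My current solution is to create a new df from the affected observations, run separate_rows on it, and rbind the result to the original rows. This is the result I want: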

raw_name                               clean_name             
<chr>                                  <chr>                  
1 happy bank dba american bank and trust american bank and trust
2 " sohappy bank dba american bank"      american bank          
3 "happy bank "                          american bank and trust
4 " american bank and trust"             american bank and trust
5 " sohappy bank "                       american bank          
6 " american bank"                       american bank          
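
The workaround described above can be sketched roughly as follows. This is only an illustration of the idea, not necessarily the exact code used in the project: the helper names to_split and pieces are hypothetical, and it assumes every affected row can be identified by the literal substring "dba".

```r
library(dplyr)
library(tidyr)

temp <- tibble(
  raw_name = c("happy bank dba american bank and trust",
               " sohappy bank dba american bank"),
  clean_name = c("american bank and trust", "american bank")
)

# Hypothetical helper: the rows whose raw_name contains the separator
to_split <- temp %>% filter(grepl("dba", raw_name, fixed = TRUE))

# Split only those rows; each piece keeps its clean_name
pieces <- to_split %>% separate_rows(raw_name, sep = "dba")

# Append the pieces beneath the untouched original rows
final <- bind_rows(temp, pieces)
```

Because the separator here is the bare string "dba", the pieces keep the spaces that surrounded it, as in the desired output above.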

Thanks, everyone!

After separating the rows, we can bind the result to the original dataset:

library(dplyr)
library(tidyr)
temp %>% 
separate_rows(raw_name, sep="\\s*dba\\s*") %>%
bind_rows(temp, .)
# A tibble: 6 x 2
#  raw_name                                 clean_name             
#* <chr>                                    <chr>                  
#1 "happy bank dba american bank and trust" american bank and trust
#2 " sohappy bank dba american bank"        american bank          
#3 "happy bank"                             american bank and trust
#4 "american bank and trust"                american bank and trust
#5 " sohappy bank"                          american bank          
#6 "american bank"                          american bank       

Building on @akrun's answer, here is another approach:

temp <- data.frame(raw_name = c("happy bank dba american bank and trust", 
" sohappy bank dba american bank"), 
clean_name = c("american bank and trust", "american bank"), stringsAsFactors = FALSE)
temp2 <- temp %>% 
mutate(raw_name = strsplit(raw_name, "\\s*dba\\s*")) %>% 
unnest(raw_name) %>% 
bind_rows(temp)

Output:

raw_name                                 clean_name             
<chr>                                    <chr>                  
1 "happy bank"                             american bank and trust
2 "american bank and trust"                american bank and trust
3 " sohappy bank"                          american bank          
4 "american bank"                          american bank          
5 "happy bank dba american bank and trust" american bank and trust
6 " sohappy bank dba american bank"        american bank 

Update: if you want to remove the whitespace at the start of the strings, you can try:

library(stringr)
temp2$raw_name <- str_squish(temp2$raw_name)

Output:

raw_name                               clean_name             
<chr>                                  <chr>                  
1 happy bank                             american bank and trust
2 american bank and trust                american bank and trust
3 sohappy bank                           american bank          
4 american bank                          american bank          
5 happy bank dba american bank and trust american bank and trust
6 sohappy bank dba american bank         american bank
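
Note that str_squish also collapses repeated whitespace inside the string. If only the leading whitespace should go, stringr's str_trim with side = "left" may be a closer fit. A minimal sketch on a made-up vector (x is a hypothetical example, not the data above):

```r
library(stringr)

# Hypothetical input: one leading space, one internal double space
x <- c(" sohappy bank", "american  bank")

# Removes leading whitespace only; internal spacing is left alone
trimmed <- str_trim(x, side = "left")

# Trims both ends and collapses internal runs of whitespace
squished <- str_squish(x)
```

For the data in this question both give the same result, since the names contain no repeated internal spaces.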
