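I'm working on a project where I need all permutations of particular strings. I'm using tidyr::separate_rows to split and duplicate those strings, but I'd also like to keep the original rows.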
library(dplyr)
library(tidyr)
temp <- tibble(raw_name = c("happy bank dba american bank and trust", " sohappy bank dba american bank"), clean_name = c("american bank and trust", "american bank"))
What I'm doing right now is:
final <- temp %>%
  separate_rows(raw_name, sep = "dba")
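This drops my original rows. I looked through the documentation but couldn't find anything like a .keep_all = TRUE option. Here is the data before separate_rows is applied: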
  raw_name                               clean_name
  <chr>                                  <chr>
1 happy bank dba american bank and trust american bank and trust
2 " sohappy bank dba american bank"      american bank
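My current workaround is to build a new data frame from the affected observations, run separate_rows on it, and rbind the result to the original rows. This is the result I'm after: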
  raw_name                               clean_name
  <chr>                                  <chr>
1 happy bank dba american bank and trust american bank and trust
2 " sohappy bank dba american bank"      american bank
3 "happy bank "                          american bank and trust
4 " american bank and trust"             american bank and trust
5 " sohappy bank "                       american bank
6 " american bank"                       american bank
Thanks, everyone!
After separating the rows, we can bind the result back to the original dataset:
library(dplyr)
library(tidyr)
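temp %>%
  # the backslashes must be doubled: "\s" is not a valid escape in an R string
  separate_rows(raw_name, sep = "\\s*dba\\s*") %>%
  # original rows first, followed by the split pieces
  bind_rows(temp, .)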
# A tibble: 6 x 2
#  raw_name                                 clean_name
#* <chr>                                    <chr>
#1 "happy bank dba american bank and trust" american bank and trust
#2 " sohappy bank dba american bank"        american bank
#3 "happy bank"                             american bank and trust
#4 "american bank and trust"                american bank and trust
#5 " sohappy bank"                          american bank
#6 "american bank"                          american bank
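Since separate_rows has no built-in option for keeping the input rows, the pattern above can be wrapped in a small helper. The following is only a sketch: separate_rows_keep is a hypothetical name, not a tidyr function, and the demo data is made up for illustration. It assumes that rows without the separator should appear once rather than twice, so only rows that actually match are split and appended.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper (not part of tidyr): keep every original row and
# append the split pieces of the rows that contain the separator.
separate_rows_keep <- function(data, col, sep) {
  # Rows whose value matches the separator regex
  has_sep <- filter(data, grepl(sep, {{ col }}))
  # Split only those rows, one output row per piece
  pieces <- separate_rows(has_sep, {{ col }}, sep = sep)
  # Originals first, then the pieces
  bind_rows(data, pieces)
}

# Illustrative data: the second row has no "dba" and should not be duplicated
demo <- tibble(
  raw_name = c("happy bank dba american bank and trust", "plain bank"),
  clean_name = c("american bank and trust", "plain bank")
)

result <- separate_rows_keep(demo, raw_name, sep = "\\s*dba\\s*")
print(result)
```

If duplicating unsplit rows is acceptable, the plain bind_rows(temp, .) pipeline is shorter and does the same job.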
Building on @akrun's answer, here is another approach:
temp <- data.frame(raw_name = c("happy bank dba american bank and trust",
" sohappy bank dba american bank"),
clean_name = c("american bank and trust", "american bank"), stringsAsFactors = FALSE)
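temp2 <- temp %>%
  # split each string into a list of pieces (note the doubled backslashes)
  mutate(raw_name = strsplit(raw_name, "\\s*dba\\s*")) %>%
  # one row per piece
  unnest(raw_name) %>%
  # append the original rows after the pieces
  bind_rows(temp)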
Output:
  raw_name                                 clean_name
  <chr>                                    <chr>
1 "happy bank"                             american bank and trust
2 "american bank and trust"                american bank and trust
3 " sohappy bank"                          american bank
4 "american bank"                          american bank
5 "happy bank dba american bank and trust" american bank and trust
6 " sohappy bank dba american bank"        american bank
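Update: if you want to remove the whitespace at the start of the strings, you can try:
library(stringr)
# str_squish() trims both ends and also collapses repeated internal whitespace
temp2$raw_name <- str_squish(temp2$raw_name)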
Output:
  raw_name                               clean_name
  <chr>                                  <chr>
1 happy bank                             american bank and trust
2 american bank and trust                american bank and trust
3 sohappy bank                           american bank
4 american bank                          american bank
5 happy bank dba american bank and trust american bank and trust
6 sohappy bank dba american bank         american bank
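If only the leading and trailing whitespace should go and the inner spacing must stay untouched, base R's trimws() or stringr's str_trim() may be a better fit than str_squish(). A small sketch of the difference, using a made-up vector x rather than the data above:

```r
library(stringr)

# Illustrative input: a leading space, and a doubled inner space plus a trailing space
x <- c(" sohappy bank", "happy  bank ")

trimmed  <- trimws(x)      # base R: strips leading/trailing whitespace only
trimmed2 <- str_trim(x)    # stringr equivalent of trimws()
squished <- str_squish(x)  # also collapses the doubled inner space

print(trimmed)
print(squished)
```

For the bank names in this thread the three calls give the same result, because none of the strings contain repeated inner spaces.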