R - How to combine column values with a separator, with exceptions for the separator?

I have the following data frame:

fruit <- c("apple", "orange", "peach", "")
color <- c("red", "orange", "", "purple")
taste <- c("sweet", "", "sweet", "neutral")
df <- data.frame(fruit, color, taste)

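I want to combine all of the columns into a single column named "combined", skipping empty values so that no stray " + " is left behind: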
combined <- c("apple + red + sweet", "orange + orange", "peach + sweet", "purple + neutral")

so that I end up with the following data frame:

df2 <- data.frame(fruit, color, taste, combined)

I tried using unite():

library(dplyr)
library(tidyr)

df %>%
  unite("combined",
        fruit, color, taste,
        sep = " + ",
        remove = FALSE)

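I have been trying to remove the "+" when it appears at the start or end, or when it is preceded by a blank, using the following regular expression, but it feels sloppy and does not seem to do what I want: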

df %>%
  as_tibble() %>%
  mutate(across(any_of("combined"), gsub,
                pattern = "^\\+|\\+  \\+  \\+  \\+|\\+  \\+  \\+|\\+  \\+|\\+$",
                replacement = "")) %>%
  mutate_if(is.character, trimws)

Any guidance would be greatly appreciated! Thanks!
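The regex route can be made less ad hoc by targeting the separator itself rather than every possible run of "+" and spaces. A minimal sketch, assuming the fields are joined with sep = " + " and that no field itself contains " + "; `clean_sep` is a hypothetical helper name, and base paste() stands in for unite() without na.rm, which pastes the fields the same way:

```r
fruit <- c("apple", "orange", "peach", "")
color <- c("red", "orange", "", "purple")
taste <- c("sweet", "", "sweet", "neutral")

# Joined the way unite(sep = " + ") does it without na.rm:
# empty fields leave stray separators behind
raw <- paste(fruit, color, taste, sep = " + ")

# Hypothetical helper: clean up a " + "-joined string
clean_sep <- function(x) {
  # collapse a run of adjacent separators (left by an empty middle field) into one
  x <- gsub("( \\+ )+", " + ", x)
  # drop a separator left at the very start or end (empty first or last field)
  gsub("^ \\+ | \\+ $", "", x)
}

cleaned <- clean_sep(raw)
print(raw)
print(cleaned)
```

In a pipeline this would be `mutate(combined = clean_sep(combined))` right after the unite() call. Replacing the blanks with NA before uniting, as in the answer below, avoids the cleanup step altogether.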

We can replace the empty strings ("") with NA and then use na.rm = TRUE in unite():

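library(dplyr)
library(tidyr)

df %>%
  # turn empty strings into NA so that unite() can drop them
  mutate(across(everything(), ~ na_if(.x, ""))) %>%
  # na.rm = TRUE skips NA values when pasting the columns together
  unite(combined, everything(), sep = " + ", na.rm = TRUE,
        remove = FALSE)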

Output:

             combined  fruit  color   taste
1 apple + red + sweet  apple    red   sweet
2     orange + orange orange orange    <NA>
3       peach + sweet  peach   <NA>   sweet
4    purple + neutral   <NA> purple neutral
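The na_if() step matters because na.rm = TRUE only removes NA values; empty strings are still pasted with a separator. A small sketch comparing both cases, assuming R >= 4.0 (so data.frame() keeps character columns rather than factors) and current dplyr/tidyr; `with_blanks` and `with_na` are just illustrative names:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  fruit = c("apple", "orange", "peach", ""),
  color = c("red", "orange", "", "purple"),
  taste = c("sweet", "", "sweet", "neutral")
)

# na.rm = TRUE on the raw data: "" is not NA, so its separator stays
with_blanks <- df %>%
  unite(combined, everything(), sep = " + ", na.rm = TRUE)

# after na_if(), the blanks are NA and unite() skips them
with_na <- df %>%
  mutate(across(everything(), ~ na_if(.x, ""))) %>%
  unite(combined, everything(), sep = " + ", na.rm = TRUE)

print(with_blanks$combined)
print(with_na$combined)
```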

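Create a function that takes two strings and joins them with " + " (leaving out the separator when either one is empty), then apply it across the columns with Reduce().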

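library(dplyr)

# join two strings with " + ", but leave out the separator when either side is empty
Paste <- function(x, y) paste0(x, ifelse(x == "" | y == "", "", " + "), y)

# Reduce() folds Paste() over the columns of df from left to right
df %>% mutate(combined = Reduce(Paste, .))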

   fruit  color   taste            combined
1  apple    red   sweet apple + red + sweet
2 orange orange             orange + orange
3  peach          sweet       peach + sweet
4        purple neutral    purple + neutral
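Reduce() folds from the left, so with three columns it computes Paste(Paste(fruit, color), taste); an intermediate result that is still "" just passes the next value through unchanged. A base-R sketch on the example vectors that checks this and uses accumulate = TRUE to expose the intermediate step (it assumes blanks are "" rather than NA, which Paste() does not handle):

```r
# Paste() as defined in the answer: join two strings with " + ",
# but leave out the separator when either side is empty
Paste <- function(x, y) paste0(x, ifelse(x == "" | y == "", "", " + "), y)

fruit <- c("apple", "orange", "peach", "")
color <- c("red", "orange", "", "purple")
taste <- c("sweet", "", "sweet", "neutral")

# Reduce() applies Paste() pairwise from the left
folded <- Reduce(Paste, list(fruit, color, taste))
nested <- Paste(Paste(fruit, color), taste)

# accumulate = TRUE keeps every intermediate result: fruit, fruit+color, all three
steps <- Reduce(Paste, list(fruit, color, taste), accumulate = TRUE)

print(identical(folded, nested))
print(steps[[2]])
print(folded)
```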
