I have the following data frame:
fruit <- c("apple", "orange", "peach", "")
color <- c("red", "orange", "", "purple")
taste <- c("sweet", "", "sweet", "neutral")
df <- data.frame(fruit, color, taste)
I want to combine all of the columns into a single column named "combined":
combined <- c("apple + red + sweet", "orange + orange", "peach + sweet", "purple + neutral")
so that I end up with the following data frame:
df2 <- data.frame(fruit, color, taste, combined)
I tried using unite:
df %>%
  unite("combined",
        fruit,
        color,
        taste,
        sep = " + ",
        remove = FALSE)
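I have then been trying to remove the "+" when it appears at the start or the end, or when it is preceded by a blank, using the following regex, but it feels sloppy and does not seem to achieve what I want: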
df %>%
  as_tibble() %>%
  mutate(across(any_of(combined), gsub, pattern = "^\\+|\\+ \\+ \\+ \\+|\\+ \\+ \\+|\\+ \\+|\\+$", replacement = "")) %>%
  mutate_if(is.character, trimws)
Any guidance would be much appreciated! Thank you!
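We can replace the empty strings ("") with NA and then use na.rm = TRUE in unite, so that the missing values are dropped before the columns are pasted together: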
library(dplyr)
library(tidyr)
df %>%
  mutate(across(everything(), ~ na_if(.x, ""))) %>%
  unite(combined, everything(), sep = " + ", na.rm = TRUE,
        remove = FALSE)
Output:
             combined  fruit  color   taste
1 apple + red + sweet  apple    red   sweet
2     orange + orange orange orange    <NA>
3       peach + sweet  peach   <NA>   sweet
4    purple + neutral   <NA> purple neutral
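To make the two steps easier to follow, here is a minimal sketch that runs them separately. It assumes R 4.0 or later (so the columns are character rather than factor) and reasonably recent dplyr and tidyr; the names df_na, res and res_blank are only illustrative. The last step, which turns the NA values back into empty strings, is an optional addition that is not part of the answer above.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  fruit = c("apple", "orange", "peach", ""),
  color = c("red", "orange", "", "purple"),
  taste = c("sweet", "", "sweet", "neutral")
)

# Step 1: convert every empty string to NA
# (df_na is a hypothetical intermediate name)
df_na <- df %>%
  mutate(across(everything(), ~ na_if(.x, "")))

# Step 2: unite() drops NA values when na.rm = TRUE, so no leading,
# trailing, or doubled " + " separators are produced
res <- df_na %>%
  unite(combined, everything(), sep = " + ", na.rm = TRUE, remove = FALSE)

# Optional (an assumption about what is wanted): restore the empty
# strings in the original columns so they match the input
res_blank <- res %>%
  mutate(across(-combined, ~ replace_na(.x, "")))

res$combined
```

If the NA values in fruit, color and taste are acceptable, the optional step can simply be left out.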
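Alternatively, create a function that takes two strings and joins them with " + " (leaving the separator out when either string is empty), and apply it across the columns using Reduce.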
library(dplyr)
Paste <- function(x, y) paste0(x, ifelse(x == "" | y == "", "", " + "), y)
df %>% mutate(combined = Reduce(Paste, .))
giving:
   fruit  color   taste            combined
1  apple    red   sweet apple + red + sweet
2 orange orange             orange + orange
3  peach          sweet       peach + sweet
4        purple neutral    purple + neutral
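As a rough illustration of what Reduce does here, the sketch below compares it with the explicit nested calls for the three columns. It uses only base R; via_reduce and via_nesting are illustrative names, and the data frame is rebuilt so the snippet runs on its own.

```r
# Helper from the answer above: joins x and y with " + ",
# omitting the separator when either side is an empty string
Paste <- function(x, y) paste0(x, ifelse(x == "" | y == "", "", " + "), y)

df <- data.frame(
  fruit = c("apple", "orange", "peach", ""),
  color = c("red", "orange", "", "purple"),
  taste = c("sweet", "", "sweet", "neutral")
)

# Reduce() treats the data frame as a list of columns and folds
# Paste over them from left to right
via_reduce <- Reduce(Paste, df)

# The same computation written out by hand for three columns
via_nesting <- Paste(Paste(df$fruit, df$color), df$taste)

identical(via_reduce, via_nesting)  # TRUE
```

Because Paste is vectorised through paste0 and ifelse, each fold step works on whole columns at once, and the approach extends to any number of columns without changing the helper.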