r - Suffixes when merging more than two data frames with full_join



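I want to merge several data frames using nested full_join calls. In addition, I'd like to add a suffix to every column so that, after merging, each column name indicates which data frame it came from (for example, a unique time identifier such as T1, T2, ...).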

x <- data.frame(i = c("a","b","c"), j = 1:3, h = 1:3, stringsAsFactors=FALSE)
y <- data.frame(i = c("b","c","d"), k = 4:6, h = 1:3, stringsAsFactors=FALSE)
z <- data.frame(i = c("c","d","a"), l = 7:9, h = 1:3, stringsAsFactors=FALSE)
full_join(x, y, by='i') %>% left_join(., z, by='i')

Is there a way to build on the default suffix option so that I end up with a data set whose column names look like this:

column_names <- c("i", "j_T1", "h_T1", "k_T2", "h_T2", "l_T3", "h_T3")

Put the data frames in a list, add each data frame's name as a suffix, and then perform the join.

library(dplyr)
library(purrr)

lst(x, y, z) %>%
  imap(function(x, y) x %>% rename_with(~paste(., y, sep = '_'), -i)) %>%
  reduce(full_join, by = 'i')
#  i j_x h_x k_y h_y l_z h_z
#1 a   1   1  NA  NA   9   3
#2 b   2   2   4   1  NA  NA
#3 c   3   3   5   2   7   1
#4 d  NA  NA   6   3   8   2
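
The output above carries the suffixes _x, _y and _z because lst() names each element after its variable. To get the T1, T2, T3 suffixes from the question, one option is to name the list elements explicitly. The following is a minimal sketch under that assumption; the names frames, joined, df, tag and nm are only illustrative.

```r
library(dplyr)
library(purrr)

x <- data.frame(i = c("a","b","c"), j = 1:3, h = 1:3, stringsAsFactors = FALSE)
y <- data.frame(i = c("b","c","d"), k = 4:6, h = 1:3, stringsAsFactors = FALSE)
z <- data.frame(i = c("c","d","a"), l = 7:9, h = 1:3, stringsAsFactors = FALSE)

# Assumption: the list names are the suffixes we want on the columns.
frames <- list(T1 = x, T2 = y, T3 = z)

joined <- frames %>%
  # imap() passes each data frame together with its list name
  imap(function(df, tag) {
    # rename every column except the join key i
    rename_with(df, function(nm) paste(nm, tag, sep = "_"), -i)
  }) %>%
  # fold the list into one data frame, joining on i each time
  reduce(full_join, by = "i")

names(joined)
# "i" "j_T1" "h_T1" "k_T2" "h_T2" "l_T3" "h_T3"
```

Because the renaming happens before the join, no two non-key columns share a name, so the default .x/.y suffixes of full_join never come into play.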

I think this could be done with purrr by manipulating the column headers, but I used pivot_longer and pivot_wider to change the header names:

library(dplyr)
library(tidyr)
library(stringr)

df <- x %>%
  full_join(y, by = "i") %>%
  full_join(z, by = "i") %>%
  # turn the column headers into a column so they can be edited
  pivot_longer(cols = -i,
               names_to = "columns",
               values_to = "values") %>%
  # escape the dot so only a literal ".x" / ".y" suffix is replaced
  mutate(columns = str_replace(columns, "\\.x$", "_T2"),
         columns = str_replace(columns, "\\.y$", "_T3"),
         # any header that has no T suffix yet gets "_T1"
         columns = case_when(!str_detect(columns, "T") ~ paste0(columns, "_T1"),
                             TRUE ~ columns)) %>%
  pivot_wider(names_from = columns,
              values_from = values)

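These headers do not match the ones listed in the question, but if the order matters and column l should be T3 (there is only one of them in this example), hopefully this code helps you get started.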
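
If the exact headers from the question are needed after a plain chained join, one possible approach is to rename with an explicit lookup instead of pattern replacement. This is only a sketch: the lookup vector is hand-written for this example (an assumption, not something dplyr derives), and it relies on the chained join naming the columns i, j, h.x, k, h.y, l, h.

```r
library(dplyr)

x <- data.frame(i = c("a","b","c"), j = 1:3, h = 1:3, stringsAsFactors = FALSE)
y <- data.frame(i = c("b","c","d"), k = 4:6, h = 1:3, stringsAsFactors = FALSE)
z <- data.frame(i = c("c","d","a"), l = 7:9, h = 1:3, stringsAsFactors = FALSE)

joined <- x %>%
  full_join(y, by = "i") %>%
  full_join(z, by = "i")
# column names at this point: i, j, h.x, k, h.y, l, h

# Hypothetical hand-written mapping: current name -> desired name
lookup <- c(j = "j_T1", h.x = "h_T1",
            k = "k_T2", h.y = "h_T2",
            l = "l_T3", h   = "h_T3")

renamed <- joined %>%
  # look each non-key column up in the mapping
  rename_with(function(nm) unname(lookup[nm]), -i)

names(renamed)
# "i" "j_T1" "h_T1" "k_T2" "h_T2" "l_T3" "h_T3"
```

An explicit mapping is tedious for many data frames, but it makes the origin of each column unambiguous, which the .x/.y suffixes alone do not once a third data frame is joined.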
