Pasting together two columns with the same name in one data frame in R


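I'm new here and I'm struggling with R. I have a data frame in which several columns share the same name, and I would like to paste each such pair together with paste, using "-" as the separator.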

Reproducible sample data:

library(tibble)  # provides tribble()

df <- tribble(
~Region,  ~Length_House1,   ~Length_House1,   ~Length_House2,   ~Length_House2,   ~Length_House3,   ~Length_House3,
"Montana", 20,  30,  20,  20,  40,  50,
"Montana", 52,  64,  60,  60,  70,  76,
"Montana", 52,  68,  60,  60,  70,  70,
"Montana", 44,  52,  60,  60,  76,  76,
"Montana", 44,  76,  60,  60,  70,  76,
"Idaho",   48,  56,  60,  60,  76,  76,
"Idaho",   48,  72,  60,  60,  70,  76
)

Expected output:

Region Length_House1 Length_House2 Length_House3
Montana         20-30         20-20         40-50
Montana         52-64         60-60         70-76
Montana         52-68         60-60         70-70
Montana         44-52         60-60         76-76
Montana         44-76         60-60         70-76
Idaho         48-56         60-60         76-76
Idaho         48-72         60-60         70-76

If you specify explicitly which columns to combine (for example, columns 2, 4 and 6 with columns 3, 5 and 7), you can try:

as.data.frame(cbind(Region = df$Region,
                    mapply(paste,
                           df[, c(2, 4, 6)],
                           df[, c(3, 5, 7)], sep = '-')))

Here, mapply is used to paste the paired columns together row by row, separated by -.
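To make the pairing explicit, here is a minimal sketch on a small made-up data frame. The name toy and the columns a1, a2, b1 and b2 are hypothetical and only serve the illustration; the names are unique so that the example does not depend on how duplicated names are handled.

```r
# Hypothetical toy data: a1/a2 form one pair, b1/b2 the other
toy <- data.frame(
  Region = c("Montana", "Idaho"),
  a1 = c(20, 48), a2 = c(30, 56),
  b1 = c(40, 76), b2 = c(50, 76)
)

# mapply() walks both column sets in parallel: a1 with a2, then b1 with b2.
# paste() is vectorised, so each call joins two columns row by row.
pasted <- mapply(paste, toy[, c(2, 4)], toy[, c(3, 5)],
                 MoreArgs = list(sep = "-"))

# The result is a character matrix whose columns are named after the first set
result <- data.frame(Region = toy$Region, pasted)
print(result)
```

Passing sep through MoreArgs is a stylistic choice here; supplying it directly in the mapply call, as in the answer above, should behave the same way for a single separator.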

If the names of the columns in each pair share a common stem (for example, Length_HouseA1 with Length_HouseA2, Length_HouseB1 with Length_HouseB2, and so on), you can try:

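# The regex strips trailing digits, so Length_HouseA1 and Length_HouseA2 share the key Length_HouseA
data.frame(lapply(split.default(df[-1], sub("\\d+$", "", names(df)[-1])),
                  function(x) do.call(paste, c(x, sep = "-"))))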

Output

Region Length_House1 Length_House2 Length_House3
1 Montana         20-30         20-20         40-50
2 Montana         52-64         60-60         70-76
3 Montana         52-68         60-60         70-70
4 Montana         44-52         60-60         76-76
5 Montana         44-76         60-60         70-76
6   Idaho         48-56         60-60         76-76
7   Idaho         48-72         60-60         70-76
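The grouping-by-stem idea can be followed step by step in the sketch below. It assumes hypothetical column names of the Length_HouseA1 / Length_HouseA2 form mentioned above, on a made-up data frame called toy, and adds Region back explicitly at the end.

```r
# Hypothetical toy data whose paired columns differ only in a trailing digit
toy <- data.frame(
  Region = c("Montana", "Idaho"),
  Length_HouseA1 = c(20, 48), Length_HouseA2 = c(30, 56),
  Length_HouseB1 = c(40, 76), Length_HouseB2 = c(50, 76)
)

# Step 1: strip the trailing digits so both members of a pair share one key
stem <- sub("\\d+$", "", names(toy)[-1])

# Step 2: split.default() treats the data frame as a list of columns
# and groups those columns by the key
groups <- split.default(toy[-1], stem)

# Step 3: paste the columns inside each group row by row
pasted <- lapply(groups, function(x) do.call(paste, c(x, sep = "-")))

result <- data.frame(Region = toy$Region, pasted)
print(result)
```

If the paired columns already carry identical names, stripping digits would give every column the same key, so grouping on the names themselves may be the better fit in that case.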

Special thanks to my teachers and friends @Ronak Shah and @AnilGoyal, who taught me a really great solution built on the get and glue functions.

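Here is a small approach you might find interesting. For this solution I first had to rename one column in each duplicated pair (the code below strips the Length_ prefix from columns 2, 4 and 6) to make the data manipulation easier.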

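library(dplyr)
library(stringr)
library(purrr)
library(glue)

df %>%
  # Take the first column of each pair and drop the "Length_" prefix (House1, House2, House3)
  select(c(2, 4, 6)) %>%
  rename_with(., ~ str_remove(.x, fixed("Length_")), .cols = everything()) %>%
  # Bind them back to Region and the remaining column of each pair
  bind_cols(df %>%
              select(-c(2, 4, 6))) %>%
  relocate(Region) %>%
  # For each pair, look both columns up by name and paste them together
  mutate(map_dfc(list(Length_House_1 = 1,
                      Length_House_2 = 2,
                      Length_House_3 = 3),
                 ~ paste(get(glue("House{.x}")), get(glue("Length_House{.x}")),
                         sep = "_"))) %>%
  # Keep only Region and the three pasted columns
  select(-c(2:7))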

Region  Length_House_1 Length_House_2 Length_House_3
<chr>   <chr>          <chr>          <chr>         
1 Montana 20_30          20_20          40_50         
2 Montana 52_64          60_60          70_76         
3 Montana 52_68          60_60          70_70         
4 Montana 44_52          60_60          76_76         
5 Montana 44_76          60_60          70_76         
6 Idaho   48_56          60_60          76_76         
7 Idaho   48_72          60_60          70_76 
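The core trick above is looking a column up by a name that is built at run time. The sketch below shows that idea in isolation, using base R only: sprintf stands in for glue and with() stands in for mutate, so this is not the same pipeline, just the same lookup. The data frame toy and its column names are made up for the illustration.

```r
# Hypothetical toy data, already renamed so each pair is HouseN / Length_HouseN
toy <- data.frame(
  House1 = c(20, 48), Length_House1 = c(30, 56),
  House2 = c(40, 76), Length_House2 = c(50, 76)
)

# Inside with(), get() resolves a column from its name given as a string,
# so the name can be assembled from the pair index i
pasted <- with(toy, sapply(1:2, function(i) {
  paste(get(sprintf("House%d", i)),
        get(sprintf("Length_House%d", i)),
        sep = "_")
}))

# sapply() returns a character matrix with one column per pair
colnames(pasted) <- c("Length_House_1", "Length_House_2")
print(pasted)
```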

Data:

df <- tribble(
~Region,  ~Length_House1,   ~Length_House1,   ~Length_House2,   ~Length_House2,   ~Length_House3,   ~Length_House3,
"Montana", 20,  30,  20,  20,  40,  50,
"Montana", 52,  64,  60,  60,  70,  76,
"Montana", 52,  68,  60,  60,  70,  70,
"Montana", 44,  52,  60,  60,  76,  76,
"Montana", 44,  76,  60,  60,  70,  76,
"Idaho",   48,  56,  60,  60,  76,  76,
"Idaho",   48,  72,  60,  60,  70,  76
)

Here is another solution, using tidyr:

  1. Create unique column names
  2. Unite the columns
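library(tidyr)
library(dplyr)

# One unique name per column (the data frame has 7 columns)
uniquecolnames <- sprintf("f%02d", seq(1, 7))
colnames(df) <- uniquecolnames

# Each unite() merges two adjacent columns, so later positions shift left by one
df %>%
  unite("Length_House1", 2:3, sep = "-") %>%
  unite("Length_House2", 3:4, sep = "-") %>%
  unite("Length_House3", 4:5, sep = "-")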

Output:

f01     Length_House1 Length_House2 Length_House3
<chr>   <chr>         <chr>         <chr>        
1 Montana 20-30         20-20         40-50        
2 Montana 52-64         60-60         70-76        
3 Montana 52-68         60-60         70-70        
4 Montana 44-52         60-60         76-76        
5 Montana 44-76         60-60         70-76        
6 Idaho   48-56         60-60         76-76        
7 Idaho   48-72         60-60         70-76   
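The same two steps (make the names unique, then combine each pair) can also be sketched in base R. This is an alternative to the tidyr code above rather than a reproduction of it: make.unique replaces the hand-written names and paste replaces unite. The data frame toy is made up and has only two pairs.

```r
# Hypothetical toy data with duplicated column names
toy <- data.frame(c("Montana", "Idaho"), c(20, 48), c(30, 56), c(40, 76), c(50, 76))
names(toy) <- c("Region", "Length_House1", "Length_House1",
                "Length_House2", "Length_House2")

# Step 1: make the names unique; a repeated name gets the suffix ".1"
original <- names(toy)
names(toy) <- make.unique(original)

# Step 2: for each original pair name, paste the column and its ".1" twin
pair_names <- unique(original[-1])
pasted <- lapply(pair_names, function(nm) {
  paste(toy[[nm]], toy[[paste0(nm, ".1")]], sep = "-")
})
names(pasted) <- pair_names

result <- data.frame(Region = toy$Region, pasted)
print(result)
```

One advantage of deriving the names with make.unique is that the pairing no longer depends on column positions, which shift after every unite call in the positional version.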
