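I'm new here and struggling with R. I have a data frame in which several columns share the same name, and I want to combine each pair of same-named columns into one column using paste, with the values separated by "-".
Reproducible sample data: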
df <- tribble(
~Region, ~Length_House1, ~Length_House1, ~Length_House2, ~Length_House2, ~Length_House3, ~Length_House3,
"Montana", 20, 30, 20, 20, 40, 50,
"Montana", 52, 64, 60, 60, 70, 76,
"Montana", 52, 68, 60, 60, 70, 70,
"Montana", 44, 52, 60, 60, 76, 76,
"Montana", 44, 76, 60, 60, 70, 76,
"Idaho", 48, 56, 60, 60, 76, 76,
"Idaho", 48, 72, 60, 60, 70, 76
)
Desired output
Region Length_House1 Length_House2 Length_House3
Montana 20-30 20-20 40-50
Montana 52-64 60-60 70-76
Montana 52-68 60-60 70-70
Montana 44-52 60-60 76-76
Montana 44-76 60-60 70-76
Idaho 48-56 60-60 76-76
Idaho 48-72 60-60 70-76
If you specify explicitly which columns to combine (for example, columns 2, 4 and 6 with columns 3, 5 and 7), you can try:
as.data.frame(cbind(Region = df$Region, mapply(paste,
df[, c(2,4,6)],
df[, c(3,5,7)], sep = '-')))
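Here, mapply calls paste once for each pair of columns (2 with 3, 4 with 5, 6 with 7), joining the values with -. Because paste is vectorised, each call combines every row of that pair at once.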
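The positional approach can be generalised so the column indices do not have to be typed by hand. This is a minimal base R sketch, assuming (as in the sample data) that Region comes first and the remaining columns sit in adjacent pairs; it uses a three-row excerpt of the data built with check.names = FALSE so the duplicated names are kept.

```r
# Three-row excerpt of the sample data; check.names = FALSE keeps the
# duplicated column names instead of renaming them.
df <- data.frame(
  Region        = c("Montana", "Montana", "Montana"),
  Length_House1 = c(20, 52, 52),
  Length_House1 = c(30, 64, 68),
  Length_House2 = c(20, 60, 60),
  Length_House2 = c(20, 60, 60),
  Length_House3 = c(40, 70, 70),
  Length_House3 = c(50, 76, 70),
  check.names   = FALSE
)

# Assumption: after the first column, columns come in adjacent pairs.
first  <- seq(2, ncol(df), by = 2)  # 2, 4, 6
second <- first + 1                 # 3, 5, 7

# mapply() walks both index vectors in parallel; each call pastes one whole
# pair of columns (paste() is vectorised over the rows).
combined <- mapply(function(i, j) paste(df[[i]], df[[j]], sep = "-"),
                   first, second)
colnames(combined) <- names(df)[first]

result <- data.frame(Region = df$Region, combined, check.names = FALSE)
result
```

Selecting columns by position with df[[i]] sidesteps the duplicated names entirely, which is why this variant works without renaming anything first.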
If the column names within each pair share a common part (for example, Length_HouseA1 with Length_HouseA2, Length_HouseB1 with Length_HouseB2, and so on), you can try:
data.frame(Region = df$Region,
           lapply(split.default(df[-1], sub("\\d+$", "", names(df)[-1])),
                  function(x) do.call(paste, c(x, sep = "-"))))
Output
Region Length_House1 Length_House2 Length_House3
1 Montana 20-30 20-20 40-50
2 Montana 52-64 60-60 70-76
3 Montana 52-68 60-60 70-70
4 Montana 44-52 60-60 76-76
5 Montana 44-76 60-60 70-76
6 Idaho 48-56 60-60 76-76
7 Idaho 48-72 60-60 70-76
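To see what the name-based version does step by step, here is a minimal sketch on a small hypothetical data frame whose names follow the Length_HouseA1 / Length_HouseA2 pattern described above (names and values are made up for illustration). With the question's own names, stripping the trailing digit would turn every column into "Length_House" and put all six columns in one group, so this variant only fits the paired-suffix naming.

```r
# Hypothetical data: each pair shares a prefix and differs only in the
# trailing digit.
df2 <- data.frame(
  Region         = c("Montana", "Idaho"),
  Length_HouseA1 = c(20, 48),
  Length_HouseA2 = c(30, 56),
  Length_HouseB1 = c(20, 60),
  Length_HouseB2 = c(20, 60)
)

# Strip trailing digits to get one grouping key per column:
# "Length_HouseA1" -> "Length_HouseA", and so on.
key <- sub("\\d+$", "", names(df2)[-1])

# split.default() splits the columns (not the rows) by that key,
# giving one small data frame per pair.
groups <- split.default(df2[-1], key)

# do.call(paste, ...) pastes all columns within a group, row by row.
pasted <- lapply(groups, function(x) do.call(paste, c(x, sep = "-")))

result2 <- data.frame(Region = df2$Region, pasted)
result2
```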
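Special thanks to my teachers and friends @Ronak Shah and @AnilGoyal, who taught me a really nice solution using the get and glue functions.
Here is a small approach you might find interesting. For this solution, I first had to rename one column of each duplicated pair (columns 2, 4 and 6) to make the data manipulation easier.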
library(dplyr)
library(stringr)
library(purrr)
library(glue)
df %>%
  select(c(2, 4, 6)) %>%
  rename_with(., ~ str_remove(.x, fixed("Length_")), .cols = everything()) %>%
  bind_cols(df %>%
              select(-c(2, 4, 6))) %>%
  relocate(Region) %>%
  mutate(map_dfc(list(Length_House_1 = 1,
                      Length_House_2 = 2,
                      Length_House_3 = 3),
                 ~ paste(get(glue("House{.x}")), get(glue("Length_House{.x}")),
                         sep = "_"))) %>%
  select(-c(2:7))
Region Length_House_1 Length_House_2 Length_House_3
<chr> <chr> <chr> <chr>
1 Montana 20_30 20_20 40_50
2 Montana 52_64 60_60 70_76
3 Montana 52_68 60_60 70_70
4 Montana 44_52 60_60 76_76
5 Montana 44_76 60_60 70_76
6 Idaho 48_56 60_60 76_76
7 Idaho 48_72 60_60 70_76
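The renaming step in this answer exists because same-named columns cannot be told apart by name. A base R alternative, sketched here as an assumption rather than the answerer's method, is make.unique(), which appends .1 to the second occurrence of a name; the data below is a small made-up excerpt built with check.names = FALSE.

```r
# Small excerpt with duplicated names (check.names = FALSE keeps them).
df3 <- data.frame(
  Region        = c("Montana", "Idaho"),
  Length_House1 = c(20, 48),
  Length_House1 = c(30, 56),
  Length_House2 = c(20, 60),
  Length_House2 = c(20, 60),
  check.names   = FALSE
)

# make.unique() appends ".1" to repeated names, so the second column of
# each pair becomes e.g. "Length_House1.1".
names(df3) <- make.unique(names(df3))
names(df3)
# "Region" "Length_House1" "Length_House1.1" "Length_House2" "Length_House2.1"

# Each base name now refers to the first column of a pair and
# "<name>.1" to its partner.
bases <- unique(sub("\\.\\d+$", "", names(df3)[-1]))
out <- data.frame(Region = df3$Region)
for (b in bases) {
  out[[b]] <- paste(df3[[b]], df3[[paste0(b, ".1")]], sep = "-")
}
out
```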
Here is another solution using tidyr:
- give every column a unique name
- merge each pair of columns with unite
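library(tidyr)
library(dplyr)
# One unique name per column: df has 7 columns, so exactly 7 names
uniquecolnames <- sprintf("f%02d", seq(1, 7))
colnames(df) <- uniquecolnames
# Each unite() replaces two columns with one, so the next pair
# starts one position further to the right
df %>%
  unite("Length_House1", 2:3, sep = "-") %>%
  unite("Length_House2", 3:4, sep = "-") %>%
  unite("Length_House3", 4:5, sep = "-")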
Output:
f01 Length_House1 Length_House2 Length_House3
<chr> <chr> <chr> <chr>
1 Montana 20-30 20-20 40-50
2 Montana 52-64 60-60 70-76
3 Montana 52-68 60-60 70-70
4 Montana 44-52 60-60 76-76
5 Montana 44-76 60-60 70-76
6 Idaho 48-56 60-60 76-76
7 Idaho 48-72 60-60 70-76