Stack multiple column pairs into two columns and remove duplicates in R

I have many columns; this is just a subset of my data:

df<-read.table (text=" Color1   Size1   Color2  Size2   Color3  Size3
Yellow  AA  Gray    GB  Purpul  MO
Blue    BD  Cyne    CE  Gray    GB
Yellow  AA  Yellow  AA  Black   LL
Red MD  Reddark KK  Reddark KK
Green   MC  Reddark KK  Green   MC
", header=TRUE)

I want to stack all of the Color/Size column pairs into just two columns, then remove duplicates to get this table:

Color   Size
Yellow  AA
Blue    BD
Red MD
Green   MC
Gray    GB
Cyne    CE
Reddark KK
Purpul  MO
Black   LL

I tried using melt from reshape2, but I struggled to get it to work.

Without any additional libraries, reshape and unique can do the job:

> unique(reshape(df, varying=1:6, direction="long", v.names=c("Color", "Size"), timevar=NULL)[1:2])
Color Size
1.1  Yellow   AA
2.1    Blue   BD
4.1     Red   MD
5.1   Green   MC
1.2    Gray   GB
2.2    Cyne   CE
4.2 Reddark   KK
1.3  Purpul   MO
3.3   Black   LL

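Pivoting seems like overkill to me, but what do I know. If the index bothers you (even though it preserves structural information from the wide table), reset the row names: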

> uniq = unique(reshape(df, varying=1:6, direction="long", v.names=c("Color", "Size"), timevar=NULL)[1:2])
> rownames(uniq) = NULL
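
For comparison, here is a minimal base R sketch that skips reshape and stacks the columns directly. It assumes every ColorN column has a matching SizeN column; the names color_cols, size_cols, stacked and result are only illustrative.

```r
# Sample data from the question, rebuilt so the snippet is self-contained
df <- data.frame(
  Color1 = c("Yellow", "Blue", "Yellow", "Red", "Green"),
  Size1  = c("AA", "BD", "AA", "MD", "MC"),
  Color2 = c("Gray", "Cyne", "Yellow", "Reddark", "Reddark"),
  Size2  = c("GB", "CE", "AA", "KK", "KK"),
  Color3 = c("Purpul", "Gray", "Black", "Reddark", "Green"),
  Size3  = c("MO", "GB", "LL", "KK", "MC"),
  stringsAsFactors = FALSE
)

# Find the Color columns and derive the matching Size column names
color_cols <- grep("^Color", names(df), value = TRUE)
size_cols  <- sub("^Color", "Size", color_cols)

# Concatenate the columns one after another, keeping each pair aligned
stacked <- data.frame(
  Color = unlist(df[color_cols], use.names = FALSE),
  Size  = unlist(df[size_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop repeated pairs and reset the row names
result <- unique(stacked)
rownames(result) <- NULL
result
```

Because the columns are stacked pair by pair, the rows should come out in the same order as the table requested in the question.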

Another approach, using pivot_longer() and pivot_wider(), could be:

library(dplyr)
library(tidyr)
# Code
newdf <- df %>%
  pivot_longer(everything()) %>%
  mutate(name = substr(name, 1, nchar(name) - 1)) %>%
  group_by(name) %>% mutate(id2 = row_number()) %>%
  pivot_wider(names_from = name, values_from = value) %>%
  select(-id2) %>%
  filter(!duplicated(paste(Color, Size)))

Output:

# A tibble: 9 x 2
Color   Size 
<fct>   <fct>
1 Yellow  AA   
2 Gray    GB   
3 Purpul  MO   
4 Blue    BD   
5 Cyne    CE   
6 Black   LL   
7 Red     MD   
8 Reddark KK   
9 Green   MC  
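
One caveat about the last step: deduplicating on paste(Color, Size) compares the joined strings, so two different pairs can collide if the values themselves contain spaces. That does not happen with the data here. The sketch below uses made-up toy values (not from the question) to show the difference from dplyr's distinct(), which compares the columns themselves.

```r
library(dplyr)

# Hypothetical values chosen so that both rows paste to "Dark Red XL"
toy <- data.frame(
  Color = c("Dark Red", "Dark"),
  Size  = c("XL", "Red XL"),
  stringsAsFactors = FALSE
)

# Treats the two rows as duplicates because the pasted strings are equal
by_paste <- toy %>% filter(!duplicated(paste(Color, Size)))

# Compares Color and Size column by column, so both rows are kept
by_distinct <- toy %>% distinct(Color, Size)

nrow(by_paste)
nrow(by_distinct)
```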

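We can use pivot_longer from tidyr to reshape from 'wide' to 'long' into two columns by specifying names_sep as the boundary between a letter and a digit, (?<=[a-z])(?=\d), in the column names, and then take the distinct of the two columns: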

library(dplyr)
library(tidyr)
# The backslash must be doubled inside an R string: "\\d", not "\d"
pivot_longer(df, cols = everything(),
    names_to = c('.value', 'grp'), names_sep = "(?<=[a-z])(?=\\d)") %>%
  distinct(Color, Size)

-output

# A tibble: 9 x 2
#  Color   Size 
#  <chr>   <chr>
#1 Yellow  AA   
#2 Gray    GB   
#3 Purpul  MO   
#4 Blue    BD   
#5 Cyne    CE   
#6 Black   LL   
#7 Red     MD   
#8 Reddark KK   
#9 Green   MC   
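
If the lookaround regex is hard to read, the same split can probably be expressed with names_pattern, where each capture group maps to one entry in names_to. This is a sketch that assumes column names of the form letters followed by digits, as in the sample data; res is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, rebuilt so the snippet is self-contained
df <- data.frame(
  Color1 = c("Yellow", "Blue", "Yellow", "Red", "Green"),
  Size1  = c("AA", "BD", "AA", "MD", "MC"),
  Color2 = c("Gray", "Cyne", "Yellow", "Reddark", "Reddark"),
  Size2  = c("GB", "CE", "AA", "KK", "KK"),
  Color3 = c("Purpul", "Gray", "Black", "Reddark", "Green"),
  Size3  = c("MO", "GB", "LL", "KK", "MC"),
  stringsAsFactors = FALSE
)

# First group (letters) becomes the output column name via '.value',
# second group (digits) becomes the 'grp' column
res <- df %>%
  pivot_longer(cols = everything(),
               names_to = c(".value", "grp"),
               names_pattern = "([A-Za-z]+)(\\d+)") %>%
  distinct(Color, Size)
res
```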

Or using data.table:

library(data.table)
unique(melt(setDT(df), measure = patterns('^Color', '^Size'),
value.name = c('Color', 'Size'))[, variable := NULL])
#     Color Size
#1:  Yellow   AA
#2:    Blue   BD
#3:     Red   MD
#4:   Green   MC
#5:    Gray   GB
#6:    Cyne   CE
#7: Reddark   KK
#8:  Purpul   MO
#9:   Black   LL
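
Note that setDT() converts df to a data.table by reference, so df itself is changed. If the original data.frame should stay untouched, a variant along these lines works on a copy instead; the names dt, long and res are only illustrative.

```r
library(data.table)

# Sample data from the question, rebuilt so the snippet is self-contained
df <- data.frame(
  Color1 = c("Yellow", "Blue", "Yellow", "Red", "Green"),
  Size1  = c("AA", "BD", "AA", "MD", "MC"),
  Color2 = c("Gray", "Cyne", "Yellow", "Reddark", "Reddark"),
  Size2  = c("GB", "CE", "AA", "KK", "KK"),
  Color3 = c("Purpul", "Gray", "Black", "Reddark", "Green"),
  Size3  = c("MO", "GB", "LL", "KK", "MC"),
  stringsAsFactors = FALSE
)

# as.data.table() makes a copy, leaving df as a plain data.frame
dt <- as.data.table(df)

# Melt the Color* and Size* columns together into two value columns
long <- melt(dt, measure.vars = patterns("^Color", "^Size"),
             value.name = c("Color", "Size"))

# Keep only the two value columns and drop repeated pairs
res <- unique(long[, .(Color, Size)])
res
```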

data

df <- structure(list(Color1 = c("Yellow", "Blue", "Yellow", "Red", 
"Green"), Size1 = c("AA", "BD", "AA", "MD", "MC"), Color2 = c("Gray", 
"Cyne", "Yellow", "Reddark", "Reddark"), Size2 = c("GB", "CE", 
"AA", "KK", "KK"), Color3 = c("Purpul", "Gray", "Black", "Reddark", 
"Green"), Size3 = c("MO", "GB", "LL", "KK", "MC")), 
class = "data.frame", row.names = c(NA, 
-5L))
