I have many columns, but here is just a part of my data:
df<-read.table (text=" Color1 Size1 Color2 Size2 Color3 Size3
Yellow AA Gray GB Purpul MO
Blue BD Cyne CE Gray GB
Yellow AA Yellow AA Black LL
Red MD Reddark KK Reddark KK
Green MC Reddark KK Green MC
", header=TRUE)
I want to stack all the columns into two columns (Color and Size), and then remove duplicates to get this table:
Color Size
Yellow AA
Blue BD
Red MD
Green MC
Gray GB
Cyne CE
Reddark KK
Purpul MO
Black LL
I tried using melt from reshape2, but I struggled to get it to work.
Without any additional libraries, reshape and unique can do the job:
> unique(reshape(df, varying=1:6, direction="long", v.names=c("Color", "Size"), timevar=NULL)[1:2])
Color Size
1.1 Yellow AA
2.1 Blue BD
4.1 Red MD
5.1 Green MC
1.2 Gray GB
2.2 Cyne CE
4.2 Reddark KK
1.3 Purpul MO
3.3 Black LL
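Pivoting seems like overkill to me, but what do I know. If the index bothers you (although it preserves the structural information of the wide table), reset the row names: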
> uniq = unique(reshape(df, varying=1:6, direction="long", v.names=c("Color", "Size"), timevar=NULL)[1:2])
> rownames(uniq) = NULL
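To make the stacking step concrete, here is a minimal base R sketch that does the same thing by hand. It assumes the columns strictly alternate Color, Size, Color, Size, and so on; the names `pairs`, `stacked` and `result` are only illustrative.

```r
# Sample data from the question
df <- read.table(text = "Color1 Size1 Color2 Size2 Color3 Size3
Yellow AA Gray GB Purpul MO
Blue BD Cyne CE Gray GB
Yellow AA Yellow AA Black LL
Red MD Reddark KK Reddark KK
Green MC Reddark KK Green MC
", header = TRUE)

# Group column positions into consecutive pairs: 1:2, 3:4, 5:6
# (assumption: every odd column is a Color, every even column a Size)
pairs <- split(seq_along(df), ceiling(seq_along(df) / 2))

# Rename each pair to (Color, Size) and stack the pairs on top of each other
stacked <- do.call(rbind, lapply(pairs, function(idx) {
  setNames(df[idx], c("Color", "Size"))
}))

# Drop duplicated rows and reset the row names
result <- unique(stacked)
rownames(result) <- NULL
result
```

This keeps the pairs in column order, so the rows come out in the same order as the reshape version.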
Another approach, using pivot_longer() and pivot_wider(), could be:
library(dplyr)
library(tidyr)
#Code
newdf <- df %>%
pivot_longer(everything()) %>%
mutate(name=substr(name,1,nchar(name)-1)) %>%
group_by(name) %>% mutate(id2=row_number()) %>%
pivot_wider(names_from = name,values_from=value) %>%
select(-id2) %>%
filter(!duplicated(paste(Color,Size)))
Output:
# A tibble: 9 x 2
Color Size
<fct> <fct>
1 Yellow AA
2 Gray GB
3 Purpul MO
4 Blue BD
5 Cyne CE
6 Black LL
7 Red MD
8 Reddark KK
9 Green MC
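We can use pivot_longer from tidyr to reshape from 'wide' to 'long' into two columns, by specifying names_sep as the boundary between a letter and a digit in the column names ((?<=[a-z])(?=\d)), and then take the distinct rows of the two columns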
library(dplyr)
library(tidyr)
pivot_longer(df, cols = everything(),
    names_to = c('.value', 'grp'), names_sep = "(?<=[a-z])(?=\\d)") %>%
    distinct(Color, Size)
Output:
# A tibble: 9 x 2
# Color Size
# <chr> <chr>
#1 Yellow AA
#2 Gray GB
#3 Purpul MO
#4 Blue BD
#5 Cyne CE
#6 Black LL
#7 Red MD
#8 Reddark KK
#9 Green MC
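To see where that letter/digit boundary falls, here is a small base R sketch that marks it with a `|` character. The marker and the variable names `cols`, `marked`, `prefix` and `suffix` are only for illustration and are not part of the tidyr call.

```r
cols <- c("Color1", "Size1", "Color2", "Size2", "Color3", "Size3")

# Zero-width match: preceded by a lowercase letter, followed by a digit.
# perl = TRUE is needed for lookbehind/lookahead; note the doubled backslash.
marked <- sub("(?<=[a-z])(?=\\d)", "|", cols, perl = TRUE)
marked

# The part before the boundary plays the role of '.value' (the new column name),
# the part after it plays the role of 'grp'
prefix <- sub("\\|.*$", "", marked)
suffix <- sub("^.*\\|", "", marked)
unique(prefix)
suffix
```

Because the pattern matches a position rather than a character, nothing is consumed from the column names when they are split.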
Or using data.table
library(data.table)
unique(melt(setDT(df), measure = patterns('^Color', '^Size'),
value.name = c('Color', 'Size'))[, variable := NULL])
# Color Size
#1: Yellow AA
#2: Blue BD
#3: Red MD
#4: Green MC
#5: Gray GB
#6: Cyne CE
#7: Reddark KK
#8: Purpul MO
#9: Black LL
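One thing to keep in mind with the data.table version: setDT() converts df to a data.table by reference, so df itself changes class afterwards. A minimal variant, assuming you would rather leave df untouched, works on a copy made with as.data.table(); the names `long` and `res` are only illustrative.

```r
library(data.table)

df <- data.frame(
  Color1 = c("Yellow", "Blue", "Yellow", "Red", "Green"),
  Size1  = c("AA", "BD", "AA", "MD", "MC"),
  Color2 = c("Gray", "Cyne", "Yellow", "Reddark", "Reddark"),
  Size2  = c("GB", "CE", "AA", "KK", "KK"),
  Color3 = c("Purpul", "Gray", "Black", "Reddark", "Green"),
  Size3  = c("MO", "GB", "LL", "KK", "MC")
)

# as.data.table() returns a copy, so the original data.frame is not modified
long <- melt(as.data.table(df),
             measure.vars = patterns("^Color", "^Size"),
             value.name = c("Color", "Size"))

# Drop the helper column that records which pair each row came from,
# then keep only the distinct (Color, Size) rows
long[, variable := NULL]
res <- unique(long)
res
```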
Data
df <- structure(list(Color1 = c("Yellow", "Blue", "Yellow", "Red",
"Green"), Size1 = c("AA", "BD", "AA", "MD", "MC"), Color2 = c("Gray",
"Cyne", "Yellow", "Reddark", "Reddark"), Size2 = c("GB", "CE",
"AA", "KK", "KK"), Color3 = c("Purpul", "Gray", "Black", "Reddark",
"Green"), Size3 = c("MO", "GB", "LL", "KK", "MC")),
class = "data.frame", row.names = c(NA,
-5L))