RHS中有两列的r-dcast/spread字符纯数据帧



我正试图将df从长到宽地分布在RHS中的两列上。我有数据帧:

dat <- data.frame(
bm = LETTERS[seq(1,10)], 
at4 = c("a1","a2","a3","a4","a5","a6","a7","a8","a9", "a10"),
n1 = c("n1","n2","n3","n4","n5","n6","n7","n8","n9", "n10"),
cat = c("x","y","z","x","y","z","x","y","z","z"),
clas = c("k","l","k","l","k","l","k","l","k","l")) 

提供什么:

bm  at4 n1  cat clas
A   a1  n1  x   k
B   a2  n2  y   l
C   a3  n3  z   k
D   a4  n4  x   l
E   a5  n5  y   k
F   a6  n6  z   l
G   a7  n7  x   k
H   a8  n8  y   l
I   a9  n9  z   k
J   a10 n10 z   l

所需数据帧:

#The approach that works with adding dummy columns:
dat$temp1 <- 1
dat <- dcast(dat, ... ~ cat, value.var = "temp1", fun.aggregate = sum)
dat$temp1 <- 1
dat <- dcast(dat, ... ~ clas, value.var = "temp1", fun.aggregate = sum)
bm  at4 n1  x   y   z   k   l
A   a1  n1  1   0   0   1   0
B   a2  n2  0   1   0   0   1
C   a3  n3  0   0   1   1   0
D   a4  n4  1   0   0   0   1
E   a5  n5  0   1   0   1   0
F   a6  n6  0   0   1   0   1
G   a7  n7  1   0   0   1   0
H   a8  n8  0   1   0   0   1
I   a9  n9  0   0   1   1   0
J   a10 n10 0   0   1   0   1

我想知道它是否可以在dcast或spread中一次性完成。我试过了:

dat$temp1 <- 1
dcast(dat, ... ~ cat + clas, value.var = "temp1", fun.aggregate = sum)

但它给出了:

bm  at4 n1  x_k x_l y_k y_l z_k z_l
A   a1  n1  1   0   0   0   0   0
B   a2  n2  0   0   0   1   0   0
C   a3  n3  0   0   0   0   1   0
D   a4  n4  0   1   0   0   0   0
E   a5  n5  0   0   1   0   0   0
F   a6  n6  0   0   0   0   0   1
G   a7  n7  1   0   0   0   0   0
H   a8  n8  0   0   0   1   0   0
I   a9  n9  0   0   0   0   1   0
J   a10 n10 0   0   0   0   0   1

在强制转换后,每个单元格的bm列应该是唯一的,这样它就可以显示而不是1。尝试过类似的方法:

dcast(dat, ... ~ cat + clas, value.var = "bm", fun.aggregate = toString)

但是dcast将RHS柱混合在一起:

at4 n1  x_k x_l y_k y_l z_k z_l
a1  n1  A                   
a10 n10                     J
a2  n2              B       
a3  n3                  C   
a4  n4      D               
a5  n5          E           
a6  n6                      F
a7  n7  G                   
a8  n8              H       
a9  n9                  I   

我们可以melt&一次性dcast

library(data.table)
setDT(dat)
dcast(melt(dat, id = c("bm", "at4", "n1")), bm + at4 + n1 ~ value, fun = length)
#    bm at4  n1 k l x y z
# 1:  A  a1  n1 1 0 1 0 0
# 2:  B  a2  n2 0 1 0 1 0
# 3:  C  a3  n3 1 0 0 0 1
# 4:  D  a4  n4 0 1 1 0 0
# 5:  E  a5  n5 1 0 0 1 0
# 6:  F  a6  n6 0 1 0 0 1
# 7:  G  a7  n7 1 0 1 0 0
# 8:  H  a8  n8 0 1 0 1 0
# 9:  I  a9  n9 1 0 0 0 1
#10:  J a10 n10 0 1 0 0 1

在重新整形2包中有一个称为CCD_;将熔化和(d(将数据帧铸造成单个步骤&";,看见https://github.com/hadley/reshape/blob/master/R/recast.r

这里有一个解决方案,但有pivot_wider,而不是spread/dcast
catclas都重新整形为宽格式,然后将这两个结果连接起来。

suppressPackageStartupMessages({
library(dplyr)
library(tidyr)
})
dat <- data.frame(
bm = LETTERS[seq(1,10)], 
at4 = c("a1","a2","a3","a4","a5","a6","a7","a8","a9", "a10"),
n1 = c("n1","n2","n3","n4","n5","n6","n7","n8","n9", "n10"),
cat = c("x","y","z","x","y","z","x","y","z","z"),
clas = c("k","l","k","l","k","l","k","l","k","l")) 
inner_join(
dat %>%
mutate(new = 1) %>% 
pivot_wider(
id_cols = 1:3,
names_from = cat,
values_from = new,
values_fill = 0L
),
dat %>%
mutate(new = 1L) %>%
pivot_wider(
id_cols = 1:3,
names_from = clas,
values_from = new,
values_fill = 0L
), 
by = c("bm", "at4", "n1")
)
#> # A tibble: 10 × 8
#>    bm    at4   n1        x     y     z     k     l
#>    <chr> <chr> <chr> <dbl> <dbl> <dbl> <int> <int>
#>  1 A     a1    n1        1     0     0     1     0
#>  2 B     a2    n2        0     1     0     0     1
#>  3 C     a3    n3        0     0     1     1     0
#>  4 D     a4    n4        1     0     0     0     1
#>  5 E     a5    n5        0     1     0     1     0
#>  6 F     a6    n6        0     0     1     0     1
#>  7 G     a7    n7        1     0     0     1     0
#>  8 H     a8    n8        0     1     0     0     1
#>  9 I     a9    n9        0     0     1     1     0
#> 10 J     a10   n10       0     0     1     0     1

创建于2022-07-24由reprex包(v2.0.1(

最新更新