R: Spread data.frame/tibble with shared keys and missing dat



我有一个两列的表,我想展开它。我知道这是一个非常受欢迎和探索的话题,然而,我尝试了几种方法,都没有得到我想要的。欢迎任何建议和投诉。

我的桌子上摆满了三位女性的数据。总共有5个类别,通常每个类别都有值。但一些女性的数据缺失,这导致了整行的缺失——请注意,Jane遗漏了关于weight的信息。

a = data.frame(categories = c("name", "sex", "age", "weight", "high", 
"name", "sex", "age", "high", 
"name", "sex", "age", "weight", "high"),
values = c("Emma", "female", "32", "72", "175",
"Jane", "female", "28", "165",
"Emma", "female", "42", "63", "170")) 
categories values
1        name   Emma
2         sex   female
3         age     32
4      weight     72
5        high    175
6        name   Jane
7         sex female
8         age     28
9        high    165
10       name   Emma
11        sex female
12        age     42
13     weight     63
14       high    170

我想从categories-列和values-行中获得。但主要有两个问题:

1( 密钥是共享的-两个Emma(因此我不能使用spreadreshape(

2( 一些类别可能会丢失,比如Jane的体重(因此我不能使用pivotsplit(

最后,我想重塑数据,得到一个这样的表:

name  sex    age  weight  high
Emma  female 32   72      175
Jane  female 28   NA      165
Emma  female 42   63      170

假设'name'始终存在于每个条目中,我们可以创建一个标识符列,并使用pivot_wider进行整形。

library(dplyr)
a %>%
group_by(grp = cumsum(categories == 'name')) %>%
tidyr::pivot_wider(names_from = categories, values_from = values) %>%
ungroup %>%
select(-grp)
#  name  sex    age   weight high 
#  <chr> <chr>  <chr> <chr>  <chr>
#1 Emma  female 32    72     175  
#2 Jane  female 28    NA     165  
#3 Emma  female 42    63     170  

data.table:中的相同逻辑

library(data.table)
dcast(setDT(a), cumsum(categories == 'name')~categories, value.var = 'values')

假设所有条目都以name开始,并在R基地用magrittr进行清洁:

library(magrittr)
split(a, cumsum(a$categories == "name")) %>% 
lapply(function(x) setNames(x[[2L]], x[[1L]])[unique(a$categories)]) %>% 
do.call(rbind, .) %>% 
data.frame()
name    sex age weight high
1 Emma female  32     72  175
2 Jane female  28   <NA>  165
3 Emma female  42     63  170

类似地玩data.table:

library(data.table)
split(a, cumsum(a$categories == "name")) %>% 
lapply(transpose, make.names = "categories") %>% 
rbindlist(fill = TRUE)

最新更新