我有一个两列的表,我想展开它。我知道这是一个非常受欢迎和探索的话题,然而,我尝试了几种方法,都没有得到我想要的。欢迎任何建议和投诉。
我的桌子上摆满了三位女性的数据。总共有5个类别,通常每个类别都有值。但一些女性的数据缺失,这导致了整行的缺失——请注意,Jane
遗漏了关于weight
的信息。
a = data.frame(categories = c("name", "sex", "age", "weight", "high",
"name", "sex", "age", "high",
"name", "sex", "age", "weight", "high"),
values = c("Emma", "female", "32", "72", "175",
"Jane", "female", "28", "165",
"Emma", "female", "42", "63", "170"))
categories values
1 name Emma
2 sex female
3 age 32
4 weight 72
5 high 175
6 name Jane
7 sex female
8 age 28
9 high 165
10 name Emma
11 sex female
12 age 42
13 weight 63
14 high 170
我想从categories
-列和values
-行中获得。但主要有两个问题:
1( 密钥是共享的-两个Emma(因此我不能使用spread
或reshape
(
2( 一些类别可能会丢失,比如Jane的体重(因此我不能使用pivot
或split
(
最后,我想重塑数据,得到一个这样的表:
name sex age weight high
Emma female 32 72 175
Jane female 28 NA 165
Emma female 42 63 170
假设'name'
始终存在于每个条目中,我们可以创建一个标识符列,并使用pivot_wider
进行整形。
library(dplyr)
a %>%
group_by(grp = cumsum(categories == 'name')) %>%
tidyr::pivot_wider(names_from = categories, values_from = values) %>%
ungroup %>%
select(-grp)
# name sex age weight high
# <chr> <chr> <chr> <chr> <chr>
#1 Emma female 32 72 175
#2 Jane female 28 NA 165
#3 Emma female 42 63 170
data.table
:中的相同逻辑
library(data.table)
dcast(setDT(a), cumsum(categories == 'name')~categories, value.var = 'values')
假设所有条目都以name
开始,并在R基地用magrittr
进行清洁:
library(magrittr)
split(a, cumsum(a$categories == "name")) %>%
lapply(function(x) setNames(x[[2L]], x[[1L]])[unique(a$categories)]) %>%
do.call(rbind, .) %>%
data.frame()
name sex age weight high
1 Emma female 32 72 175
2 Jane female 28 <NA> 165
3 Emma female 42 63 170
类似地玩data.table
:
library(data.table)
split(a, cumsum(a$categories == "name")) %>%
lapply(transpose, make.names = "categories") %>%
rbindlist(fill = TRUE)