分布在 R - dplyr tidyr 解决方案中的多个列上 - Spread over multiple columns in R

我有长格式的月度同比数据，我试图用两列spread。我见过的唯一例子包括一个key。

> dput(df)
structure(list(ID = c("a", "a", "a", "a", "a", "a", "a", "a", 
"a", "b", "b", "b", "b", "b", "b", "b", "b", "b"), Year = c(2015L, 
2015L, 2015L, 2016L, 2016L, 2016L, 2017L, 2017L, 2017L, 2015L, 
2015L, 2015L, 2016L, 2016L, 2016L, 2017L, 2017L, 2017L), Month = c(1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L), Value = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 6L, 7L, 8L, 
9L, 10L, 11L, 12L, 13L, 14L)), .Names = c("ID", "Year", "Month", 
"Value"), class = "data.frame", row.names = c(NA, -18L))

我正在尝试将其转换为数据格式，其中年份为第 2：5 列，每month每ID一行

ID  Month   2015    2016    2017
a   1         1     2       3
a   2         1     2       3
a   3         1     2       3
a   1         6     9       12
a   2         7     10      13
a   3         8     11      14

我尝试了以下错误：

by_month_over_years = spread(df,key = c(Year,Month), Value)
Error: `var` must evaluate to a single number or a column name, not an integer vector

library(tidyr)
library(dplyr)
df %>% group_by(ID) %>% spread(Year, Value)
# A tibble: 6 x 5
# Groups:   ID [2]
ID    Month `2015` `2016` `2017`
<chr> <int>  <int>  <int>  <int>
1 a         1      1      2      3
2 a         2      1      2      3
3 a         3      1      2      3
4 b         1      6      9     12
5 b         2      7     10     13
6 b         3      8     11     14

library(reshape2) # or data.table, for dcast
dcast(df, ID + Month ~ Year)
#   ID Month 2015 2016 2017
# 1  a     1    1    2    3
# 2  a     2    1    2    3
# 3  a     3    1    2    3
# 4  b     1    6    9   12
# 5  b     2    7   10   13
# 6  b     3    8   11   14

这是一个带有reshape的base R选项

reshape(df, idvar = c('ID', 'Month'), direction = 'wide', timevar = 'Year')
#   ID Month Value.2015 Value.2016 Value.2017
#1   a     1          1          2          3
#2   a     2          1          2          3
#3   a     3          1          2          3
#10  b     1          6          9         12
#11  b     2          7         10         13
#12  b     3          8         11         14

分布在 R - dplyr tidyr 解决方案中的多个列上

相关内容

最新更新

热门标签：