r-如何最好地将季度数据转换为月度数据



下面是示例数据。我以这样的形式接收数据。每一行是一个季度,然后月份是其中的列。试图逐月进行计算,但我想我会为此转换数据帧。我想我应该做一个pivot_looker,但在网上没有看到任何类似的内容。以下是所需的结果

year<-c(2018,2018,2018,2018,2019,2019,2019,2019,2020,2020,2020,2020)
qtr<-c(1,2,3,4,1,2,3,4,1,2,3,4)
avgemp <-c(3,5,7,9,11,13,15,17,19,21,23,25)
month1emp<-c(2,4,6,8,10,12,14,16,18,20,22,24)
month2emp<-c(3,5,7,9,11,13,15,17,19,21,23,25)
month3emp<-c(4,6,8,10,12,14,16,18,20,22,24,26)
sample<-data.frame(year,qtr,month1emp,month2emp,month3emp)

Desired Result 
year   qtr   month   employment
2018    1       1           2
2018    1       2           3
2018    1       3           4
2018    2       4           4
2018    2       4           5
2018    2       4           6

以此类推。到2019年,月份值将重新启动,从1变为12。

我们可以在'month'列上使用pivot_longer,指定names_pattern来捕获数字((\d+)(,然后是'month'emp.value

library(dplyr)
library(tidyr)
sample %>% 
pivot_longer(cols = starts_with('month'), 
names_to = c("month", ".value"), names_pattern  = ".*(\d+)(emp)")%>% 
rename(employment = emp)

-输出

# A tibble: 36 x 4
year   qtr month employment
<dbl> <dbl> <chr>      <dbl>
1  2018     1 1              2
2  2018     1 2              3
3  2018     1 3              4
4  2018     2 1              4
5  2018     2 2              5
6  2018     2 3              6
7  2018     3 1              6
8  2018     3 2              7
9  2018     3 3              8
10  2018     4 1              8
# … with 26 more rows

如果我们需要根据"qtr"值增加"月">

sample %>% 
pivot_longer(cols = starts_with('month'), 
names_to = c("month", ".value"), names_pattern  = ".*(\d+)(emp)")%>% 
rename(employment = emp) %>% 
mutate(month = as.integer(month) + c(0, 3, 6, 9)[qtr])
# A tibble: 36 x 4
year   qtr month employment
<dbl> <dbl> <dbl>      <dbl>
1  2018     1     1          2
2  2018     1     2          3
3  2018     1     3          4
4  2018     2     4          4
5  2018     2     5          5
6  2018     2     6          6
7  2018     3     7          6
8  2018     3     8          7
9  2018     3     9          8
10  2018     4    10          8
# … with 26 more rows

基本R解决方案:

# Create a vector of boolean values,
# denoting whether or not the columns should
# be unpivoted: unpivot_cols => boolean vector
unpivot_cols <- startsWith(
names(df), 
"month"
)
# Reshape the data.frame, calculate 
# the month value: rshpd_df => data.frame
rshpd_df <- transform(
reshape(
df,
direction = "long",
varying = names(df)[unpivot_cols],
ids = NULL,
timevar = "month",
times = seq_len(sum(unpivot_cols)),
v.names = "employment",
new.row.names = seq_len(
nrow(df) * ncol(df)
)
),
month = ((12 / 4) * (qtr - 1)) + month
)
# Order the data.frame by year and month: 
# ordered_df => data.frame
ordered_df <- with(
rshpd_df, 
rshpd_df[order(year, month),]  
)

最新更新