r - dplyr: group by name and take the rolling mean of the n most recent events by date



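I want to compute, for each person (name), a rolling average over their 3 most recent events. I have a date column and want to use it to pick the 3 most recent events. Some people may have fewer rows in the DF than others, and that's fine.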

Code to create the data frame:

library(dplyr)
# Create DataFrame
df<- data.frame(name=c('CAREY.FAKE','CAREY.FAKE','CAREY.FAKE','CAREY.FAKE','CAREY.FAKE','CAREY.FAKE',
'JOHN.SMITH','JOHN.SMITH','JOHN.SMITH','JOHN.SMITH','JOHN.SMITH','JOHN.SMITH',
'JEFF.JOHNSON','JEFF.JOHNSON','JEFF.JOHNSON','JEFF.JOHNSON',
'SARA.JOHNSON','SARA.JOHNSON','SARA.JOHNSON','SARA.JOHNSON'
),
GA=c(2,2,2,2,2,20,2,2,2,2,2,20,2,2,2,20,2,2,2,20),
SV=c(2,2,2,2,2,20,2,2,2,2,2,20,2,2,2,20,2,2,2,20),
GF=c(2,2,2,2,2,20,2,2,2,2,2,20,2,2,2,20,2,2,2,20),
SA=c(2,2,2,2,2,20,2,2,2,2,2,20,2,2,2,20,2,2,2,20),
date=c("10/20/2016","10/19/2016","10/18/2016","10/17/2016","10/16/2016","10/15/2016",
"10/20/2016","10/19/2016","10/18/2016","10/17/2016","10/16/2016","10/15/2016",
"10/20/2016","10/19/2016","10/18/2016","10/17/2016",
"10/20/2016","10/19/2016","10/18/2016","10/17/2016"
),
stringsAsFactors = FALSE)

DF:

name        GA  SV  GF  SA  date
CAREY.FAKE  2   2   2   2   10/20/2016
CAREY.FAKE  2   2   2   2   10/19/2016
CAREY.FAKE  2   2   2   2   10/18/2016
CAREY.FAKE  2   2   2   2   10/17/2016
CAREY.FAKE  2   2   2   2   10/16/2016
CAREY.FAKE  20  20  20  20  10/15/2016
JOHN.SMITH  2   2   2   2   10/20/2016
JOHN.SMITH  2   2   2   2   10/19/2016
JOHN.SMITH  2   2   2   2   10/18/2016
JOHN.SMITH  2   2   2   2   10/17/2016
JOHN.SMITH  2   2   2   2   10/16/2016
JOHN.SMITH  20  20  20  20  10/15/2016
JEFF.JOHNS  2   2   2   2   10/20/2016
JEFF.JOHNS  2   2   2   2   10/19/2016
JEFF.JOHNS  2   2   2   2   10/18/2016
JEFF.JOHNS  20  20  20  20  10/17/2016
SARA.JOHNS  2   2   2   2   10/20/2016
SARA.JOHNS  2   2   2   2   10/19/2016
SARA.JOHNS  2   2   2   2   10/18/2016
SARA.JOHNS  20  20  20  20  10/17/2016

Code to create the rolling average:

df_next <- df %>%
group_by(name) %>%
summarise(last_three_mean = mean(tail(GA,SV,GF,SA, 3)))

Error:

Error in summarise_impl(.data, dots) : 
Evaluation error: length(n) == 1L is not TRUE.
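
The message points at how tail() matches its arguments: the second positional argument is n, so in tail(GA, SV, GF, SA, 3) the whole SV column is taken as n, which is not a single number. A minimal sketch of the same failure on two made-up vectors (x and y are hypothetical stand-ins for the columns, and the exact wording of the error may differ between R versions):

```r
# Hypothetical stand-ins for two of the numeric columns
x <- c(2, 2, 2, 20)
y <- c(1, 1, 1, 1)

# y is matched to the n argument of tail(), so n has length 4 and tail() stops
bad <- try(tail(x, y, 3), silent = TRUE)
failed <- inherits(bad, "try-error")

# Passing one vector at a time works: the last three values are 2, 2, 20
last_three_mean <- mean(tail(x, 3))
```

So tail() has to be applied to each column separately rather than to all of them in one call.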

Expected result:

name        GA  SV  GF  SA
CAREY.FAKE  2   2   2   2
JEFF.JOHNS  2   2   2   2
JOHN.SMITH  2   2   2   2
SARA.JOHNS  2   2   2   2

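We can arrange by 'date' and then, after grouping by 'name', use summarise_at to get the mean of multiple columns. Because the rows are sorted in ascending date order, tail(., 3) picks the three most recent events in each group.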

library(dplyr)
library(lubridate)
df %>% 
group_by(name) %>%
arrange(name, mdy(date)) %>% 
summarise_at(2:5, funs(mean(tail(., 3))))
#or select the column by matching the name pattern
#summarise_at(vars(matches("^[A-Z]{2}$")), funs(mean(tail(., 3))))  
# A tibble: 4 x 5
#  name            GA    SV    GF    SA
#  <chr>        <dbl> <dbl> <dbl> <dbl>
#1 CAREY.FAKE       2     2     2     2
#2 JEFF.JOHNSON     2     2     2     2
#3 JOHN.SMITH       2     2     2     2
#4 SARA.JOHNSON     2     2     2     2
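
In more recent dplyr releases funs() is deprecated and summarise_at() is superseded by across(). The same idea could be sketched as follows, assuming dplyr 1.0.0 or later; the small data frame here is a hypothetical stand-in with made-up values (not the question's df), and base as.Date() is used in place of lubridate::mdy() only to keep the example self-contained:

```r
library(dplyr)

# Hypothetical stand-in data: person A has four events, person B only two
df_small <- data.frame(
  name = c("A", "A", "A", "A", "B", "B"),
  GA   = c(20, 2, 4, 6, 10, 30),
  SV   = c(20, 1, 2, 3, 5, 7),
  date = c("10/15/2016", "10/18/2016", "10/19/2016", "10/20/2016",
           "10/19/2016", "10/20/2016"),
  stringsAsFactors = FALSE
)

result <- df_small %>%
  # Parse the month/day/year strings so they sort chronologically
  mutate(date = as.Date(date, format = "%m/%d/%Y")) %>%
  # Ascending order puts the most recent events last within each name
  arrange(name, date) %>%
  group_by(name) %>%
  # Mean of the last (up to) three values of each selected column
  summarise(across(c(GA, SV), ~ mean(tail(.x, 3))), .groups = "drop")
```

For person B, who has only two events, tail(.x, 3) simply returns both rows, so groups with fewer than three events are averaged over whatever they have.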

Another option is to use top_n to keep the 3 most recent rows per group and then apply summarise_at

df %>% 
group_by(name) %>%
top_n(mdy(date), n = 3) %>%
summarise_at(2:5, mean)
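
Since dplyr 1.0.0, top_n() is superseded by slice_max(). A comparable sketch, again on a hypothetical stand-in data frame with made-up values and assuming dplyr 1.0.0 or later:

```r
library(dplyr)

# Hypothetical stand-in data: person A has four events, person B only two
df_small <- data.frame(
  name = c("A", "A", "A", "A", "B", "B"),
  GA   = c(20, 2, 4, 6, 10, 30),
  SV   = c(20, 1, 2, 3, 5, 7),
  date = c("10/15/2016", "10/18/2016", "10/19/2016", "10/20/2016",
           "10/19/2016", "10/20/2016"),
  stringsAsFactors = FALSE
)

result <- df_small %>%
  mutate(date = as.Date(date, format = "%m/%d/%Y")) %>%
  group_by(name) %>%
  # Keep the three rows with the latest dates in each group
  slice_max(order_by = date, n = 3, with_ties = FALSE) %>%
  summarise(across(c(GA, SV), mean), .groups = "drop")
```

One design point: top_n() keeps tied rows, so a group can end up with more than three rows when several events share a date. Setting with_ties = FALSE in slice_max() caps each group at three rows.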
