R data frame: detecting hidden recurring patterns by group



I have a data frame that looks like this:

person   year   location     rank
Harry    2002   Los Angeles  1
Harry    2006   Boston       1
Harry    2006   Los Angeles  2
Harry    2006   Chicago      3
Peter    2001   New York     1
Peter    2002   New York     1
Lily     2005   Springfield  1
Lily     2007   New York     1
Lily     2008   Boston       1
Lily     2011   Chicago      1
Lily     2011   New York     2
Sam      2005   Springfield  1
Sam      2007   New York     1
Sam      2008   Boston       1
Sam      2008   Springfield  2
Sam      2008   New York     3
Sam      2011   Chicago      1
Sam      2011   Springfield  2
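
For anyone who wants to reproduce this, the table above can be rebuilt with base R roughly as follows. The name `df1` is an assumption borrowed from the answer further down, and the column types (numeric year and rank, character person and location) are a guess, since the question does not state them.

```r
# Rebuild the example table; df1 is the name the answer's code expects.
df1 <- data.frame(
  person = rep(c("Harry", "Peter", "Lily", "Sam"), times = c(4, 2, 5, 7)),
  year = c(2002, 2006, 2006, 2006,
           2001, 2002,
           2005, 2007, 2008, 2011, 2011,
           2005, 2007, 2008, 2008, 2008, 2011, 2011),
  location = c("Los Angeles", "Boston", "Los Angeles", "Chicago",
               "New York", "New York",
               "Springfield", "New York", "Boston", "Chicago", "New York",
               "Springfield", "New York", "Boston", "Springfield",
               "New York", "Chicago", "Springfield"),
  rank = c(1, 1, 2, 3,
           1, 1,
           1, 1, 1, 1, 2,
           1, 1, 1, 2, 3, 1, 2),
  stringsAsFactors = FALSE  # keep person and location as character
)

# Quick look at the structure
str(df1)
```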

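At the person level, I want to know who had a location with rank 1 in some year where that same location appears again in the next available year, but with a rank other than 1. Harry, for instance, has Los Angeles at rank 1 in 2002 and at rank 2 in 2006, his next available year. The output should look like: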

person   yes/no
Harry    1
Peter    0
Lily     0
Sam      1

Here is a dplyr approach that may be more concise.

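library(dplyr)

df1 %>%
  # year_num numbers each person's distinct years 1, 2, 3, ...
  # (assumes rows are already sorted by person and year)
  group_by(person) %>%
  mutate(year_num = cumsum(year != lag(year, default = 0))) %>%
  # flag a location that reappears in the next available year with a different rank;
  # the integer default -1L can never equal year_num - 1, and unlike -Inf it
  # matches the integer type of year_num
  group_by(person, location) %>%
  mutate(next_yr_switch = year_num == lag(year_num, default = -1L) + 1 &
           rank != lag(rank)) %>%
  # count the flagged rows per person
  group_by(person) %>%
  summarize(`yes/no` = sum(next_yr_switch))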

## A tibble: 4 x 2
#  person `yes/no`
#* <chr>     <int>
#1 Harry         1
#2 Lily          0
#3 Peter         0
#4 Sam           1
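
Note that the pipeline above flags any change of rank between successive years and returns a count. A stricter reading of the question (rank exactly 1 first, then anything other than 1, reported as 0/1) could be sketched as below. This is only one possible variant, not the answerer's code. `df_demo` is a hypothetical toy data frame: the Harry rows come from the question, while "Alex" is an invented person whose location moves from rank 2 to rank 1 and therefore should not count.

```r
library(dplyr)

# Toy data: Harry is taken from the question, Alex is hypothetical.
df_demo <- data.frame(
  person   = c("Harry", "Harry", "Harry", "Harry", "Alex", "Alex", "Alex"),
  year     = c(2002, 2006, 2006, 2006, 2001, 2001, 2003),
  location = c("Los Angeles", "Boston", "Los Angeles", "Chicago",
               "Boston", "Chicago", "Chicago"),
  rank     = c(1, 1, 2, 3, 1, 2, 1),
  stringsAsFactors = FALSE
)

result <- df_demo %>%
  # sort explicitly instead of assuming the input order
  arrange(person, year, rank) %>%
  # number each person's distinct years 1, 2, 3, ...
  group_by(person) %>%
  mutate(year_num = dense_rank(year)) %>%
  # within a person/location pair: previous row is the preceding available
  # year, had rank 1, and the current rank is something else
  group_by(person, location) %>%
  mutate(hit = year_num == lag(year_num, default = -1L) + 1 &
           lag(rank, default = 0) == 1 &
           rank != 1) %>%
  # collapse to a 0/1 indicator per person
  group_by(person) %>%
  summarize(`yes/no` = as.integer(any(hit)))

print(result)
```

With this toy input, Harry gets 1 and Alex gets 0, whereas a plain `rank != lag(rank)` test would also flag Alex.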
