I have a data frame like the following:
person year location rank
Harry 2002 Los Angeles 1
Harry 2006 Boston 1
Harry 2006 Los Angeles 2
Harry 2006 Chicago 3
Peter 2001 New York 1
Peter 2002 New York 1
Lily 2005 Springfield 1
Lily 2007 New York 1
Lily 2008 Boston 1
Lily 2011 Chicago 1
Lily 2011 New York 2
Sam 2005 Springfield 1
Sam 2007 New York 1
Sam 2008 Boston 1
Sam 2008 Springfield 2
Sam 2008 New York 3
Sam 2011 Chicago 1
Sam 2011 Springfield 2
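At the person level, I want to know who had a location ranked 1 in some year where that same location appears again in that person's next available year, but with a rank other than 1. For example, the output should look like: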
person yes/no
Harry 1
Peter 0
Lily 0
Sam 1
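To make the example reproducible, the input table above can be entered in R as `df1`, the name the answer below uses. This is a sketch that assumes `year` and `rank` can be stored as plain numeric columns; the location names contain spaces, so they are typed as a character vector rather than parsed from whitespace-separated text.

```r
# Recreate the example input as df1 (18 rows, in the order shown above).
# stringsAsFactors = FALSE keeps text columns as character on R < 4.0 too.
df1 <- data.frame(
  person = c(rep("Harry", 4), rep("Peter", 2), rep("Lily", 5), rep("Sam", 7)),
  year = c(2002, 2006, 2006, 2006,
           2001, 2002,
           2005, 2007, 2008, 2011, 2011,
           2005, 2007, 2008, 2008, 2008, 2011, 2011),
  location = c("Los Angeles", "Boston", "Los Angeles", "Chicago",
               "New York", "New York",
               "Springfield", "New York", "Boston", "Chicago", "New York",
               "Springfield", "New York", "Boston", "Springfield", "New York",
               "Chicago", "Springfield"),
  rank = c(1, 1, 2, 3,
           1, 1,
           1, 1, 1, 1, 2,
           1, 1, 1, 2, 3, 1, 2),
  stringsAsFactors = FALSE
)
```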
Here's a dplyr approach, which may be more concise.
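library(dplyr)
df1 %>%
  # sort so that years are in order within each person
  arrange(person, year) %>%
  # year_num: running count of the distinct years each person appears in
  group_by(person) %>%
  mutate(year_num = cumsum(year != lag(year, default = 0))) %>%
  # flag a location that was ranked 1 in one year and reappears in the
  # person's next available year with a rank other than 1
  group_by(person, location) %>%
  mutate(next_yr_switch = year_num == lag(year_num, default = -Inf) + 1 &
           lag(rank) == 1 & rank != 1) %>%
  group_by(person) %>%
  # 1 if any location switched in this way, else 0
  summarize(`yes/no` = as.integer(any(next_yr_switch)))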
# A tibble: 4 x 2
#   person `yes/no`
#   <chr>     <int>
# 1 Harry         1
# 2 Lily          0
# 3 Peter         0
# 4 Sam           1
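As a cross-check that does not depend on dplyr, the same rule can be sketched in base R by walking each person's consecutive pairs of available years. This is an alternative technique, not part of the answer above, and `flag_person` and the small `toy` data frame (the Harry and Peter rows from the question) are hypothetical names used only for illustration.

```r
# For one person's rows: return 1L if some location ranked 1 in one year
# reappears in the next available year with a rank other than 1, else 0L.
flag_person <- function(d) {
  yrs <- sort(unique(d$year))
  if (length(yrs) < 2) return(0L)
  for (i in seq_len(length(yrs) - 1)) {
    top <- d$location[d$year == yrs[i] & d$rank == 1]   # rank-1 locations
    nxt <- d[d$year == yrs[i + 1], , drop = FALSE]      # next available year
    if (any(nxt$location %in% top & nxt$rank != 1)) return(1L)
  }
  0L
}

toy <- data.frame(
  person = c("Harry", "Harry", "Harry", "Harry", "Peter", "Peter"),
  year = c(2002, 2006, 2006, 2006, 2001, 2002),
  location = c("Los Angeles", "Boston", "Los Angeles", "Chicago",
               "New York", "New York"),
  rank = c(1, 1, 2, 3, 1, 1),
  stringsAsFactors = FALSE
)

# One flag per person, named by person (split() sorts the names).
res <- sapply(split(toy, toy$person), flag_person)
print(res)
```

The explicit loop makes the "next available year" step easy to read, at the cost of being slower than the grouped dplyr version on large data.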