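Because the dataset accumulates month by month, I only want to keep the last row of each month for each of the 50 states. The sample snippet shown here is the top of the dataset, sorted by name. Which tidyverse or dplyr functions do I need to get this?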
Let's use this dummy data, which looks similar to yours:
dummy <- data.frame(
  name = c("Alabama", "Alabama", "Alabama", "Alabama", "Alabama", "Alabama"),
  bla = 1:6,
  as_of_date = c("3/26/2020", "3/31/2020", "4/6/2020", "4/13/2020", "4/21/2020", "4/28/2020"),
  month = c(3, 3, 4, 4, 4, 4)
)
name bla as_of_date month
1 Alabama 1 3/26/2020 3
2 Alabama 2 3/31/2020 3
3 Alabama 3 4/6/2020 4
4 Alabama 4 4/13/2020 4
5 Alabama 5 4/21/2020 4
6 Alabama 6 4/28/2020 4
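Note that `as_of_date` is stored as text in this dummy data. A small check (using a few of the dates above) shows why converting to `Date` before sorting matters: strings are compared character by character, so "4/6/2020" sorts after "4/13/2020". `method = "radix"` is used here only to get a locale-independent (C-locale) string order.

```r
dates_chr <- c("4/6/2020", "4/13/2020", "4/21/2020")

# Character order: compares "6" vs "1"/"2" at the same position, so April 6 ends up last
chr_order <- order(dates_chr, method = "radix")

# Parsing with the same format string as the answer gives true chronological order
date_order <- order(as.Date(dates_chr, format = "%m/%d/%Y"))

chr_order   # 2 3 1
date_order  # 1 2 3
```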
You can try:
library(dplyr)
dummy %>%
  mutate(as_of_date = as.Date(as_of_date, format = "%m/%d/%Y")) %>%
  arrange(name, as_of_date) %>% # sort by state, then date, in case your data is not already ordered as in your screenshot
  group_by(name, month) %>%
  filter(row_number() == n()) # keep only the last row of each state-month group
name bla as_of_date month
<chr> <int> <date> <dbl>
1 Alabama 2 2020-03-31 3
2 Alabama 6 2020-04-28 4
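If your real data spans more than one year, or has no ready-made `month` column, a variant is to build a year-month key from the date itself and let `slice_max()` pick the latest row per group. This is a sketch assuming dplyr >= 1.0.0 (where `slice_max()` was introduced); `year_month` is a hypothetical helper column, not part of your data.

```r
library(dplyr)

dummy <- data.frame(
  name = rep("Alabama", 6),
  bla = 1:6,
  as_of_date = c("3/26/2020", "3/31/2020", "4/6/2020",
                 "4/13/2020", "4/21/2020", "4/28/2020")
)

last_per_month <- dummy %>%
  mutate(
    as_of_date = as.Date(as_of_date, format = "%m/%d/%Y"),
    # hypothetical key: year and month together, so March 2020 and March 2021 stay separate
    year_month = format(as_of_date, "%Y-%m")
  ) %>%
  group_by(name, year_month) %>%
  # keep the row with the latest date per state-month; with_ties = FALSE guarantees one row
  slice_max(as_of_date, n = 1, with_ties = FALSE) %>%
  ungroup()

last_per_month
```

Because `slice_max()` orders by date inside each group, the separate `arrange()` step is not needed here.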