How to keep the highest value for each date for each of the 50 states in R



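Because the dataset is cumulative within each month, I only want to keep the last row of each month for each of the 50 states. The sample dataset snippet here is the top of the data, sorted by name. Which tidyverse or dplyr functions do I need to get this?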

Let's use this dummy data, which looks similar to yours:

dummy <- data.frame(
  name = c("Alabama", "Alabama", "Alabama", "Alabama", "Alabama", "Alabama"),
  bla = 1:6,
  as_of_date = c("3/26/2020", "3/31/2020", "4/6/2020", "4/13/2020", "4/21/2020", "4/28/2020"),
  month = c(3, 3, 4, 4, 4, 4)
)
     name bla as_of_date month
1 Alabama   1  3/26/2020     3
2 Alabama   2  3/31/2020     3
3 Alabama   3   4/6/2020     4
4 Alabama   4  4/13/2020     4
5 Alabama   5  4/21/2020     4
6 Alabama   6  4/28/2020     4

You can try:

library(dplyr)
dummy %>%
  # parse the text dates so they sort chronologically
  mutate(as_of_date = as.Date(as_of_date, format = "%m/%d/%Y")) %>%
  # sort by state and date, in case your data is not already ordered as in the image
  arrange(name, as_of_date) %>%
  group_by(name, month) %>%
  # keep the last row of each state/month group
  filter(row_number() == n())
name      bla as_of_date month
<chr>   <int> <date>     <dbl>
1 Alabama     2 2020-03-31     3
2 Alabama     6 2020-04-28     4
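
If the data ever spans more than one year, grouping by the `month` column alone would merge, say, March 2020 with March 2021. A minimal sketch of one way around that, assuming dplyr 1.0.0 or later (for `slice_max()`): derive a year-month key from the parsed date and keep the row with the latest date in each group. The `year_month` column and the `last_per_month` name are introduced here only for illustration and are not part of the original data.

```r
library(dplyr)

# Same dummy data as above, without the precomputed month column
dummy <- data.frame(
  name = rep("Alabama", 6),
  bla = 1:6,
  as_of_date = c("3/26/2020", "3/31/2020", "4/6/2020",
                 "4/13/2020", "4/21/2020", "4/28/2020")
)

last_per_month <- dummy %>%
  mutate(
    # parse the text dates so they compare chronologically
    as_of_date = as.Date(as_of_date, format = "%m/%d/%Y"),
    # hypothetical helper column: year and month together, e.g. "2020-03"
    year_month = format(as_of_date, "%Y-%m")
  ) %>%
  group_by(name, year_month) %>%
  # keep the single row with the latest date in each state/month group
  slice_max(as_of_date, n = 1, with_ties = FALSE) %>%
  ungroup()

print(last_per_month)
```

Because `slice_max()` picks the latest date itself, no prior `arrange()` is needed, and `with_ties = FALSE` guarantees one row per group even if a date is duplicated.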
