I have this df:
library(lubridate)
Date <- c("2020-10-01", "2020-10-02", "2020-10-03", "2020-10-04",
          "2020-10-01", "2020-10-02", "2020-10-03", "2020-10-04",
          "2020-10-01", "2020-10-02", "2020-10-03", "2020-10-04")
Date <- as_date(Date)
Country <- c("USA", "USA", "USA", "USA",
             "Mexico", "Mexico", "Mexico", "Mexico",
             "Japan", "Japan", "Japan", "Japan")
Value_A <- c(0, 40, 0, 0, 25, 29, 34, 0, 20, 25, 27, 0)
df <- data.frame(Date, Country, Value_A)
View(df)  # View() ships with R; lowercase view() needs tibble/tidyverse loaded
   Date       Country Value_A
   <date>     <chr>     <dbl>
 1 2020-10-01 USA           0
 2 2020-10-02 USA          40
 3 2020-10-03 USA           0
 4 2020-10-04 USA           0
 5 2020-10-01 Mexico       25
 6 2020-10-02 Mexico       29
 7 2020-10-03 Mexico       34
 8 2020-10-04 Mexico        0
 9 2020-10-01 Japan        20
10 2020-10-02 Japan        25
11 2020-10-03 Japan        27
12 2020-10-04 Japan         0
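I am trying to remove the rows that contain a zero, but only when those zeros fall in the last two rows of each Country group. So the result would be: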
   Date       Country Value_A
   <date>     <chr>     <dbl>
 1 2020-10-01 USA           0
 2 2020-10-02 USA          40
 5 2020-10-01 Mexico       25
 6 2020-10-02 Mexico       29
 7 2020-10-03 Mexico       34
 9 2020-10-01 Japan        20
10 2020-10-02 Japan        25
11 2020-10-03 Japan        27
I'd appreciate it if anyone could help :(
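We can get the result with a few operations from the tidyverse package. We group_by Country and arrange by Date in descending order. After that, we number the rows within each group with row_number(), so the two most recent rows of each Country get rn 1 and 2. Finally, we filter on the condition you described: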
library(tidyverse)
df %>%
  group_by(Country) %>%
  arrange(desc(Date)) %>%
  mutate(rn = row_number()) %>%
  filter(!(Value_A == 0 & rn <= 2))
#   Date       Country Value_A    rn
# 1 2020-10-03 Mexico       34     2
# 2 2020-10-03 Japan        27     2
# 3 2020-10-02 USA          40     3
# 4 2020-10-02 Mexico       29     3
# 5 2020-10-02 Japan        25     3
# 6 2020-10-01 USA           0     4
# 7 2020-10-01 Mexico       25     4
# 8 2020-10-01 Japan        20     4
Another approach is to use rank(desc(Date)), which keeps the rows in their original order:
library(tidyverse)
df %>%
  group_by(Country) %>%
  mutate(rank_date = rank(desc(Date))) %>%
  filter(!(rank_date <= 2 & Value_A == 0))
#   Date       Country Value_A rank_date
# 1 2020-10-01 USA           0         4
# 2 2020-10-02 USA          40         3
# 3 2020-10-01 Mexico       25         4
# 4 2020-10-02 Mexico       29         3
# 5 2020-10-03 Mexico       34         2
# 6 2020-10-01 Japan        20         4
# 7 2020-10-02 Japan        25         3
# 8 2020-10-03 Japan        27         2
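Both results above still carry a helper column and the grouping. If you want output shaped like the table in the question, here is a minimal sketch that skips the helper column by comparing row_number() with n() inside filter(). It assumes the rows of each Country are already sorted by Date ascending, as in the sample data; if they are not, add arrange(Country, Date) first. The name result is only for illustration.

```r
library(dplyr)

# Same sample data as in the question, rebuilt so the snippet runs on its own
df <- data.frame(
  Date = as.Date(rep(c("2020-10-01", "2020-10-02", "2020-10-03", "2020-10-04"),
                     times = 3)),
  Country = rep(c("USA", "Mexico", "Japan"), each = 4),
  Value_A = c(0, 40, 0, 0, 25, 29, 34, 0, 20, 25, 27, 0)
)

result <- df %>%
  group_by(Country) %>%
  # row_number() > n() - 2 is TRUE only for the last two rows of each group
  # (assumption: rows are already in ascending Date order within each Country)
  filter(!(Value_A == 0 & row_number() > n() - 2)) %>%
  ungroup()

result
```

The zero in the first USA row survives because it is not among the last two rows of its group, and the original row order is preserved.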