If I suspect some of my data is bad and want to remove every group that contains a value < 0, how can I do that?
fruit year price
apple 2021 2
apple 2020 -9
apple 2019 3
banana 2021 9
banana 2020 7
banana 2019 5
orange 2021 7
orange 2020 2
orange 2019 -3
->
fruit year price
banana 2021 9
banana 2020 7
banana 2019 5
There are several possible solutions; here are three:
Base R
dat[!dat$fruit %in% unique(dat[dat$price < 0, "fruit"]),]
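The one-liner above can be read as two steps. Here is a minimal sketch that spells them out, assuming `dat` is a plain data.frame shaped like the example; the names `bad_fruits` and `clean` are only illustrative. It indexes `dat$fruit` directly so the intermediate result is a vector even if `dat` happens to be a tibble.

```r
# Hypothetical copy of the example data from the question
dat <- data.frame(
  fruit = rep(c("apple", "banana", "orange"), each = 3),
  year  = rep(c(2021, 2020, 2019), times = 3),
  price = c(2, -9, 3, 9, 7, 5, 7, 2, -3)
)

# Step 1: fruits that have at least one negative price
bad_fruits <- unique(dat$fruit[dat$price < 0])

# Step 2: keep only the rows whose fruit is not in that set
clean <- dat[!dat$fruit %in% bad_fruits, ]
print(clean)
```

Splitting it up this way also makes it easy to inspect which groups were dropped before discarding them.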
dplyr
With all:
library(dplyr)
dat %>%
group_by(fruit) %>%
filter(all(price >= 0))
Or, using any:
dat %>%
group_by(fruit) %>%
filter(!any(price < 0))
Output
# A tibble: 3 x 3
# Groups: fruit [1]
fruit year price
<chr> <int> <int>
1 banana 2021 9
2 banana 2020 7
3 banana 2019 5
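Both dplyr versions rely on one logical flag per group. As a rough check that the `all` and `any` conditions agree, the flags can be computed in base R with `tapply`; this is only a sketch on a hypothetical copy of the example data, and `keep_all` / `keep_any` are illustrative names.

```r
# Hypothetical copy of the example data from the question
dat <- data.frame(
  fruit = rep(c("apple", "banana", "orange"), each = 3),
  year  = rep(c(2021, 2020, 2019), times = 3),
  price = c(2, -9, 3, 9, 7, 5, 7, 2, -3)
)

# One logical flag per fruit, computed both ways
keep_all <- tapply(dat$price, dat$fruit, function(p) all(p >= 0))
keep_any <- tapply(dat$price, dat$fruit, function(p) !any(p < 0))

print(keep_all)
# TRUE when both conditions flag exactly the same groups
print(identical(keep_all, keep_any))
```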
First, your data df:
fruit year price
1 apple 2021 2
2 apple 2020 -9
3 apple 2019 3
4 banana 2021 9
5 banana 2020 7
6 banana 2019 5
7 orange 2021 7
8 orange 2020 2
9 orange 2019 -3
You can use the following code to remove all rows of every group that has a negative price:
df <- df[with(df, ave(price >= 0, fruit, FUN = all)), ]
df
Output:
fruit year price
4 banana 2021 9
5 banana 2020 7
6 banana 2019 5
As you can see, only banana remains, since it has no negative prices.
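To see why this works, it can help to look at the intermediate vector that `ave` produces: it applies `all` within each fruit and repeats the result for every row of that group. A small sketch on a copy of the same data; `keep` and `result` are illustrative names, and the `as.logical()` wrapper is just a defensive assumption in case the flag is not returned as logical.

```r
# Copy of the example data used in this answer
df <- data.frame(
  fruit = rep(c("apple", "banana", "orange"), each = 3),
  year  = rep(c(2021, 2020, 2019), times = 3),
  price = c(2, -9, 3, 9, 7, 5, 7, 2, -3)
)

# Per-row flag: TRUE only if every price in that row's fruit group is >= 0
keep <- with(df, ave(price >= 0, fruit, FUN = all))
print(keep)

# Use the flag as a row filter
result <- df[as.logical(keep), ]
print(result)
```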
Data
df <- data.frame(fruit = c("apple", "apple", "apple", "banana", "banana", "banana", "orange", "orange", "orange"),
year = c(2021, 2020, 2019, 2021, 2020, 2019, 2021, 2020, 2019),
price = c(2, -9, 3, 9, 7, 5, 7, 2, -3))