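I am trying to fill the NA values in this data frame with the most recent non-NA value in the cost column. I want to group by city, so all of Omaha's NAs should be 44.50 and all of Lincoln's NAs should be 62.50. Here is the code I have been using. It replaces the first NA in each group (April) with the correct value, but does not go beyond that.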
df <- df %>%
  group_by(city) %>%
  mutate(cost = ifelse(is.na(cost), lag(cost, na.rm=TRUE), cost))
Data before running the code:
year month city cost
2021 January Omaha 45.50
2021 February Omaha 46.75
2021 March Omaha 44.50
2021 April Omaha NA
2021 May Omaha NA
2021 June Omaha NA
2021 January Lincoln 55.25
2021 February Lincoln 53.80
2021 March Lincoln 62.50
2021 April Lincoln NA
2021 May Lincoln NA
2021 June Lincoln NA
Use fill:
library(tidyverse)
df %>%
  group_by(city) %>%
  fill(cost)
# A tibble: 12 x 4
# Groups: city [2]
year month city cost
<int> <chr> <chr> <dbl>
1 2021 January Omaha 45.5
2 2021 February Omaha 46.8
3 2021 March Omaha 44.5
4 2021 April Omaha 44.5
5 2021 May Omaha 44.5
6 2021 June Omaha 44.5
7 2021 January Lincoln 55.2
8 2021 February Lincoln 53.8
9 2021 March Lincoln 62.5
10 2021 April Lincoln 62.5
11 2021 May Lincoln 62.5
12 2021 June Lincoln 62.5
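As for why the lag attempt in the question only filled April: lag shifts the vector down by one position and does not skip NAs (as far as I know it has no na.rm argument), so only the first NA in a run has a non-NA value directly before it. A minimal sketch on Omaha's values alone; the vector names here are just for illustration:

```r
library(dplyr)

# Omaha's cost values from the question
cost <- c(45.5, 46.75, 44.5, NA, NA, NA)

# lag() moves every value down one slot; NAs are shifted along, not skipped
shifted <- lag(cost)
shifted
# NA, 45.50, 46.75, 44.50, NA, NA

# Only April (position 4) has a non-NA predecessor, so May and June stay NA
attempt <- ifelse(is.na(cost), shifted, cost)
attempt
# 45.50, 46.75, 44.50, 44.50, NA, NA
```

fill avoids this because it keeps carrying the last seen value forward through the whole run of NAs.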
For your code, you would want to use last rather than lag (though fill is the better option). We also need to wrap cost in na.omit.
library(tidyverse)
df %>%
  group_by(city) %>%
  mutate(cost = ifelse(is.na(cost), last(na.omit(cost)), cost))
Output
year month city cost
<int> <chr> <chr> <dbl>
1 2021 January Omaha 45.5
2 2021 February Omaha 46.8
3 2021 March Omaha 44.5
4 2021 April Omaha 44.5
5 2021 May Omaha 44.5
6 2021 June Omaha 44.5
7 2021 January Lincoln 55.2
8 2021 February Lincoln 53.8
9 2021 March Lincoln 62.5
10 2021 April Lincoln 62.5
11 2021 May Lincoln 62.5
12 2021 June Lincoln 62.5
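Note that the two approaches only agree here because the NAs sit at the end of each group. last(na.omit(cost)) puts the group's final non-NA value into every NA, while fill carries forward the most recent value before each NA. A small sketch with made-up data (the gap data frame is hypothetical, not from the question) shows the difference:

```r
library(dplyr)
library(tidyr)

# Hypothetical data with an NA in the middle of the series
gap <- data.frame(city = "Omaha", cost = c(45.5, NA, 46.75, 44.5, NA))

# fill(): each NA takes the closest non-NA value above it
filled <- gap %>%
  group_by(city) %>%
  fill(cost) %>%
  ungroup()
filled$cost
# 45.50, 45.50, 46.75, 44.50, 44.50

# last(na.omit()): every NA takes the group's final non-NA value
lasted <- gap %>%
  group_by(city) %>%
  mutate(cost = ifelse(is.na(cost), last(na.omit(cost)), cost)) %>%
  ungroup()
lasted$cost
# 45.50, 44.50, 46.75, 44.50, 44.50
```

If "most recent non-NA value" is what you need in general, fill is the safer choice.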
Data
df <- structure(list(year = c(2021L, 2021L, 2021L, 2021L, 2021L, 2021L,
2021L, 2021L, 2021L, 2021L, 2021L, 2021L), month = c("January",
"February", "March", "April", "May", "June", "January", "February",
"March", "April", "May", "June"), city = c("Omaha", "Omaha",
"Omaha", "Omaha", "Omaha", "Omaha", "Lincoln", "Lincoln", "Lincoln",
"Lincoln", "Lincoln", "Lincoln"), cost = c(45.5, 46.75, 44.5,
NA, NA, NA, 55.25, 53.8, 62.5, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-12L))