Fill multiple NAs with lagged values in R



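I'm trying to fill the NA values in this data frame with the most recent non-NA value in the cost column. I want to group by city, so every NA for Omaha should become 44.50 and every NA for Lincoln should become 62.50. Here is the code I've been using. It replaces the first NA in each group (April) with the correct value, but nothing after that.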

library(dplyr)

df <- df %>%
  group_by(city) %>%
  # Only April gets filled; May and June stay NA
  mutate(cost = ifelse(is.na(cost), lag(cost, na.rm = TRUE), cost))

Data before running the code:

year   month      city     cost
2021   January    Omaha     45.50  
2021   February   Omaha     46.75
2021   March      Omaha     44.50
2021   April      Omaha     NA
2021   May        Omaha     NA
2021   June       Omaha     NA
2021   January    Lincoln   55.25
2021   February   Lincoln   53.80
2021   March      Lincoln   62.50
2021   April      Lincoln   NA
2021   May        Lincoln   NA
2021   June       Lincoln   NA
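Why only April gets filled: `lag()` shifts the original column down by one row, and it is not re-evaluated after each replacement. May and June therefore look back at values that are still NA. (`lag()` also does not document an `na.rm` argument, so it does not skip NAs.) A minimal illustration on Omaha's costs, assuming dplyr is installed:

```r
library(dplyr)

# Omaha's costs: March (44.5) is the last observed value
x <- c(45.5, 46.75, 44.5, NA, NA, NA)

# lag() shifts the ORIGINAL vector by one position
lag(x)
#> [1]    NA 45.50 46.75 44.50    NA    NA

# ifelse() only finds a value for April; May and June
# look back at NAs that were never updated
ifelse(is.na(x), lag(x), x)
#> [1] 45.50 46.75 44.50 44.50    NA    NA
```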

Using `fill`:

library(tidyverse)

df %>%
  group_by(city) %>%
  fill(cost)  # carry the last non-NA cost down within each city
# A tibble: 12 x 4
# Groups:   city [2]
    year month    city     cost
   <int> <chr>    <chr>   <dbl>
 1  2021 January  Omaha    45.5
 2  2021 February Omaha    46.8
 3  2021 March    Omaha    44.5
 4  2021 April    Omaha    44.5
 5  2021 May      Omaha    44.5
 6  2021 June     Omaha    44.5
 7  2021 January  Lincoln  55.2
 8  2021 February Lincoln  53.8
 9  2021 March    Lincoln  62.5
10  2021 April    Lincoln  62.5
11  2021 May      Lincoln  62.5
12  2021 June     Lincoln  62.5
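`fill()` does what is usually called "last observation carried forward" (LOCF). The sketch below is a hypothetical hand-rolled base-R helper, `locf()`, not tidyr's actual implementation. It works where the single `lag()` did not because each NA copies the value directly above it *after* that value has already been filled. It is applied per city with base R's `ave()` on a small made-up data frame:

```r
# Hypothetical helper: replace each NA with the value just above it.
# x[i - 1] was already filled on the previous iteration, so a whole
# run of NAs gets filled, not just the first one.
locf <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}

locf(c(44.5, NA, NA, NA))
#> [1] 44.5 44.5 44.5 44.5

# Apply it separately within each city
toy <- data.frame(
  city = c("Omaha", "Omaha", "Lincoln", "Lincoln"),
  cost = c(44.5, NA, 62.5, NA)
)
toy$cost <- ave(toy$cost, toy$city, FUN = locf)
toy$cost
#> [1] 44.5 44.5 62.5 62.5
```

As with `fill()`'s default direction, a leading NA (one with nothing above it) is left as NA.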

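For your code, you probably want `last` instead of `lag` (although `fill` is the better option). You also need to wrap `cost` in `na.omit`, so that `last` returns the last non-NA value instead of the trailing NA. Note that this writes the group's final non-NA value into every NA, which matches `fill` here only because all the NAs come after the last observed cost.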

library(tidyverse)

df %>%
  group_by(city) %>%
  # replace each NA with the last non-NA cost in the city
  mutate(cost = ifelse(is.na(cost), last(na.omit(cost)), cost))

Output

    year month    city     cost
   <int> <chr>    <chr>   <dbl>
 1  2021 January  Omaha    45.5
 2  2021 February Omaha    46.8
 3  2021 March    Omaha    44.5
 4  2021 April    Omaha    44.5
 5  2021 May      Omaha    44.5
 6  2021 June     Omaha    44.5
 7  2021 January  Lincoln  55.2
 8  2021 February Lincoln  53.8
 9  2021 March    Lincoln  62.5
10  2021 April    Lincoln  62.5
11  2021 May      Lincoln  62.5
12  2021 June     Lincoln  62.5
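The two approaches diverge when an NA sits between observed values. This small made-up series (assuming dplyr and tidyr are installed) shows that `last(na.omit(cost))` fills every gap with the final observed value, while `fill()` uses the closest value above each gap:

```r
library(dplyr)
library(tidyr)

# Hypothetical series with a gap in the middle and one at the end
toy <- tibble(cost = c(1, NA, 2, NA))

res <- toy %>%
  mutate(
    # every NA gets the final non-NA value of the column (2)
    with_last = ifelse(is.na(cost), last(na.omit(cost)), cost),
    with_fill = cost
  ) %>%
  # every NA gets the nearest non-NA value above it
  fill(with_fill)

res$with_last
#> [1] 1 2 2 2
res$with_fill
#> [1] 1 1 2 2
```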

Data

df <- structure(list(year = c(2021L, 2021L, 2021L, 2021L, 2021L, 2021L, 
2021L, 2021L, 2021L, 2021L, 2021L, 2021L), month = c("January", 
"February", "March", "April", "May", "June", "January", "February", 
"March", "April", "May", "June"), city = c("Omaha", "Omaha", 
"Omaha", "Omaha", "Omaha", "Omaha", "Lincoln", "Lincoln", "Lincoln", 
"Lincoln", "Lincoln", "Lincoln"), cost = c(45.5, 46.75, 44.5, 
NA, NA, NA, 55.25, 53.8, 62.5, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-12L))
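`fill()` works in row order, so it only carries the *most recent* value forward if rows are already sorted chronologically within each city. Sorting by the month column directly would be alphabetical; one option is `match(month, month.name)`, which uses base R's built-in `month.name` to turn "January" through "December" into 1 to 12. The data frame below is a hypothetical shuffled subset, not the question's data. If a city could start with NA, `fill()` also accepts `.direction = "downup"`.

```r
library(dplyr)
library(tidyr)

# Hypothetical rows in a non-chronological order
df_shuffled <- data.frame(
  year  = 2021L,
  month = c("May", "March", "April", "March", "June", "April"),
  city  = c("Omaha", "Omaha", "Omaha", "Lincoln", "Lincoln", "Lincoln"),
  cost  = c(NA, 44.5, NA, 62.5, NA, NA)
)

filled <- df_shuffled %>%
  # sort chronologically within each city before filling
  arrange(city, year, match(month, month.name)) %>%
  group_by(city) %>%
  fill(cost, .direction = "down") %>%
  ungroup()

filled$cost
#> [1] 62.5 62.5 62.5 44.5 44.5 44.5
```

Without the `arrange()` step, Omaha's May row comes first and stays NA.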
