R: complete/fill a time series with dplyr/tidyr, but only for a limited period of time



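I'm trying to use `complete()` and `fill()` (from tidyr, loaded with the tidyverse) to fill the gaps in a time series of animal weights (most of the time the animals are weighed roughly once a week), but I only want to fill within a limited range.

In the example dataset below, several dates are missing: the single weighing on 1/29/2020 and a run of four missing weeks in March/April. Missing one weekly weighing (e.g. 1/29) is acceptable, and carrying the previous weight forward for two weeks is fine, but I don't want to go any further than that. The second stretch of missing data should be filled for only 13 days, and the rest of the gap should be NA for `Wt_g`.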

library(tidyverse)
library(lubridate)
animalwts <- tibble::tribble(
  ~Animal,     ~WtDate, ~Wt_g,
      "A",  "1/1/2020",   20L,
      "A",  "1/8/2020",   21L,
      "A", "1/15/2020",   21L,
      "A", "1/22/2020",   23L,
      "A",  "2/5/2020",   25L,
      "A", "2/12/2020",   23L,
      "A", "2/19/2020",   24L,
      "A", "2/26/2020",   23L,
      "A",  "3/4/2020",   22L,
      "A",  "4/8/2020",   24L
) %>%
  mutate(WtDate = mdy(WtDate))
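To make the two gaps described above concrete, here is a small base R sketch that computes the days between consecutive weighings. The names `wt_dates`, `gap_days` and `missing_days` are introduced only for this illustration, and the dates are retyped from the example data in ISO format.

```r
# Weigh dates from the example data, retyped as ISO dates
wt_dates <- as.Date(c("2020-01-01", "2020-01-08", "2020-01-15", "2020-01-22",
                      "2020-02-05", "2020-02-12", "2020-02-19", "2020-02-26",
                      "2020-03-04", "2020-04-08"))

# Days elapsed between consecutive weighings
gap_days <- as.integer(diff(wt_dates))

# Calendar days without a weighing inside each gap
missing_days <- gap_days - 1L

# Show only the gaps longer than the usual week
data.frame(from = wt_dates[-length(wt_dates)],
           to = wt_dates[-1],
           gap_days,
           missing_days)[gap_days > 7, ]
```

The 14-day gap after 1/22 leaves 13 days to fill, which fits within the two-week limit; the 35-day gap after 3/4 leaves 34, of which only the first 13 should carry a weight.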

The following code completes the date sequence and fills in all of the missing data:

animalwts %>%
  group_by(Animal) %>%
  complete(WtDate = seq.Date(min(WtDate), max(WtDate), by = "day")) %>%
  fill(Wt_g)

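But I'm trying to work out how to `complete` all of the dates while `fill`ing the weight for at most two weeks from any given weighing, and entering NA for any data missing beyond that.

If possible, I'd like to stay "in the pipe".

Something like this?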

library(tidyverse)
library(lubridate)
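animalwts %>%
  group_by(Animal) %>%
  # Record each real weighing date and the gap since the previous weighing
  # (0 days for an animal's first weighing, so it is never NA on a real row)
  mutate(NA_lag = WtDate - lag(WtDate, default = first(WtDate)),
         last_measurement_date = WtDate) %>%
  # Add one row per calendar day; added rows are NA in every other column
  complete(WtDate = seq.Date(min(WtDate), max(WtDate), by = "day")) %>%
  fill(Wt_g) %>%
  fill(last_measurement_date) %>%
  # NA_lag is left unfilled, so added rows (NA) group apart from real weighings;
  # Animal stays in the grouping so animals sharing a date are not mixed
  group_by(Animal, last_measurement_date, NA_lag) %>%
  # Number the rows in each group: filled days after a weighing count 1, 2, 3, ...
  mutate(days_missing = row_number()) %>%
  # Keep 13 filled days (two weeks counting the weighing day), blank the rest
  mutate(Wt_g = if_else(days_missing > 13, NA_integer_, Wt_g)) %>%
  ungroup()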
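A variant of the same idea, offered only as a sketch: instead of counting rows inside each gap, measure the days elapsed since the last real weighing and blank the weight once that exceeds a limit. The names `max_fill_days`, `days_since` and `filled` are made up for this example, and the limit of 13 is taken from the "only 13 days" requirement in the question.

```r
library(dplyr)
library(tidyr)
library(lubridate)

animalwts <- tibble::tribble(
  ~Animal,     ~WtDate, ~Wt_g,
      "A",  "1/1/2020",   20L,
      "A",  "1/8/2020",   21L,
      "A", "1/15/2020",   21L,
      "A", "1/22/2020",   23L,
      "A",  "2/5/2020",   25L,
      "A", "2/12/2020",   23L,
      "A", "2/19/2020",   24L,
      "A", "2/26/2020",   23L,
      "A",  "3/4/2020",   22L,
      "A",  "4/8/2020",   24L
) %>%
  mutate(WtDate = mdy(WtDate))

# Assumption: a weight may be carried forward for at most this many days
max_fill_days <- 13

filled <- animalwts %>%
  group_by(Animal) %>%
  # Keep a copy of the real weighing date before new rows are added
  mutate(last_measurement_date = WtDate) %>%
  # One row per calendar day between the first and last weighing
  complete(WtDate = seq.Date(min(WtDate), max(WtDate), by = "day")) %>%
  # Carry the last weight and its date forward into the added rows
  fill(Wt_g, last_measurement_date) %>%
  mutate(
    # Days elapsed since the weight on this row was actually measured
    days_since = as.integer(WtDate - last_measurement_date),
    # Blank any weight carried further than the limit
    Wt_g = if_else(days_since > max_fill_days, NA_integer_, Wt_g)
  ) %>%
  ungroup()

# Rows around the end of the filled stretch after the 3/4 weighing
filled %>% filter(WtDate >= ymd("2020-03-16"), WtDate <= ymd("2020-03-19"))
```

Because `days_since` is computed from the dates themselves, this version needs no second `group_by()` and stays within a single pipe; changing `max_fill_days` adjusts the window.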
