How do I search upward in one column based on whether another column is NA?


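I need to find the previous date whose value is not NA, and also use the value on that row. I tried using shift, but ran into a problem: shift works fine for row 9, but not when type has consecutive non-NA entries, for example in rows 5 and 6.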

library(data.table)

dtihave = data.table(date = as.Date(c("2020-01-01", "2020-02-01", "2020-03-03", "2020-04-02", "2020-05-09", "2020-06-10", "2020-07-18", "2020-08-23", "2020-09-09")),
                     type = c(1,1,NA,NA,1,1,NA,NA,1),
                     value = c(7,NA,6,8,NA,NA,5,9,NA))
> dtihave
         date type value
1: 2020-01-01    1     7
2: 2020-02-01    1    NA
3: 2020-03-03   NA     6
4: 2020-04-02   NA     8
5: 2020-05-09    1    NA
6: 2020-06-10    1    NA
7: 2020-07-18   NA     5
8: 2020-08-23   NA     9
9: 2020-09-09    1    NA
dtiwant = data.table(date = as.Date(c("2020-01-01", "2020-02-01", "2020-03-03", "2020-04-02", "2020-05-09", "2020-06-10", "2020-07-18", "2020-08-23", "2020-09-09")),
                     type = c(1,1,NA,NA,1,1,NA,NA,1),
                     value = c(7,NA,6,8,NA,NA,5,9,NA),
                     iwantdate = c(NA, as.Date("2020-01-01"), NA, NA, as.Date("2020-04-02"), as.Date("2020-04-02"), NA, NA, as.Date("2020-08-23")),
                     iwantvalue = c(NA,7,NA,NA,8,8,NA,NA,9))
dtiwant[, iwantdate := as.Date(iwantdate, origin = "1970-01-01")]
> dtiwant
         date type value  iwantdate iwantvalue
1: 2020-01-01    1     7       <NA>         NA
2: 2020-02-01    1    NA 2020-01-01          7
3: 2020-03-03   NA     6       <NA>         NA
4: 2020-04-02   NA     8       <NA>         NA
5: 2020-05-09    1    NA 2020-04-02          8
6: 2020-06-10    1    NA 2020-04-02          8
7: 2020-07-18   NA     5       <NA>         NA
8: 2020-08-23   NA     9       <NA>         NA
9: 2020-09-09    1    NA 2020-08-23          9

My current attempt uses shift, but I need row 6 to have iwantdate = "2020-04-02". The number of rows I need to shift by is not known in advance, so I can't simply use n = 2.

dtprogress = copy(dtihave)
dtprogress[, iwantdate := ifelse(!is.na(type) & is.na(value), shift(date), NA)]
dtprogress[, iwantdate := ifelse(!is.na(type) & !is.na(value), date, iwantdate)]
dtprogress[, iwantdate := as.Date(iwantdate, origin = "1970-01-01")]
> dtprogress
         date type value  iwantdate
1: 2020-01-01    1     7 2020-01-01
2: 2020-02-01    1    NA 2020-01-01
3: 2020-03-03   NA     6       <NA>
4: 2020-04-02   NA     8       <NA>
5: 2020-05-09    1    NA 2020-04-02
6: 2020-06-10    1    NA 2020-05-09
7: 2020-07-18   NA     5       <NA>
8: 2020-08-23   NA     9       <NA>
9: 2020-09-09    1    NA 2020-08-23

You can do it like this:

# idx: for rows where value is NA, the row number of the most recent non-NA value.
# NA^TRUE is NA, so rows that already hold a value get idx = NA.
dtihave[, idx := cummax((!is.na(value)) * .I) * NA^!is.na(value)][,
  c('want_date', 'want_value') := lapply(.SD, '[', idx),
  .SDcols = c('date', 'value')][, idx := NULL]
dtihave
         date type value  want_date want_value
1: 2020-01-01    1     7       <NA>         NA
2: 2020-02-01    1    NA 2020-01-01          7
3: 2020-03-03   NA     6       <NA>         NA
4: 2020-04-02   NA     8       <NA>         NA
5: 2020-05-09    1    NA 2020-04-02          8
6: 2020-06-10    1    NA 2020-04-02          8
7: 2020-07-18   NA     5       <NA>         NA
8: 2020-08-23   NA     9       <NA>         NA
9: 2020-09-09    1    NA 2020-08-23          9
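
To see why the index trick works, it may help to unpack it step by step in base R. This is only a sketch: the helper name last_non_na_index is hypothetical and does not appear in the answer, and the extra line that turns an index of 0 into NA is an added assumption to cover a column that starts with NA (indexing with 0 would otherwise silently drop that element).

```r
# Hypothetical helper (not from the original answer): for each position where
# x is NA, return the index of the most recent non-NA element; NA elsewhere.
last_non_na_index <- function(x) {
  has_value <- !is.na(x)
  # Position where a value is present, 0 otherwise; cummax carries it forward
  last_seen <- cummax(has_value * seq_along(x))
  # NA^TRUE is NA and NA^FALSE is 1, so positions that hold a value are blanked
  idx <- last_seen * NA^has_value
  # Assumption: a leading run of NAs has no earlier value, so map 0 to NA
  idx[which(idx == 0)] <- NA
  idx
}

date  <- as.Date(c("2020-01-01", "2020-02-01", "2020-03-03", "2020-04-02",
                   "2020-05-09", "2020-06-10", "2020-07-18", "2020-08-23",
                   "2020-09-09"))
value <- c(7, NA, 6, 8, NA, NA, 5, 9, NA)

idx <- last_non_na_index(value)
idx          # NA  1 NA NA  4  4 NA NA  8
value[idx]   # NA  7 NA NA  8  8 NA NA  9
date[idx]    # the matching dates, NA where idx is NA
```

Indexing a vector with NA returns NA, which is what leaves the rows that already have a value empty in the new columns.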

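With the tidyverse (nafill still comes from data.table). Hopefully this also handles the grouped case: for example, adding %>% group_by(...) before mutate should be enough.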

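library(dplyr)
library(data.table)  # for nafill

dtihave %>%
  mutate(val_na = !is.na(value),                                          # TRUE where value is present
         idx = nafill(na_if(row_number() * val_na, 0), "locf"),           # carry the last such row number forward
         idx = idx * NA ^ val_na,                                         # blank out rows that already hold a value
         date1 = date[idx], value1 = value[idx],
         val_na = NULL, idx = NULL)                                       # drop the helper columns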
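
For the grouped case, a data.table version might look like the sketch below. The grp column and the six-row sample are hypothetical, made up only for illustration. One point to watch: with by =, .I still holds row numbers in the whole table, so this sketch uses seq_len(.N) to get positions within each group instead.

```r
library(data.table)

# Hypothetical grouped sample; grp is not part of the original question
dt <- data.table(
  grp   = c("a", "a", "a", "b", "b", "b"),
  date  = as.Date(c("2020-01-01", "2020-02-01", "2020-03-03",
                    "2020-04-02", "2020-05-09", "2020-06-10")),
  value = c(7, NA, NA, NA, 8, NA)
)

dt[, c("want_date", "want_value") := {
  has_value <- !is.na(value)
  # Within-group position of each present value, carried forward over NAs
  last_seen <- nafill(fifelse(has_value, seq_len(.N), NA_integer_), type = "locf")
  # Only rows with a missing value look back; rows with a value get NA
  idx <- fifelse(has_value, NA_integer_, last_seen)
  list(date[idx], value[idx])
}, by = grp]

dt
```

Because the lookup never crosses a group boundary, the first row of group "b" stays NA rather than picking up the last value from group "a".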

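You can use lag to get the previous value, for example as below. Note that lag only looks back one row, so in the output row 6 gets 2020-05-09 and NA rather than the desired 2020-04-02 and 8.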

library(dplyr)
dtihave %>%
  mutate(iwantdate = ifelse(is.na(value), lag(date), NA) %>% as.Date(., origin = "1970-01-01"),
         iwantvalue = ifelse(is.na(value), lag(value), NA))
         date type value  iwantdate iwantvalue
1: 2020-01-01    1     7       <NA>         NA
2: 2020-02-01    1    NA 2020-01-01          7
3: 2020-03-03   NA     6       <NA>         NA
4: 2020-04-02   NA     8       <NA>         NA
5: 2020-05-09    1    NA 2020-04-02          8
6: 2020-06-10    1    NA 2020-05-09         NA
7: 2020-07-18   NA     5       <NA>         NA
8: 2020-08-23   NA     9       <NA>         NA
9: 2020-09-09    1    NA 2020-08-23          9
