r-根据日期和预定义值有条件地更改新列-data.table



数据:

DT<-data.table::data.table(
ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
C_OPR = c("ABCD01", "ABCD11", NA, "EFGH", NA, NA, "KLMN", NA),
D_OPR = c(NA, NA, "PQRST", NA, "EFGHIJ", NA, NA, NA),
DATE = c("2007-07-07","2005-05-05","2002-02-02",
"2002-02-02","2004-04-04",NA,"2001-01-01",NA),
INDX_DATE = c("2006-06-06","2006-06-06","2006-06-06",
"2001-01-01","2001-01-01","2001-01-01","2005-05-05",
"2005-05-05")
)
ALFA_DEF<-c("ABCD","EFGH")

输出:

ID  C_OPR  D_OPR       DATE  INDX_DATE
1:  1 ABCD01   <NA> 2007-07-07 2006-06-06
2:  1 ABCD11   <NA> 2005-05-05 2006-06-06
3:  1   <NA>  PQRST 2002-02-02 2006-06-06
4:  2   EFGH   <NA> 2002-02-02 2001-01-01
5:  2   <NA> EFGHIJ 2004-04-04 2001-01-01
6:  2   <NA>   <NA>       <NA> 2001-01-01
7:  3   KLMN   <NA> 2001-01-01 2005-05-05
8:  3   <NA>   <NA>       <NA> 2005-05-05

期望输出:

ID  C_OPR  D_OPR       DATE  INDX_DATE ALFA
1:  1 ABCD01   <NA> 2007-07-07 2006-06-06    1
2:  1 ABCD11   <NA> 2005-05-05 2006-06-06    1
3:  1   <NA>  PQRST 2002-02-02 2006-06-06    1
4:  2   EFGH   <NA> 2002-02-02 2001-01-01    0
5:  2   <NA> EFGHIJ 2004-04-04 2001-01-01    0
6:  2   <NA>   <NA>       <NA> 2001-01-01    0
7:  3   KLMN   <NA> 2001-01-01 2005-05-05    0
8:  3   <NA>   <NA>       <NA> 2005-05-05    0

逻辑:

在同一组(ID)中,任何包含ALFA_DEFC_OPRD_OPR,其中DATE小于INDX_DATE-,则生成ALFA=1,否则为0。

尝试不更正日期:

DT[, ALPHA := +any( grepl(paste0(ALPHA_DEF, collapse="|"),c(D_OPR, C_OPR)), by=ID]

最好是data.table解决方案,但也欢迎dplyr。

/H

我们可以作为这样做

library(data.table)
pat <- paste(ALFA_DEF, collapse = "|")
DT[, ALFA := +(any(Reduce(`|`, lapply(.SD, (x) 
grepl(pat, x[DATE < INDX_DATE]))))), by = ID, .SDcols = patterns("_OPR$")]

-输出

> DT
ID  C_OPR  D_OPR       DATE  INDX_DATE ALFA
1:  1 ABCD01   <NA> 2007-07-07 2006-06-06    1
2:  1 ABCD11   <NA> 2005-05-05 2006-06-06    1
3:  1   <NA>  PQRST 2002-02-02 2006-06-06    1
4:  2   EFGH   <NA> 2002-02-02 2001-01-01    0
5:  2   <NA> EFGHIJ 2004-04-04 2001-01-01    0
6:  2   <NA>   <NA>       <NA> 2001-01-01    0
7:  3   KLMN   <NA> 2001-01-01 2005-05-05    0
8:  3   <NA>   <NA>       <NA> 2005-05-05    0

我们可以这样做:

创建新列后ALFA我们检查a) 任何C_OPRD_OPR都包含ALFA_DEFb)DATE<INDX_DATE则CCD_ 14将是CCD_否则ALFA将是0

DT[, ALFA := ifelse(any(grepl(paste0(ALFA_DEF, collapse = "|"), c(D_OPR, C_OPR)) & as.Date(DATE) < as.Date(INDX_DATE)), 1, 0), by = ID]

输出:

ID  C_OPR  D_OPR       DATE  INDX_DATE ALFA
1:  1 ABCD01   <NA> 2007-07-07 2006-06-06    1
2:  1 ABCD11   <NA> 2005-05-05 2006-06-06    1
3:  1   <NA>  PQRST 2002-02-02 2006-06-06    1
4:  2   EFGH   <NA> 2002-02-02 2001-01-01    0
5:  2   <NA> EFGHIJ 2004-04-04 2001-01-01    0
6:  2   <NA>   <NA>       <NA> 2001-01-01    0
7:  3   KLMN   <NA> 2001-01-01 2005-05-05    0
8:  3   <NA>   <NA>       <NA> 2005-05-05    0

相关内容

  • 没有找到相关文章

最新更新