Problem generating a variable when my R code has NA data



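I have a problem with the code below. When my input is day = 30/06, Category = FDE, DTT = Hol, I can get SPV (first code). However, when I use day = 30/06, Category = ABC, DTT = NA, I cannot get SPV (second code). I need to display the row that corresponds to that date/category/DTT. How can I adjust this?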

Executable code below:

For 30/06, FDE, Hol

library(dplyr)
library(tidyr)  # provides pivot_longer(), used below

df1 <- structure(
  list(date1 = c("2021-06-28", "2021-06-28", "2021-06-28"),
       date2 = c("2021-06-30", "2021-06-30", "2021-07-02"),
       DTT = c(NA, "Hol", "Hol"),
       Week = c("Wednesday", "Wednesday", "Friday"),
       Category = c("ABC", "FDE", "ABC"),
       DR1 = c(4, 1, 1),
       DR01 = c(4, 1, 2), DR02 = c(4, 2, 0), DR03 = c(9, 5, 0),
       DR04 = c(5, 4, 3), DR05 = c(5, 4, 0)),
  class = "data.frame", row.names = c(NA, -3L))

dmda <- "2021-06-30"
CategoryChosse <- "FDE"
DTest <- "Hol"
x <- df1 %>% select(starts_with("DR0"))

# Difference between DR1 and each DR0x column, stored as DR0x_PV
x <- cbind(df1, setNames(df1$DR1 - x, paste0(names(x), "_PV")))
PV <- select(x, date2, Week, Category, DTT, DR1, ends_with("PV"))

# Median of each _PV column per Category/Week/DTT group
med <- PV %>%
  group_by(Category, Week, DTT) %>%
  summarize(across(ends_with("PV"), median))

# Add the group median back to each DR0x column
SPV <- df1 %>%
  inner_join(med, by = c('Category', 'Week', 'DTT')) %>%
  mutate(across(matches("^DR0\\d+$"), ~ .x +
                  get(paste0(cur_column(), '_PV')),
                .names = '{col}_{col}_PV')) %>%
  select(date1:Category, DR01_DR01_PV:last_col())

SPV <- data.frame(SPV)

# Names of the trailing DR0x columns that are zero for the chosen row
mat1 <- df1 %>%
  filter(date2 == dmda, Category == CategoryChosse, DTT == DTest) %>%
  select(starts_with("DR0")) %>%
  pivot_longer(cols = everything()) %>%
  arrange(desc(row_number())) %>%
  mutate(cs = cumsum(value)) %>%
  filter(cs == 0) %>%
  pull(name)

(dropnames <- paste0(mat1, "_", mat1, "_PV"))

SPV <- SPV %>%
  filter(date2 == dmda, Category == CategoryChosse, DTT == DTest) %>%
  select(-any_of(dropnames))
if (length(grep("DR0", names(SPV))) == 0) {
  SPV[mat1] <- NA_real_
}
> SPV
date1      date2 DTT      Week Category DR01_DR01_PV DR02_DR02_PV DR03_DR03_PV DR04_DR04_PV DR05_DR05_PV
1 2021-06-28 2021-06-30 Hol Wednesday      FDE            1            1            1            1    

For 30/06, ABC, NA

dmda <- "2021-06-30"
CategoryChosse <- "ABC"
DTest <- NA
x <- df1 %>% select(starts_with("DR0"))
x <- cbind(df1, setNames(df1$DR1 - x, paste0(names(x), "_PV")))
PV <- select(x, date2, Week, Category, DTT, DR1, ends_with("PV"))
med <- PV %>%
  group_by(Category, Week, DTT) %>%
  summarize(across(ends_with("PV"), median))
SPV <- df1 %>%
  inner_join(med, by = c('Category', 'Week', 'DTT')) %>%
  mutate(across(matches("^DR0\\d+$"), ~ .x +
                  get(paste0(cur_column(), '_PV')),
                .names = '{col}_{col}_PV')) %>%
  select(date1:Category, DR01_DR01_PV:last_col())
SPV <- data.frame(SPV)
mat1 <- df1 %>%
  filter(date2 == dmda, Category == CategoryChosse, DTT == DTest) %>%
  select(starts_with("DR0")) %>%
  pivot_longer(cols = everything()) %>%
  arrange(desc(row_number())) %>%
  mutate(cs = cumsum(value)) %>%
  filter(cs == 0) %>%
  pull(name)
(dropnames <- paste0(mat1, "_", mat1, "_PV"))
SPV <- SPV %>%
  filter(date2 == dmda, Category == CategoryChosse, DTT == DTest) %>%
  select(-any_of(dropnames))
if (length(grep("DR0", names(SPV))) == 0) {
  SPV[mat1] <- NA_real_
}
> SPV
[1] date1        date2        DTT          Week         Category     DR01_DR01_PV DR02_DR02_PV DR03_DR03_PV
[9] DR04_DR04_PV DR05_DR05_PV
<0 rows> (or 0-length row.names)

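You can write your own function that behaves like ==, except when an element is NA: in that case it returns TRUE only if both elements are NA. Then use that function in filter instead of ==.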
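To see why == is the culprit, here is a minimal sketch. The data frame `toy` and the variable `target` are hypothetical names made up for illustration; they are not part of the question's code. It shows that comparing anything with NA yields NA, and that filter() keeps only rows where the condition is TRUE.

```r
library(dplyr, warn.conflicts = FALSE)

# Comparing with NA yields NA, not TRUE or FALSE
na_cmp <- NA == NA

# Hypothetical toy data, standing in for the DTT column of the question
toy <- data.frame(DTT = c(NA, "Hol"), value = c(1, 2))
target <- NA

# The condition is NA for every row, and filter() drops rows
# whose condition is not TRUE, so nothing is kept
with_equals <- toy %>% filter(DTT == target)

# Testing for NA explicitly keeps the NA row
with_is_na <- toy %>% filter(is.na(DTT))
```

Here `with_equals` ends up empty while `with_is_na` holds the NA row, which is the same behaviour as the empty SPV in the question.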

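In the future, please try to create a minimal reproducible example like the one below, so that you can ask a specific question instead of asking people to fix your code for you.

See How to make a great R reproducible example.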

library(dplyr, warn.conflicts = FALSE)
same <- function(x, y) {
  case_when(
    is.na(x) != is.na(y) ~ FALSE,
    is.na(x) ~ TRUE,
    TRUE ~ x == y)
}
df <- data.frame(x = c('hol', NA))
x_want <- 'hol'
df %>%
  filter(same(x, x_want))
#>     x
#> 1 hol
x_want <- NA
df %>%
  filter(same(x, x_want))
#>      x
#> 1 <NA>

Created on 2021-12-20 by the reprex package (v2.0.1)
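Applied to the question, the idea would be to swap `DTT == DTest` for the NA-aware helper in both filter() calls. The following is a sketch only: it uses a cut-down copy of `df1` with just the columns needed for filtering, and it redefines the helper so the block runs on its own. As far as I know, dplyr joins match NA keys to each other by default, so the inner_join step should not need changing, but that is worth checking against your dplyr version.

```r
library(dplyr, warn.conflicts = FALSE)

# NA-aware equality: TRUE when both are NA, FALSE when only one is
same <- function(x, y) {
  case_when(
    is.na(x) != is.na(y) ~ FALSE,
    is.na(x) ~ TRUE,
    TRUE ~ x == y
  )
}

# Cut-down copy of df1 from the question (filter columns only)
df1 <- data.frame(
  date2 = c("2021-06-30", "2021-06-30", "2021-07-02"),
  DTT = c(NA, "Hol", "Hol"),
  Category = c("ABC", "FDE", "ABC")
)

dmda <- "2021-06-30"
CategoryChosse <- "ABC"
DTest <- NA

# same() replaces DTT == DTest, so the NA row is matched
res <- df1 %>%
  filter(date2 == dmda, Category == CategoryChosse, same(DTT, DTest))
```

The same filter still works for the non-NA case, for example with `CategoryChosse <- "FDE"` and `DTest <- "Hol"`.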
