Overall problem:
I have my data tibble:
dat <- tibble(var1 = c("rnw wnd", "rnw wat"),
var2 = c("elc", NA))
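I have another dataset, which is a set of rules for pattern matching: if var1 and var2 match their patterns, combined according to combine_rule, then grp is assigned.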
patterns <- tibble(var1_patterns = c("rnw", "wnd", NA),
var2_patterns = c("elc", NA, "elc"),
combine_rule = c("&", NA, NA),
grp = c("elc_rnw", "wnd", "elc"))
I want to append a list column to dat containing every grp whose rule is satisfied by the combination of var1 and var2.
So the result would be:
dat <- tibble(var1 = c("rnw wnd", "rnw wat"),
var2 = c("elc", NA),
grp = c(list(c("elc_rnw", "wnd", "elc")),
list(NA))
)
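Simple problem:
That is the whole problem, and I realise it is quite a lot. In the first instance, it would help to understand how to map str_match(var1, var1_patterns) to create a list column, ignoring the logical relationship between var1 and var2. So the result would be: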
dat <- tibble(var1 = c("rnw wnd", "rnw wat"),
var2 = c("elc", NA),
grp = c(list( c("elc_rnw", "wnd")),
list("elc_rnw"))
)
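I thought of mapping str_match,
dat %>%
  mutate(grp = map(var1, ~str_match(.x, patterns$var1_patterns)))
to create a new list column. But I don't know how to map over the rows of patterns to create the list column. There is a loop option, but I'm trying to avoid it!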
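For the simple problem, a minimal sketch (one possible approach, not necessarily the best) is to use str_detect rather than str_match, since only a yes/no answer per pattern is needed, and then use the result to subset patterns$grp. The names has_pattern and simple are hypothetical, and rules whose var1 pattern is NA are assumed to be skipped.

```r
library(tibble)
library(dplyr)
library(purrr)
library(stringr)

dat <- tibble(var1 = c("rnw wnd", "rnw wat"),
              var2 = c("elc", NA))
patterns <- tibble(var1_patterns = c("rnw", "wnd", NA),
                   var2_patterns = c("elc", NA, "elc"),
                   combine_rule = c("&", NA, NA),
                   grp = c("elc_rnw", "wnd", "elc"))

# Assumption: rules without a var1 pattern are ignored in the simple case
has_pattern <- !is.na(patterns$var1_patterns)

simple <- dat %>%
  mutate(grp = map(var1, function(v) {
    # one TRUE/FALSE per remaining rule, used to pick the matching grp values
    hits <- str_detect(v, patterns$var1_patterns[has_pattern])
    patterns$grp[has_pattern][hits]
  }))
```

Here simple$grp is a list column with one character vector of matched groups per row of dat.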
I should also add that patterns and dat will be arguments to a function, so I (think I) can't use case_when for the pattern matching.
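Any suggestions on either the simple problem or the overall problem would be greatly appreciated.
(Apologies if this is a duplicate, but I couldn't find an existing question, perhaps because I'm not phrasing the problem properly.)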
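I think I have found a solution to the overall problem for the specific case where the logical operator in patterns is "&".
If anyone has suggestions for a more elegant approach, or any other suggestions, they would be very welcome.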
# Load the pipe operator (%>%) used below, then create dat and patterns
library(magrittr)
dat <- tibble::tibble(var1 = c("rnw wnd", "rnw wat"),
var2 = c("elc", NA))
patterns <- tibble::tibble(var1_patterns = c("rnw", "wnd", NA),
var2_patterns = c("elc", NA, "elc"),
combine_rule = c("&", NA, NA),
grp = c("elc_rnw", "wnd", "elc"))
# Return a logical vector with one element per pattern:
# TRUE/FALSE if the pattern is detected or not, NA where the pattern itself is NA
check_for_matches <- function(var, patterns){
  out <- stringr::str_detect(var, patterns)
  # if var is missing, we want to return FALSE for all matches
  if (is.na(var)) {
    out <- rep(FALSE, length(out))
  }
  out
}
dat %>%
  # create logicals for detection of var1 and var2 separately
  dplyr::mutate(var1_check = purrr::map(var1,
                                        ~check_for_matches(.x, patterns$var1_patterns))) %>%
  dplyr::mutate(var2_check = purrr::map(var2,
                                        ~check_for_matches(.x, patterns$var2_patterns))) %>%
  # append the group column
  dplyr::mutate(grp = list(patterns$grp)) %>%
  # unnest because we are working across columns (one row per dat row and rule)
  tidyr::unnest(c(var1_check, var2_check, grp)) %>%
  # combine var1_check and var2_check with "&"; where one check is NA
  # (the rule has no pattern for that variable), fall back to the other check
  dplyr::mutate(joined = dplyr::if_else(!is.na(var1_check & var2_check),
                                        var1_check & var2_check,
                                        dplyr::if_else(is.na(var1_check), var2_check,
                                                       var1_check)),
                # if joined is TRUE keep grp, otherwise set it to NA
                grp = dplyr::if_else(joined, grp, NA_character_)) %>%
  dplyr::select(var1, var2, grp) %>%
  # collapse back to one row per dat row; the groups end up in a list column named data
  tidyr::nest(data = c(grp))
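Since dat and patterns are meant to be function arguments, the same "&" logic could also be packaged as a function. The following is only a sketch under assumptions: detect_or_na and assign_groups are hypothetical names, combine_rule is not read (every rule with two patterns is treated as "&"), and a row with no matching rule gets NA rather than an empty vector.

```r
library(tibble)
library(purrr)
library(stringr)

# Hypothetical helper: NA where the rule has no pattern, otherwise TRUE/FALSE.
# A missing value never matches any pattern.
detect_or_na <- function(value, pats) {
  out <- rep(NA, length(pats))
  has_pat <- !is.na(pats)
  out[has_pat] <- if (is.na(value)) FALSE else str_detect(value, pats[has_pat])
  out
}

# Hypothetical wrapper: adds a list column grp with every group whose rule matches.
assign_groups <- function(dat, patterns) {
  dat$grp <- map2(dat$var1, dat$var2, function(v1, v2) {
    m1 <- detect_or_na(v1, patterns$var1_patterns)
    m2 <- detect_or_na(v2, patterns$var2_patterns)
    # one pattern: use that check alone; two patterns: both must match ("&")
    hit <- ifelse(is.na(m1), m2, ifelse(is.na(m2), m1, m1 & m2))
    hit[is.na(hit)] <- FALSE  # rules with no pattern at all never match
    matched <- patterns$grp[hit]
    if (length(matched) == 0) NA_character_ else matched
  })
  dat
}

dat <- tibble(var1 = c("rnw wnd", "rnw wat"),
              var2 = c("elc", NA))
patterns <- tibble(var1_patterns = c("rnw", "wnd", NA),
                   var2_patterns = c("elc", NA, "elc"),
                   combine_rule = c("&", NA, NA),
                   grp = c("elc_rnw", "wnd", "elc"))

res <- assign_groups(dat, patterns)
```

Unlike the unnest/nest pipeline, this keeps one row per row of dat throughout and stores the matched groups directly as character vectors in res$grp.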