R - Convert string matches into a vector of binary elements



I would appreciate suggestions for a package or a base R solution to the following problem (thank you in advance).

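Imagine I have a character vector that comes from statistical_function (shown below). If I provide the names of two character elements (e.g. provided_vector = c("high", "aware")), I would like a mechanism that generates the following binary vector for me: desired_vector = c(0, 1, 1, 0, 1)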

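Length: the length of desired_vector equals the number of elements in the output of statistical_function, not counting the element named intrcpt. So in this case desired_vector will have 5 elements.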

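Element A: for each element in the output of statistical_function that does not contain ":" (e.g. "weekshigh") but does contain one of the elements of provided_vector ("high"), my desired vector should be 1.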

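Element B: for each element in the output of statistical_function that does contain ":" (e.g. "weekshigh:testeraware") and contains both elements of provided_vector ("high" and "aware"), my desired vector should be 1.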

Otherwise, all remaining elements of desired_vector should be 0. Is this possible in R?

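In the example below, the first element of desired_vector is 0 because "weekssome" (the first element once intrcpt is dropped) contains neither "high" nor "aware"; the second element is 1 because "weekshigh" contains "high"; the third element is 1 because "testeraware" contains "aware"; the fourth element is 0 because "weekssome:testeraware" does not contain both "high" and "aware"; and the fifth element is 1 because it does contain both "high" and "aware".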

statistical_function = c("intrcpt","weekssome","weekshigh",
                         "testeraware","weekssome:testeraware","weekshigh:testeraware")
# [1] "intrcpt"               "weekssome"             "weekshigh"            
# [4] "testeraware"           "weekssome:testeraware" "weekshigh:testeraware"
provided_vector = c("high", "aware")
desired_vector = c(0, 1, 1, 0, 1)

You can try the code below:

+(
  abs(
    grepl(":", statistical_function) -
      rowSums(
        sapply(provided_vector, grepl, statistical_function)
      )
  ) == 1)[
  statistical_function != "intrcpt"
]

which gives

[1] 0 1 1 0 1
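
To make it easier to see why the one-liner above produces this result, here is a sketch that unpacks it into named steps. The intermediate names (is_interaction, n_hits, keep, result) are introduced here purely for illustration and are not part of the original answer.

```r
# Example data from the question
statistical_function <- c("intrcpt", "weekssome", "weekshigh",
                          "testeraware", "weekssome:testeraware",
                          "weekshigh:testeraware")
provided_vector <- c("high", "aware")

# TRUE for interaction terms (names containing ":"), FALSE otherwise
is_interaction <- grepl(":", statistical_function)

# Number of provided patterns found in each name (0, 1 or 2 here)
n_hits <- rowSums(sapply(provided_vector, grepl, statistical_function))

# A main effect qualifies with one hit (|0 - 1| = 1),
# an interaction qualifies with two hits (|1 - 2| = 1)
keep <- abs(is_interaction - n_hits) == 1

# Drop the intercept, then convert TRUE/FALSE to 1/0
result <- as.integer(keep[statistical_function != "intrcpt"])
print(result)
#> [1] 0 1 1 0 1
```

Note that the arithmetic relies on provided_vector having exactly two elements: with three patterns, an interaction term containing all three would give |1 - 3| = 2 and be coded as 0.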

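I'm not sure whether the approach below is what you are looking for. I'm also not sure whether you always want to drop the first element. To give some control over what gets dropped, I added a drop argument. It takes either the index of the element to drop or a string with the name of the element to drop. It defaults to drop = "intrcpt", which discards the intercept.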

# the input vector containing the coefficient names
statistical_function <- c("intrcpt",
                          "weekssome",
                          "weekshigh",
                          "testeraware",
                          "weekssome:testeraware",
                          "weekshigh:testeraware")
# the input vector containing the search patterns
provided_vector = c("high", "aware")
# a function which matches both
test_input <- function(in_func, in_vec, drop = "intrcpt") {

  # optionally remove elements, either by index or by name
  if(!is.null(drop)) {
    if(is.numeric(drop)) {
      in_func <- in_func[-drop]
    } else if (is.character(drop)) {
      in_func <- in_func[in_func != drop]
    }
  }

  # split interaction terms into their parts
  inp <- strsplit(in_func, ":")

  # combine the search patterns into one regular expression
  pat <- paste(in_vec, collapse = "|")

  # 1 if every part matches the pattern, 0 otherwise
  # (the logical result of all() is promoted to numeric by vapply)
  vapply(inp,
         FUN = function(x) all(grepl(pat, x)),
         FUN.VALUE = numeric(1L))
}
# this call relies on the default drop = "intrcpt", so the intercept is dropped
test_input(statistical_function, provided_vector)
#> [1] 0 1 1 0 1
# these calls drop the "intrcpt" or the first element
test_input(statistical_function, provided_vector, drop = "intrcpt")
#> [1] 0 1 1 0 1
test_input(statistical_function, provided_vector, drop = 1)
#> [1] 0 1 1 0 1
# test: still working
test_input(statistical_function[-1], provided_vector)
#> [1] 0 1 1 0 1

Created on 2021-08-16 by the reprex package (v2.0.1)

If the intercept is always written the same way, as intrcpt, then we can shorten the function above and remove the drop argument:

test_input <- function(in_func, in_vec) {

  inp <- in_func[in_func != "intrcpt"]
  inp <- strsplit(inp, ":")

  pat <- paste(in_vec, collapse = "|")

  vapply(inp,
         FUN = function(x) all(grepl(pat, x)),
         FUN.VALUE = numeric(1L))
}
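
As a rough illustration of what happens inside this function, the sketch below runs the same steps by hand on the example data and prints the intermediate matches for the two interaction terms. The names coef_names, pieces, hits and result are chosen here for illustration only.

```r
# Coefficient names from the question, with the intercept already removed
coef_names <- c("weekssome", "weekshigh", "testeraware",
                "weekssome:testeraware", "weekshigh:testeraware")
provided_vector <- c("high", "aware")

# One character vector per name; interaction terms yield two pieces
pieces <- strsplit(coef_names, ":")

# Single regular expression that matches any provided pattern: "high|aware"
pat <- paste(provided_vector, collapse = "|")

# For each name, test every piece against the combined pattern
hits <- lapply(pieces, function(x) grepl(pat, x))
print(hits[[4]])  # "weekssome:testeraware" -> FALSE TRUE
print(hits[[5]])  # "weekshigh:testeraware" -> TRUE TRUE

# A name is coded 1 only if all of its pieces matched
result <- as.numeric(vapply(hits, all, FUN.VALUE = logical(1L)))
print(result)
#> [1] 0 1 1 0 1
```

One consequence of this design is that each side of ":" only has to match at least one of the patterns. A hypothetical name such as "weekshigh:grouphigh" would therefore also be coded 1 even though "aware" does not appear in it, so the approach fits best when each provided pattern can occur on only one side of the interaction.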
