Let's consider the matrix:
example_matrix <- matrix(c("big", "small", "big_something",
"small_really", "small", "big_enough",
"themendous", "big", "small"),ncol = 3, nrow = 3)
> example_matrix
[,1] [,2] [,3]
[1,] "big" "small_really" "themendous"
[2,] "small" "small" "big"
[3,] "big_something" "big_enough" "small"
And a vector:
group_vector <- c("group1_big", "group2_small")
This vector shows which words in the matrix should get a group prefix. We should end up with:
[,1] [,2] [,3]
[1,] "group1_big" "small_really" "themendous"
[2,] "group2_small" "group2_small" "group1_big"
[3,] "big_something" "big_enough" "group2_small"
That is, every "big" in example_matrix is replaced with "group1_big" and every "small" with "group2_small", without touching "big_enough" or "small_really" (only exact matches of "big" and "small" are replaced).
My idea
Let's consider the first case, replacing every "big" with "group1_big". My idea was to check which elements of example_matrix end with "big" and add the prefix "group1_" to each of them:
> apply(example_matrix, 2, function(x) endsWith(x, "big"))
[,1] [,2] [,3]
[1,] TRUE FALSE FALSE
[2,] FALSE FALSE TRUE
[3,] FALSE FALSE FALSE
So I tried this:
apply(example_matrix, 2, function(x) if endsWith(x, "big") paste0(group_vector[1], x) else x)
That is, add a condition: if a particular element really ends with "big", add the prefix; if not, leave it as it is.
However, this code produces an error:
Error: unexpected symbol in "apply(example_matrix, 2, function(x) if endsWith"
Do you know what I'm doing wrong, and what the solution to this problem is?
Here is one way to do it, using str_replace_all from stringr:
example_matrix[] <- stringr::str_replace_all(example_matrix,
                     setNames(group_vector, sprintf('\\b%s\\b',
                     sub('group\\d+_', '', group_vector))))
example_matrix
# [,1] [,2] [,3]
#[1,] "group1_big" "small_really" "themendous"
#[2,] "group2_small" "group2_small" "group1_big"
#[3,] "big_something" "big_enough" "group2_small"
To understand this, break it down into smaller steps.
sub removes 'group' + digits + underscore from group_vector.
sub('group\\d+_', '', group_vector)
#[1] "big" "small"
We add word boundaries around each result so that the pattern matches only the whole word ('big') and not part of a longer one ('big_something').
sprintf('\\b%s\\b', sub('group\\d+_', '', group_vector))
#[1] "\\bbig\\b" "\\bsmall\\b"
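As a quick check of the word-boundary behaviour described above, here is a minimal base R sketch. The `words` vector is made up for illustration, and `perl = TRUE` is an assumption chosen to make the regex flavour explicit (stringr uses ICU instead). The underscore counts as a word character, so there is no boundary between "big" and "_something".

```r
# Hypothetical test strings, not taken from the original matrix
words <- c("big", "big_something", "big enough", "bigger")

# \\b matches only at a transition between word and non-word characters;
# "_" and letters are word characters, a space is not
hits <- grepl("\\bbig\\b", words, perl = TRUE)
hits
# [1]  TRUE FALSE  TRUE FALSE
```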
Now create a named vector that can be used in str_replace_all:
setNames(group_vector, sprintf('\\b%s\\b', sub('group\\d+_', '', group_vector)))
# \\bbig\\b \\bsmall\\b
# "group1_big" "group2_small"
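As for the error in the question: `if` needs its condition in parentheses (`if (cond) ... else ...`), and it only evaluates a single value, so it does not work element-wise on a whole column. The following is a base R sketch of an alternative, not the stringr approach above; the names `only_big`, `keys`, `idx`, `found` and `result` are introduced here for illustration. It uses `ifelse()` for the single "big" case and `match()` for exact matching against all groups.

```r
example_matrix <- matrix(c("big", "small", "big_something",
                           "small_really", "small", "big_enough",
                           "themendous", "big", "small"), ncol = 3, nrow = 3)
group_vector <- c("group1_big", "group2_small")

# Vectorised version of the idea from the question: exact comparison
# instead of endsWith(), and ifelse() instead of if/else
only_big <- ifelse(example_matrix == "big", group_vector[1], example_matrix)

# General case: strip the group prefix to get the words to look for
keys <- sub("^group\\d+_", "", group_vector)   # "big" "small"

# match() compares whole strings, so "big_enough" is not matched
idx <- match(example_matrix, keys)
found <- !is.na(idx)

result <- example_matrix
result[found] <- group_vector[idx[found]]
result
```

Note that `paste0(group_vector[1], x)` from the question would produce "group1_bigbig", because group_vector[1] already contains the word; assigning the group_vector element directly avoids that.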