R - How to add prefixes to matrix elements based on a vector?



Consider the following matrix:

example_matrix <- matrix(c("big", "small", "big_something",
                           "small_really", "small", "big_enough",
                           "themendous", "big", "small"), ncol = 3, nrow = 3)
> example_matrix
     [,1]            [,2]           [,3]        
[1,] "big"           "small_really" "themendous"
[2,] "small"         "small"        "big"       
[3,] "big_something" "big_enough"   "small"     

And a vector:

group_vector <- c("group1_big", "group2_small")

This vector specifies which words in the matrix should receive a group prefix. The final result should be:

     [,1]            [,2]           [,3]          
[1,] "group1_big"    "small_really" "themendous"  
[2,] "group2_small"  "group2_small" "group1_big"  
[3,] "big_something" "big_enough"   "group2_small"

That is, every "big" in example_matrix is replaced with "group1_big" and every "small" with "group2_small", while "big_enough" and "small_really" are left untouched (only exact matches of "big" and "small" are replaced).

My attempt

Consider the first case, replacing every "big" with "group1_big". My idea was to check which elements of example_matrix end with "big" and add the prefix "group1_" to each of them:

> apply(example_matrix, 2, function(x) endsWith(x, "big"))
      [,1]  [,2]  [,3]
[1,]  TRUE FALSE FALSE
[2,] FALSE FALSE  TRUE
[3,] FALSE FALSE FALSE

What I tried was this:

apply(example_matrix, 2, function(x) if endsWith(x, "big") paste0(group_vector[1], x) else x)

That is, add a condition: if a given element really ends with "big", add the prefix; otherwise leave it unchanged.

However, this code produces an error:

Error: unexpected symbol in "apply(example_matrix, 2, function(x) if endsWith"

Do you know what I am doing wrong, and how to solve this?

Here is one way, using str_replace_all from stringr:

example_matrix[] <- stringr::str_replace_all(example_matrix, 
    setNames(group_vector, sprintf('\\b%s\\b', 
        sub('group\\d+_', '', group_vector))))
example_matrix
#       [,1]            [,2]           [,3]          
#[1,] "group1_big"    "small_really" "themendous"  
#[2,] "group2_small"  "group2_small" "group1_big"  
#[3,] "big_something" "big_enough"   "group2_small"

To understand this, break it down into smaller steps.

First, sub removes 'group' + digits + underscore from group_vector.

sub('group\\d+_', '', group_vector)
#[1] "big"   "small"

We then add a word boundary on each side, so that the pattern matches only the exact word ('big') and not a longer one ('big_something').

sprintf('\\b%s\\b', sub('group\\d+_', '', group_vector))
#[1] "\\bbig\\b"   "\\bsmall\\b"
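
The effect of the word boundary can be checked in isolation. The following is a small sketch using base R's grepl rather than stringr; the test vector words and the variable hits are made up for illustration, and perl = TRUE is an assumption made here to pin down the regex engine.

```r
# Hypothetical test words, not taken from the question's matrix
words <- c("big", "big_something", "big enough", "bigger")

# "_" and letters count as word characters, so there is no boundary
# between "big" and "_something" or between "big" and "ger".
hits <- grepl("\\bbig\\b", words, perl = TRUE)
hits
#[1]  TRUE FALSE  TRUE FALSE
```

Only a standalone "big" is matched, which is why "big_something" and "big_enough" survive the replacement unchanged.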

Now create a named vector that can be passed to str_replace_all:

setNames(group_vector, sprintf('\\b%s\\b', sub('group\\d+_', '', group_vector)))
#     \\bbig\\b    \\bsmall\\b 
#  "group1_big" "group2_small" 
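
As for the error in the question: R requires the condition of if to be wrapped in parentheses, and if only handles a single TRUE/FALSE value, so it cannot be applied to a whole column at once. Below is a minimal base R sketch of two alternatives. It is not part of the answer above, and the names only_big, keys, idx and result are introduced purely for illustration.

```r
example_matrix <- matrix(c("big", "small", "big_something",
                           "small_really", "small", "big_enough",
                           "themendous", "big", "small"),
                         ncol = 3, nrow = 3)
group_vector <- c("group1_big", "group2_small")

# Syntax fix for the attempt in the question: the vectorised ifelse()
# replaces `if`, and == compares against "big" exactly, so elements such
# as "big_something" are not touched.
only_big <- apply(example_matrix, 2,
                  function(x) ifelse(x == "big", group_vector[1], x))

# Handling every group at once: strip the prefix to recover the bare words,
# then look each cell up with match(), which gives NA when there is no
# exact match.
keys <- sub("^group\\d+_", "", group_vector)
idx <- match(example_matrix, keys)
result <- example_matrix
result[!is.na(idx)] <- group_vector[idx[!is.na(idx)]]
result
```

Because match() compares whole strings, no regular expression or word boundary is needed in this variant; the trade-off is that it only works when each cell holds exactly one word.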
