R - Match the exact word in one column against a string in another column and remove the matched word from that string



Here is my sample data.

column1           Column2
STELLARN714WPUR   STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR
STELLARN714WRED   STELLARN594WRED,STELLARN814WRED,STELLARN714WRED
STELLARN814WRED   STELLARN594WRED,STELLARN714WRED,STELLARN814WRED

I need to match the value in column1 against Column2 and, if an exact match is found in the Column2 string, remove the matched value from that string.

Expected output:

column1           Column2
STELLARN714WPUR   STELLARN594WPUR,STELLARN814WPUR
STELLARN714WRED   STELLARN594WRED,STELLARN814WRED,
STELLARN814WRED   STELLARN594WRED,STELLARN714WRED,

I tried stringr and gsub, but they did not help. Any help would be greatly appreciated. Thank you.

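You can use str_remove, which is vectorized over both the string and the pattern, so each row of Column2 is matched against the column1 value in the same row: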

library(stringr)
str_remove(df$Column2, df$column1)
#[1] "STELLARN594WPUR,,STELLARN814WPUR" "STELLARN594WRED,STELLARN814WRED,"
#[3] "STELLARN594WRED,STELLARN714WRED,"
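One caveat, offered as a hedged aside: str_remove treats the column1 value as a regular expression and matches it anywhere in the string, so a shorter code such as STELLARN714 would also be removed from inside STELLARN714WPUR. A minimal base R sketch of a stricter variant follows. It assumes the values contain only letters and digits (so no regex escaping is needed), and remove_exact is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: remove `token` from a comma-separated string only
# when it is a complete element, not a substring of a longer element.
remove_exact <- function(string, token) {
  # (^|,) and (,|$) anchor the token to element boundaries;
  # the replacement leaves a single comma where the element used to be.
  pattern <- paste0("(^|,)", token, "(,|$)")
  out <- mapply(function(s, p) sub(p, ",", s), string, pattern,
                USE.NAMES = FALSE)
  # Strip the comma left behind at either end of the string.
  gsub("^,+|,+$", "", out)
}

remove_exact("STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR", "STELLARN714WPUR")
# "STELLARN594WPUR,STELLARN814WPUR"

remove_exact("STELLARN714WPUR,STELLARN814WPUR", "STELLARN714")
# unchanged, because STELLARN714 is not a complete element
```

Because the helper strips the leftover commas itself, no separate clean-up step is needed with this variant.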

To clean the leftover commas out of the strings, we can use gsub and trimws:

gsub(',{1,}', ',', trimws(str_remove(df$Column2, df$column1), whitespace = ','))
#[1] "STELLARN594WPUR,STELLARN814WPUR" "STELLARN594WRED,STELLARN814WRED"
#[3] "STELLARN594WRED,STELLARN714WRED"
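To see what each step contributes, here is a small base R sketch on made-up strings (the vector x is illustrative and not taken from the question). Note that the whitespace argument of trimws requires R 3.6.0 or later; the anchored gsub shown last is an alternative that gives the same result on these inputs and does not depend on that argument.

```r
x <- c("A,,C", "A,B,", ",B,C")   # illustrative inputs, not the question's data

# Step 1: strip leading and trailing commas (needs R >= 3.6.0).
trimmed <- trimws(x, whitespace = ",")
trimmed
# "A,,C" "A,B" "B,C"

# Step 2: collapse any run of commas into a single comma.
cleaned <- gsub(",{1,}", ",", trimmed)
cleaned
# "A,C" "A,B" "B,C"

# Alternative without trimws: anchor the commas with ^ and $ instead.
alt <- gsub(",+", ",", gsub("^,+|,+$", "", x))
identical(cleaned, alt)
# TRUE
```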

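Another option, in base R, is to split each string on commas and keep only the elements that differ from the column1 value. Because whole elements are compared, this matches exact words only.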

df$Column2 <- mapply(function(x, y) toString(setdiff(x, y)), 
strsplit(df$Column2, ','), df$column1)
df
#          column1                          Column2
#1 STELLARN714WPUR STELLARN594WPUR, STELLARN814WPUR
#2 STELLARN714WRED STELLARN594WRED, STELLARN814WRED
#3 STELLARN814WRED STELLARN594WRED, STELLARN714WRED 
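Two details of this approach are worth knowing: toString joins the remaining elements with ", " (comma plus space), and setdiff drops duplicate elements. If the original separator and any duplicates should be preserved, one possible variant is sketched below. It uses a hypothetical data frame df2 with made-up values so that it does not overwrite df.

```r
# Illustrative data (hypothetical): row 2 has a duplicate, row 3 has no match.
df2 <- data.frame(
  column1 = c("B", "C", "Z"),
  Column2 = c("A,B,C", "C,A,A", "A,B"),
  stringsAsFactors = FALSE
)

df2$Column2 <- mapply(
  # Keep every element that differs from the target, then re-join with ",".
  function(parts, target) paste(parts[parts != target], collapse = ","),
  strsplit(df2$Column2, ",", fixed = TRUE),
  df2$column1
)

df2$Column2
# "A,C" "A,A" "A,B"
```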

Data:

df <- structure(list(column1 = c("STELLARN714WPUR", "STELLARN714WRED", 
"STELLARN814WRED"), Column2 = c("STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR", 
"STELLARN594WRED,STELLARN814WRED,STELLARN714WRED", "STELLARN594WRED,STELLARN714WRED,STELLARN814WRED")), 
class = "data.frame", row.names = c(NA, -3L))

You can use str_extract and str_replace. I also added a scenario in row 4 where the column1 value does not match anything in Column2. The final output is in Column3.

library(stringr)
library(dplyr)
col1 <- c("STELLARN714WPUR", "STELLARN714WRED", "STELLARN814WRED", "AB")
col2 <- c("STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR", "STELLARN594WRED,STELLARN814WRED,STELLARN714WRED", "STELLARN594WRED,STELLARN714WRED,STELLARN814WRED", "STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR")
df <- data.frame(column1  = col1, Column2 = col2, stringsAsFactors = FALSE)
df
#          column1                                         Column2
#1 STELLARN714WPUR STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR
#2 STELLARN714WRED STELLARN594WRED,STELLARN814WRED,STELLARN714WRED
#3 STELLARN814WRED STELLARN594WRED,STELLARN714WRED,STELLARN814WRED
#4              AB STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR

# Remove the matched value together with one adjacent comma. The second
# alternative is ",value" with no space, because Column2 has no spaces after
# its commas; with ", value" a match at the end of the string is never removed.
df %>%
  mutate(match_val = str_extract(Column2, column1),
         Column3 = ifelse(is.na(match_val), Column2,
                          str_replace(Column2, paste0(match_val, ",|,", match_val), "")))
#          column1                                         Column2       match_val
#1 STELLARN714WPUR STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR STELLARN714WPUR
#2 STELLARN714WRED STELLARN594WRED,STELLARN814WRED,STELLARN714WRED STELLARN714WRED
#3 STELLARN814WRED STELLARN594WRED,STELLARN714WRED,STELLARN814WRED STELLARN814WRED
#4              AB STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR            <NA>
#                                          Column3
#1                 STELLARN594WPUR,STELLARN814WPUR
#2                 STELLARN594WRED,STELLARN814WRED
#3                 STELLARN594WRED,STELLARN714WRED
#4 STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR
