R - Compare one character vector with another character vector using regular expressions



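I have two character vectors. I want to compare them and keep only the elements whose names share the same character pattern, here the country code.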

a<-c("nutr_sup_AFG.csv", "nutr_sup_ARE.csv", "nutr_sup_ARG.csv", "nutr_sup_AUS.csv")
b<-c("nutr_needs_AFG_pop.csv", "nutr_needs_AGO_pop.csv", "nutr_needs_ARE_pop.csv", "nutr_needs_ARG_pop.csv") 
# desired result:
result_a<-c("nutr_sup_AFG.csv", "nutr_sup_ARE.csv", "nutr_sup_ARG.csv")
result_b<-c("nutr_needs_AFG_pop.csv", "nutr_needs_ARE_pop.csv", "nutr_needs_ARG_pop.csv") 

My idea was to subset the strings first and then compare them:

library(stringr)
a_ISO<-str_sub(a, start=10, end = -5) # keep just the ISO code
b_ISO<-str_sub(b, start =12, end = -9 ) # keep just the ISO code
dif1<-setdiff(a_ISO, b_ISO) # codes that occur only in a (order matters)
dif2<-setdiff(b_ISO, a_ISO) # codes that occur only in b
dif<-c(dif1,dif2) # codes to remove

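But I don't know how to compare a and b against dif. So basically: how do I compare one character vector with another character vector using regular expressions?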

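I think you should extract the codes with a regular expression rather than by position. I also think it is easier to select the elements to keep with intersect() than to identify the elements to remove with setdiff().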

Extract the three-letter code with a regular expression:

index_a<-stringr::str_extract(a, "[A-Z]{3}")
index_b<-stringr::str_extract(b, "[A-Z]{3}")

Then subset the vectors using intersect() and base indexing:

intersect_ab<-intersect(index_a, index_b)
result_a<-a[index_a %in% intersect_ab]
result_b<-b[index_b %in% intersect_ab]
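
For reference, here is the same idea as a self-contained sketch in base R, in case you would rather not depend on stringr. The name `common` is just an illustrative choice, and `perl = TRUE` is my addition so that `[A-Z]` does not depend on the locale. Note that regmatches() silently drops elements that do not match, so this assumes every file name contains a three-letter code.

```r
a <- c("nutr_sup_AFG.csv", "nutr_sup_ARE.csv", "nutr_sup_ARG.csv", "nutr_sup_AUS.csv")
b <- c("nutr_needs_AFG_pop.csv", "nutr_needs_AGO_pop.csv",
       "nutr_needs_ARE_pop.csv", "nutr_needs_ARG_pop.csv")

# Pull the first run of three capital letters out of each file name.
index_a <- regmatches(a, regexpr("[A-Z]{3}", a, perl = TRUE))
index_b <- regmatches(b, regexpr("[A-Z]{3}", b, perl = TRUE))

# Codes present in both vectors.
common <- intersect(index_a, index_b)

# Keep only the file names whose code is shared.
result_a <- a[index_a %in% common]
result_b <- b[index_b %in% common]
```

With the sample data, `common` holds AFG, ARE and ARG, so the AUS file is dropped from a and the AGO file from b.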

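That said, your own approach also works with one extra final step, as long as the set differences are computed on the extracted codes (a_ISO, b_ISO) rather than on the full file names:

result_a<-a[!a_ISO %in% setdiff(a_ISO, b_ISO)] # drop codes found only in a
result_b<-b[!b_ISO %in% setdiff(b_ISO, a_ISO)] # drop codes found only in b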
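
If you want to keep the combined `dif` vector from the question, a minimal end-to-end sketch of the setdiff() route could look like the following. It uses base substr() in place of str_sub() (my substitution, to keep the example dependency-free), and the character offsets are specific to these two naming schemes.

```r
a <- c("nutr_sup_AFG.csv", "nutr_sup_ARE.csv", "nutr_sup_ARG.csv", "nutr_sup_AUS.csv")
b <- c("nutr_needs_AFG_pop.csv", "nutr_needs_AGO_pop.csv",
       "nutr_needs_ARE_pop.csv", "nutr_needs_ARG_pop.csv")

# Positional extraction: skip the prefix, drop ".csv" or "_pop.csv".
a_ISO <- substr(a, 10, nchar(a) - 4)
b_ISO <- substr(b, 12, nchar(b) - 8)

# Codes that appear in only one of the two vectors.
dif <- c(setdiff(a_ISO, b_ISO), setdiff(b_ISO, a_ISO))

# Remove every file whose code is in dif.
result_a <- a[!a_ISO %in% dif]
result_b <- b[!b_ISO %in% dif]
```

The comparison is always between the extracted codes and `dif`, and the resulting logical vector is then used to index the original file names.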
