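I want to compare the strings in two columns of a data frame and return the values that do not match.
Values such as x2x or y67y should not be returned, because x stays x and y stays y.
Input: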
x  y  x_val            y_val
A  B  x2x, y67h, d7j   x2y, y67y, d7r
B  C  x2y, y67y, d7r   x2y, y67y, d7r
C  A  x2y, y67y, d7r   x2x, y67h, d7j
C  D  x2y, y67y, d7r   y67b, g72v, b8c
D  E  y67b, g72v, b8c  y67b, g72j
I want to add a column val that contains the differences between x_val and y_val.
Desired output:
x  y  x_val            y_val            val
A  B  x2x, y67h, d7j   x2y, y67y, d7r   x2y, d7r
B  C  x2y, y67y, d7r   x2y, y67y, d7r   NA
C  A  x2y, y67y, d7r   x2x, y67h, d7j   y67h, d7j
C  D  x2y, y67y, d7r   y67b, g72v, b8c  y67b, g72v, b8c
D  E  y67b, g72v, b8c  y67b, g72j       g72j
I tried xy_val <- y_val[!(y_val %in% x_val)]
Could you suggest a way to output the mismatches?
My data:
structure(list(x = c("A", "B", "C", "C", "D"), y = c("B", "C", "A", "D", "E"), x_val = c("x2x, y67h, d7j", "x2y, y67y, d7r", "x2y, y67y, d7r", "x2y, y67y, d7r", "y67b, g72v, b8c"), y_val = c("x2y, y67y, d7r", "x2y, y67y, d7r", "x2x, y67h, d7j", "y67b, g72v, b8c", "y67b, g72j" )), class = "data.frame", row.names = c(NA, -5L))
Thank you for your help!
With dplyr and purrr (assuming the data frame above is stored as f):
library(dplyr)
library(purrr)
# Backslashes in the regex are doubled because it is written as an R string literal
f %>% mutate(diff_x = map2_chr(strsplit(x_val, split = ", "),
                               strsplit(y_val, split = ", "),
                               # values of x_val missing from y_val whose letter changes
                               ~paste(grep('([a-z])(?>\\d+)(?!\\1)', setdiff(.x, .y),
                                           value = TRUE, perl = TRUE),
                                      collapse = ", ")) %>%
               replace(. == "", NA),
             diff_y = map2_chr(strsplit(x_val, split = ", "),
                               strsplit(y_val, split = ", "),
                               # values of y_val missing from x_val whose letter changes
                               ~paste(grep('([a-z])(?>\\d+)(?!\\1)', setdiff(.y, .x),
                                           value = TRUE, perl = TRUE),
                                      collapse = ", ")) %>%
               replace(. == "", NA))
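Notes:
grep receives the output of setdiff and drops any element in which the same letter appears on both sides of the digits (for example x2x).
([a-z]) is a capturing group that matches any lowercase letter.
(?>\d+) is an atomic group that matches a run of digits of any length without backtracking.
(?!\1) is a negative lookahead that rejects the match when the next character is the same letter captured by ([a-z]).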
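To see why the atomic group matters, here is a minimal sketch that runs the pattern on a few sample values. The vector vals and the comparison pattern pat_greedy are made up for illustration and are not part of the answer's code. With a plain \d+ the engine may give back a digit, so the lookahead ends up being tested against a digit instead of the final letter, and y67y slips through.

```r
# Hypothetical sample values chosen to cover both cases
vals <- c("x2x", "x2y", "y67y", "y67h", "d7r")

# Pattern from the answer: letter, atomic run of digits, then a different letter
pat_atomic <- "([a-z])(?>\\d+)(?!\\1)"
kept <- grep(pat_atomic, vals, value = TRUE, perl = TRUE)
print(kept)
# "x2y" "y67h" "d7r"

# Comparison pattern (an assumption for illustration): plain \d+ may backtrack
pat_greedy <- "([a-z])\\d+(?!\\1)"
kept_greedy <- grep(pat_greedy, vals, value = TRUE, perl = TRUE)
print(kept_greedy)
# "x2y" "y67y" "y67h" "d7r"
```

x2x is rejected by both patterns because it has only one digit, so there is nothing to backtrack into; the difference only shows up with two or more digits.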
Output of the mutate() call above:
  x y           x_val           y_val    diff_x          diff_y
1 A B  x2x, y67h, d7j  x2y, y67y, d7r y67h, d7j        x2y, d7r
2 B C  x2y, y67y, d7r  x2y, y67y, d7r      <NA>            <NA>
3 C A  x2y, y67y, d7r  x2x, y67h, d7j  x2y, d7r       y67h, d7j
4 C D  x2y, y67y, d7r y67b, g72v, b8c  x2y, d7r y67b, g72v, b8c
5 D E y67b, g72v, b8c      y67b, g72j g72v, b8c            g72j
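If only the single val column from the question is needed and extra packages are not wanted, something along these lines might work in base R. This is a sketch rather than the approach used above: the helper changed_only is a hypothetical name, and it replaces the regex with a direct comparison of the first and last character, which assumes every value has the shape letter, digits, letter.

```r
# Data from the question, assumed to be stored as `f`
f <- data.frame(
  x = c("A", "B", "C", "C", "D"),
  y = c("B", "C", "A", "D", "E"),
  x_val = c("x2x, y67h, d7j", "x2y, y67y, d7r", "x2y, y67y, d7r",
            "x2y, y67y, d7r", "y67b, g72v, b8c"),
  y_val = c("x2y, y67y, d7r", "x2y, y67y, d7r", "x2x, y67h, d7j",
            "y67b, g72v, b8c", "y67b, g72j"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: values of `b` that are absent from `a`
# and whose first and last letters differ
changed_only <- function(a, b) {
  new_vals <- setdiff(strsplit(b, ", ", fixed = TRUE)[[1]],
                      strsplit(a, ", ", fixed = TRUE)[[1]])
  first <- substr(new_vals, 1, 1)
  last  <- substring(new_vals, nchar(new_vals))
  new_vals <- new_vals[first != last]
  if (length(new_vals) == 0) NA_character_ else paste(new_vals, collapse = ", ")
}

# Apply the helper row by row; unname() drops the names mapply() adds
f$val <- unname(mapply(changed_only, f$x_val, f$y_val))
print(f$val)
```

On the question's data this should reproduce the val column of the desired output, with NA for the row where both columns are identical.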
Does this give the desired result?
check_this = function(temp_data)
{
  print(temp_data)
  # Turn each comma-separated string into a space-separated one
  string_1 = gsub(", ", " ", temp_data["x_val"])
  string_2 = gsub(", ", " ", temp_data["y_val"])
  # Build alternation patterns such as "x2y|y67y|d7r"
  string_sub_1 = gsub(" ", "|", string_1)
  string_sub_2 = gsub(" ", "|", string_2)
  # Delete from each string every value that also occurs in the other one
  unmatched_s1 = gsub(string_sub_2, "", string_1)
  unmatched_s2 = gsub(string_sub_1, "", string_2)
  # return both as a list - if you need only unmatchedy_in_x you can just return(unmatched_s2)
  return(list(unmatchedx_in_y = unmatched_s1, unmatchedy_in_x = unmatched_s2))
}
# Apply the function to every row of f
res = apply(f, 1, check_this)
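To turn res into a column, one option is to pull a single component out of each row's result and tidy the leftover spaces. The sketch below does not call check_this; it uses a hypothetical stand-in for res with the shape it is assumed to have (one list per row, each holding two strings), so the values are illustrative only.

```r
# Hypothetical stand-in with the shape `res` is assumed to have
res <- list(
  list(unmatchedx_in_y = "x2x y67h d7j", unmatchedy_in_x = "x2y y67y d7r"),
  list(unmatchedx_in_y = "  ",           unmatchedy_in_x = "  ")
)

# Pull one component out of every row's result
val <- vapply(res, function(r) r[["unmatchedy_in_x"]], character(1))

# Tidy up: squeeze runs of spaces, trim the ends, restore ", " separators
val <- trimws(gsub(" +", " ", val))
val <- gsub(" ", ", ", val, fixed = TRUE)

# Rows with nothing left become NA
val[val == ""] <- NA_character_
print(val)
# "x2y, y67y, d7r" NA
```

Note that y67y is still present in the first element: removing values that occur in the other string does not by itself apply the rule that values with the same letter on both sides should be dropped, so a further filter would be needed for that.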