我有以下数据集:
df<-data.frame(
identifer=c(1,2,3,4),
DF=c("Tablet","Powder","Suspension","System"),
DF_source1=c("Capsule","Powder,Metered","Tablet",NA),
DF_source2=c(NA,NA,"Tablet",NA),
DF_source3=c("Tablet, Extended Release","Liquid","Tablet",NA),
Route_source1=c("Oral","INHALATION","Oral",NA),
Route_source2=c(NA,"TOPICAL","Oral",NA),
Route_source3=c("Oral","IRRIGATION","oral",NA))
我想知道哪些DF_source与DF匹配,以及我应该选择哪条关联的路线。
我希望输出看起来像这样:
df_out<-data.frame(
identifer=c(1,2,3,4),
DF=c("Tablet","Powder","Suspension","System"),
DF_match=c("Tablet, Extended Release","Powder,Metered;Powder",NA,NA),
Route_match=c("Oral","INHALATION;TOPICAL",NA,NA),
DF_match_count=c(1,2,0,0),
DF_route_count=c(1,2,0,0))
我试过这个,但我不确定如何为DF_match和Route_匹配提取值
df%>%mutate_at(vars(matches("(DF_source)")),
list(string_detect = ~str_detect(tolower(DF),tolower(str_replace_all(.,"/|,(\s)?|(?<!,)\s","|")))))
任何帮助将不胜感激,谢谢!
我不完全确定这是你的想法,但希望这可能会有所帮助。
您的最终结果似乎与您的示例数据不匹配(例如,缺少 TOPICAL(。
使用pivot_longer
以更整洁的形式这可能更容易。
编辑:如果列是因子,则转换为filter
中str_detect
的字符。
library(tidyverse)
library(stringr)
df %>%
mutate_if(is.factor, as.character) %>%
pivot_longer(cols = -c(identifer, DF), names_to = c(".value", "number"), names_pattern = "(\w+)(\d+)") %>%
filter(str_detect(DF_source, DF)) %>%
group_by(identifer) %>%
summarise(DF_match = paste(DF_source, collapse = ';'),
Route_match = paste(Route_source, collapse = ';'),
match_count = n()) %>%
right_join(df[,c("identifer", "DF")], by = "identifer") %>%
select(c(identifer, DF, DF_match, Route_match, match_count))
输出
# A tibble: 4 x 5
identifer DF DF_match Route_match match_count
<dbl> <chr> <chr> <chr> <int>
1 1 Tablet Tablet, Extended Release Oral 1
2 2 Powder Powder,Metered;Powder INHALATION;TOPICAL 2
3 3 Suspension NA NA NA
4 4 System NA NA NA