Merge a data frame with key values contained in strings in a separate data frame


employee <- c('John','Peter', 'Gynn', 'Jolie', 'Hope', 'Sue', 'Jane', 'Sarah')
salary <- c('VT020', 'VT126', 'VT027', 'VT667', 'VC120', 'VT000', 'VA120', 'VA020')
emp <- data.frame(employee, salary)

benefit <- c('Health', 'Time', 'Bonus')
benefit_id <- c('VT020 VT126 VT667 VA020', 'VT667', 'VT126 VT667 VT000')
ben <- data.frame(benefit, benefit_id)

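Above we have two data frames: one containing names and a unique ID, and another containing categories and a space-separated list of unique IDs.

What is the most efficient way to merge the ben data frame with the emp data frame so that each employee gets the benefits assigned to them?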

tidyverse

library(dplyr)
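library(tidyr) # unnest
ben %>%
  # split each ID string on whitespace (the backslash must be doubled in an R string)
  mutate(benefit_id = strsplit(benefit_id, "\\s+")) %>%
  # one row per benefit/ID pair
  unnest(benefit_id) %>%
  left_join(emp, ., by = c(salary = "benefit_id"))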
#    employee salary benefit
# 1      John  VT020  Health
# 2     Peter  VT126  Health
# 3     Peter  VT126   Bonus
# 4      Gynn  VT027    <NA>
# 5     Jolie  VT667  Health
# 6     Jolie  VT667    Time
# 7     Jolie  VT667   Bonus
# 8      Hope  VC120    <NA>
# 9       Sue  VT000   Bonus
# 10     Jane  VA120    <NA>
# 11    Sarah  VA020  Health

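Depending on your needs, you can choose a different join type. For example, if you need all pairings, use full_join; an NA in employee then indicates a benefit ID that no employee has.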
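
As a minimal sketch of the full_join variant: in the sample data every benefit ID already belongs to an employee, so this example adds a made-up ID, 'VX999', to the Time benefit purely for illustration. That ID is an assumption and not part of the original data. The unmatched ID then shows up as a row with NA in employee.

```r
library(dplyr)
library(tidyr)

emp <- data.frame(
  employee = c('John', 'Peter', 'Gynn', 'Jolie', 'Hope', 'Sue', 'Jane', 'Sarah'),
  salary   = c('VT020', 'VT126', 'VT027', 'VT667', 'VC120', 'VT000', 'VA120', 'VA020'),
  stringsAsFactors = FALSE
)

# 'VX999' is a hypothetical ID added here so that one benefit ID has no employee
ben <- data.frame(
  benefit    = c('Health', 'Time', 'Bonus'),
  benefit_id = c('VT020 VT126 VT667 VA020', 'VT667 VX999', 'VT126 VT667 VT000'),
  stringsAsFactors = FALSE
)

# one row per benefit/ID pair
ben_long <- ben %>%
  mutate(benefit_id = strsplit(benefit_id, "\\s+")) %>%
  unnest(benefit_id)

# keep unmatched rows from both sides
all_pairs <- full_join(emp, ben_long, by = c(salary = "benefit_id"))

# benefit IDs that no employee holds
all_pairs[is.na(all_pairs$employee), ]
```

Which join to use depends on which side's unmatched rows matter: left_join keeps every employee, while full_join also keeps orphaned benefit IDs.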

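FYI: if you are running a version of R older than 4.0, your data may contain factors. To fix this, simply convert the factor columns with as.character first. (You can check for this with sapply(ben, inherits, "factor").)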
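
The check and conversion might look like the sketch below. It builds ben with stringsAsFactors = TRUE on purpose, to mimic the pre-4.0 default; the helper name is_fac is only illustrative.

```r
# Build ben with factor columns deliberately, as R < 4.0 would do by default
ben <- data.frame(
  benefit    = c('Health', 'Time', 'Bonus'),
  benefit_id = c('VT020 VT126 VT667 VA020', 'VT667', 'VT126 VT667 VT000'),
  stringsAsFactors = TRUE
)

# TRUE for every column that is a factor
is_fac <- sapply(ben, inherits, "factor")

# convert only the factor columns to character, leaving other columns untouched
ben[is_fac] <- lapply(ben[is_fac], as.character)

str(ben)
```

After this step strsplit works on benefit_id; called on a factor it would stop with a "non character argument" error.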

data.table

library(data.table)
setDT(emp)
ben_long <- setDT(ben)[, list(benefit_id = unlist(strsplit(x = benefit_id, split = " "))), by = benefit]
merge(x = emp, y = ben_long, by.x = "salary", by.y = "benefit_id", all.x = TRUE)

    salary employee benefit
 1:  VA020    Sarah  Health
 2:  VA120     Jane    <NA>
 3:  VC120     Hope    <NA>
 4:  VT000      Sue   Bonus
 5:  VT020     John  Health
 6:  VT027     Gynn    <NA>
 7:  VT126    Peter  Health
 8:  VT126    Peter   Bonus
 9:  VT667    Jolie  Health
10:  VT667    Jolie    Time
11:  VT667    Jolie   Bonus
