employee <- c('John','Peter', 'Gynn', 'Jolie', 'Hope', 'Sue', 'Jane', 'Sarah')
salary <- c('VT020', 'VT126', 'VT027', 'VT667', 'VC120', 'VT000', 'VA120', 'VA020')
emp <- data.frame(employee, salary)
benefit <- c('Health', 'Time', 'Bonus')
benefit_id <- c('VT020 VT126 VT667 VA020', 'VT667', 'VT126 VT667 VT000')
ben <- data.frame(benefit, benefit_id)
上面我们有数据帧,一个包含名称和唯一ID,另一个包含类别和唯一ID列表。
将ben
数据帧与emp
数据帧合并以使我们获得分配给每位员工的适当利益的最有效方法是什么?
tidyverse
library(dplyr)
library(tidyr) # tidyr
ben %>%
mutate(benefit_id = strsplit(benefit_id, "\s+")) %>%
unnest(benefit_id) %>%
left_join(emp, ., by = c(salary = "benefit_id"))
# employee salary benefit
# 1 John VT020 Health
# 2 Peter VT126 Health
# 3 Peter VT126 Bonus
# 4 Gynn VT027 <NA>
# 5 Jolie VT667 Health
# 6 Jolie VT667 Time
# 7 Jolie VT667 Bonus
# 8 Hope VC120 <NA>
# 9 Sue VT000 Bonus
# 10 Jane VA120 <NA>
# 11 Sarah VA020 Health
根据您的需要,您也可以选择不同的加入方式。例如,如果需要所有配对,请使用full_join
,其中employee
中的NA
表示没有员工的福利。
仅供参考:如果你在4.0之前运行R,那么你的数据中可能有factor
。要解决此问题,只需先使用as.character
转换因子列即可。(这可以用sapply(ben, inherits, "factor")
来确定。(
数据表
library(data.table)
setDT(emp)
ben_long <- setDT(ben)[, list(benefit_id = unlist(strsplit(x = benefit_id, split = " "))), by = benefit]
merge(x = emp, y = ben_long, by.x = "salary", by.y = "benefit_id", all.x = TRUE)
salary employee benefit
1: VA020 Sarah Health
2: VA120 Jane <NA>
3: VC120 Hope <NA>
4: VT000 Sue Bonus
5: VT020 John Health
6: VT027 Gynn <NA>
7: VT126 Peter Health
8: VT126 Peter Bonus
9: VT667 Jolie Health
10: VT667 Jolie Time
11: VT667 Jolie Bonus