R中基于条件的数据集合并



我有两个数据集要合并到R中,但我似乎无法实现。请参阅下面的数据集1(技术(和数据集2(选项(。数据集1需要将额外数据作为列添加到数据集2(例如选项1成本、选项1排放量等(。

但是,合并不是一对一的,因为不是数据集1中的所有行都需要数据集2中的数据。数据集1中的是/否值表明了这一点。

Dataset1
Technology  Level   Costs   Emissions   Investment  Option 1    Option 2
Technology1 Low 0.9 0.3 0.4 Yes No
Technology1 Medium  0.1 0.8 0.0 Yes No
Technology1 High    0.5 0.6 0.7 Yes No
Technology2 Low 1.0 0.6 0.2 Yes Yes
Technology2 Medium  0.3 0.1 0.2 Yes Yes
Technology3 High    0.7 0.8 0.5 Yes Yes
Dataset2
Option  Detail  Costs   Emissions   Investment
Option1 Low 0.7 0.6 0.7
Option1 Medium  0.8 0.9 0.1
Option1 High    0.7 0.6 0.2
Option2 Low 0.8 0.8 0.7
Option2 Medium  0.2 0.6 0.5
Option2 High    0.1 0.8 0.0
Final result:
Technology  Level   Costs   Emissions   Investment  Option 1 costs  Option 1 Emissions  Option 1 investment Option 2 costs  Option 2 Emissions  Option 2 investment
Technology 1    Low 0.5 0.3 0.2 0.5 0.8 0.1 0.0 0.0 0.0
Technology 1    Medium  0.2 1.0 0.1 0.2 0.7 0.6 0.0 0.0 0.0
Technology 1    High    0.5 0.2 0.1 1.0 0.4 0.3 0.0 0.0 0.0
Technology 2    Low 0.7 1.0 1.0 0.8 0.9 0.5 0.2 0.2 0.6
Technology 2    Medium  0.8 0.8 0.1 0.1 0.4 1.0 0.6 0.9 0.7
Technology 3    High    0.8 1.0 0.5 0.6 1.0 0.1 0.4 0.9 0.5

看来if/else在这里行得通,但我似乎不明白if-else在这儿行得通。

谢谢你的帮助!

代码

Dataset1 %>%
mutate(help1 = ifelse(`Option 1` == "Yes", "Option1", NA_character_),
help2 = ifelse(`Option 2` == "Yes", "Option2", NA_character_)) %>%
left_join(Dataset2, by = c("help1" = "Option", "Level" = "Detail"), 
suffix = c("","_Option1")) %>%
left_join(Dataset2, by = c("help2" = "Option", "Level" = "Detail"), 
suffix = c("","_Option2")) %>%
select(-help1, -help2)

输出

Technology  Level Costs Emissions Investment Option 1 Option 2 Costs_Option1 Emissions_Option1
1 Technology1    Low   0.9       0.3        0.4      Yes       No           0.7               0.6
2 Technology1 Medium   0.1       0.8        0.0      Yes       No           0.8               0.9
3 Technology1   High   0.5       0.6        0.7      Yes       No           0.7               0.6
4 Technology2    Low   1.0       0.6        0.2      Yes      Yes           0.7               0.6
5 Technology2 Medium   0.3       0.1        0.2      Yes      Yes           0.8               0.9
6 Technology3   High   0.7       0.8        0.5      Yes      Yes           0.7               0.6
Investment_Option1 Costs_Option2 Emissions_Option2 Investment_Option2
1                0.7            NA                NA                 NA
2                0.1            NA                NA                 NA
3                0.2            NA                NA                 NA
4                0.7           0.8               0.8                0.7
5                0.1           0.2               0.6                0.5
6                0.2           0.1               0.8                0.0

数据集

Dataset1 <- structure(list(Technology = structure(c(1L, 1L, 1L, 2L, 2L, 3L
), .Label = c("Technology1", "Technology2", "Technology3"), class = "factor"), 
Level = structure(c(2L, 3L, 1L, 2L, 3L, 1L), .Label = c("High", 
"Low", "Medium"), class = "factor"), Costs = c(0.9, 0.1, 
0.5, 1, 0.3, 0.7), Emissions = c(0.3, 0.8, 0.6, 0.6, 0.1, 
0.8), Investment = c(0.4, 0, 0.7, 0.2, 0.2, 0.5), `Option 1` = structure(c(1L, 
1L, 1L, 1L, 1L, 1L), .Label = "Yes", class = "factor"), `Option 2` = structure(c(1L, 
1L, 1L, 2L, 2L, 2L), .Label = c("No", "Yes"), class = "factor")), class = "data.frame", row.names = c(NA, 
-6L))
Dataset2 <- structure(list(Option = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("Option1", 
"Option2"), class = "factor"), Detail = structure(c(2L, 3L, 1L, 
2L, 3L, 1L), .Label = c("High", "Low", "Medium"), class = "factor"), 
Costs = c(0.7, 0.8, 0.7, 0.8, 0.2, 0.1), Emissions = c(0.6, 
0.9, 0.6, 0.8, 0.6, 0.8), Investment = c(0.7, 0.1, 0.2, 0.7, 
0.5, 0)), class = "data.frame", row.names = c(NA, -6L))

最新更新