I have the following sample data:
library(data.table)
DT1 <- structure(list(Value_A = c(0.999891903413269, 0.982154309144703,
0.592188885660591, 0.661057845195792, 0.937776766715946, 0.916698007208248,
0.0701487169744913, 0.464692536229384, 0.893825143445546, 0.446259881776091
), Cat_A = c("1-5", "1-5", "1-5", "1-5", "1-5", "6-10", "6-10",
"6-10", "6-10", "6-10")), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
# A tibble: 10 x 2
Value_A Cat_A
<dbl> <chr>
1 1.00 1-5
2 0.982 1-5
3 0.592 1-5
4 0.661 1-5
5 0.938 1-5
6 0.917 6-10
7 0.0701 6-10
8 0.465 6-10
9 0.894 6-10
10 0.446 6-10
DT2 <- structure(list(Value_B = c(0.77108614914467, 0.707807008408563,
0.142130428652957, 0.0790962463426558, 0.0900305847177399, 0.342793229042372,
0.533475451123892, 0.392646249361294, 0.969809092535721, 0.773003569195185
), Cat_B = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
# A tibble: 10 x 2
Value_B Cat_B
<dbl> <dbl>
1 0.771 1
2 0.708 2
3 0.142 3
4 0.0791 4
5 0.0900 5
6 0.343 6
7 0.533 7
8 0.393 8
9 0.970 9
10 0.773 10
I want to merge these data frames by Cat_A and Cat_B to get:
DT_result <- structure(list(Value_A = c(0.999891903413269, 0.982154309144703,
0.592188885660591, 0.661057845195792, 0.937776766715946, 0.916698007208248,
0.0701487169744913, 0.464692536229384, 0.893825143445546, 0.446259881776091
), Cat_A = c("1-5", "1-5", "1-5", "1-5", "1-5", "6-10", "6-10",
"6-10", "6-10", "6-10"), Value_B = c(0.77108614914467, 0.707807008408563,
0.142130428652957, 0.0790962463426558, 0.0900305847177399, 0.342793229042372,
0.533475451123892, 0.392646249361294, 0.969809092535721, 0.773003569195185
), Cat_B = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
# A tibble: 10 x 4
Value_A Cat_A Value_B Cat_B
<dbl> <chr> <dbl> <dbl>
1 1.00 1-5 0.771 1
2 0.982 1-5 0.708 2
3 0.592 1-5 0.142 3
4 0.661 1-5 0.0791 4
5 0.938 1-5 0.0900 5
6 0.917 6-10 0.343 6
7 0.0701 6-10 0.533 7
8 0.465 6-10 0.393 8
9 0.894 6-10 0.970 9
10 0.446 6-10 0.773 10
I know I can use by.x="Cat_A" and by.y="Cat_B":
merge(DT1, DT2, by.x="Cat_A", by.y="Cat_B", all.x=TRUE)
But how do I tell merge that 1, 2, 3, 4 and 5 should be matched with 1-5, and so on?
Is recoding the only solution?
You can create a new variable in DT2 that holds the matching range label:
DT2$new_var <- ""
DT2$new_var[DT2$Cat_B %in% 1:5] <- "1-5"
DT2$new_var[DT2$Cat_B %in% 6:10] <- "6-10"
DT2 will then look like this:
# A tibble: 10 x 3
Value_B Cat_B new_var
<dbl> <dbl> <chr>
1 0.771 1 1-5
2 0.708 2 1-5
3 0.142 3 1-5
4 0.0791 4 1-5
5 0.0900 5 1-5
6 0.343 6 6-10
7 0.533 7 6-10
8 0.393 8 6-10
9 0.970 9 6-10
10 0.773 10 6-10
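With more than a couple of ranges, one %in% line per range gets tedious. Here is a minimal sketch of the same recoding using base R's cut(). The break points 0, 5 and 10 are my assumption, chosen so that the labels line up with Cat_A, and the DT2 below is a stripped-down stand-in that keeps only the key column:

```r
# Stand-in for DT2: only the key column matters for the recoding
DT2 <- data.frame(Cat_B = 1:10)

# Assumed bin edges: (0, 5] and (5, 10], labelled to match Cat_A in DT1.
# cut() returns a factor, so convert to character to match Cat_A's type.
DT2$new_var <- as.character(
  cut(DT2$Cat_B, breaks = c(0, 5, 10), labels = c("1-5", "6-10"))
)

table(DT2$new_var)
```

Adding a range then only means adding one break point and one label.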
Now you can merge the two tables on this common variable (Cat_A in DT1, new_var in DT2).
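A rough sketch of that merge follows. It uses rounded stand-ins for the values in the question, so treat the numbers as illustrative. One thing to watch: every range label occurs five times in each table, so a plain merge on the label returns every combination within a range. If you want one row per row, as DT_result suggests, you need a second key. The idx column below is a hypothetical helper I made up for that purpose, and pairing rows by their position within each range is my assumption about what you want.

```r
# Rounded stand-ins for the data in the question (illustrative values only)
DT1 <- data.frame(
  Value_A = c(0.99, 0.98, 0.59, 0.66, 0.94, 0.92, 0.07, 0.46, 0.89, 0.45),
  Cat_A   = rep(c("1-5", "6-10"), each = 5)
)
DT2 <- data.frame(
  Value_B = c(0.77, 0.71, 0.14, 0.08, 0.09, 0.34, 0.53, 0.39, 0.97, 0.77),
  Cat_B   = 1:10
)

# Recode Cat_B into the same range labels that Cat_A uses
DT2$new_var <- ""
DT2$new_var[DT2$Cat_B %in% 1:5]  <- "1-5"
DT2$new_var[DT2$Cat_B %in% 6:10] <- "6-10"

# Plain merge on the range label: all combinations within each range
merged <- merge(DT1, DT2, by.x = "Cat_A", by.y = "new_var")
nrow(merged)  # 50 (5 x 5 rows for each of the two ranges)

# Hypothetical helper column: position of each row within its range
DT1$idx <- ave(seq_len(nrow(DT1)), DT1$Cat_A,   FUN = seq_along)
DT2$idx <- ave(seq_len(nrow(DT2)), DT2$new_var, FUN = seq_along)

# Merging on label + position pairs the rows one-to-one
paired <- merge(DT1, DT2,
                by.x = c("Cat_A", "idx"),
                by.y = c("new_var", "idx"))
nrow(paired)  # 10
```

If the rows of the two tables are already in matching order, a simple cbind(DT1, DT2) would give the same pairing without any key at all.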
Hope this helps!