r - How to specify which values should be treated as equal when merging



I have the following example data:

library(data.table)
DT1 <- structure(list(Value_A = c(0.999891903413269, 0.982154309144703, 
0.592188885660591, 0.661057845195792, 0.937776766715946, 0.916698007208248, 
0.0701487169744913, 0.464692536229384, 0.893825143445546, 0.446259881776091
), Cat_A = c("1-5", "1-5", "1-5", "1-5", "1-5", "6-10", "6-10", 
"6-10", "6-10", "6-10")), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

# A tibble: 10 x 2
   Value_A Cat_A
     <dbl> <chr>
 1  1.00   1-5
 2  0.982  1-5
 3  0.592  1-5
 4  0.661  1-5
 5  0.938  1-5
 6  0.917  6-10
 7  0.0701 6-10
 8  0.465  6-10
 9  0.894  6-10
10  0.446  6-10
DT2 <- structure(list(Value_B = c(0.77108614914467, 0.707807008408563, 
0.142130428652957, 0.0790962463426558, 0.0900305847177399, 0.342793229042372, 
0.533475451123892, 0.392646249361294, 0.969809092535721, 0.773003569195185
), Cat_B = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

# A tibble: 10 x 2
   Value_B Cat_B
     <dbl> <dbl>
 1  0.771      1
 2  0.708      2
 3  0.142      3
 4  0.0791     4
 5  0.0900     5
 6  0.343      6
 7  0.533      7
 8  0.393      8
 9  0.970      9
10  0.773     10

I would like to merge these data frames by Cat_A and Cat_B to get:

DT_result <- structure(list(Value_A = c(0.999891903413269, 0.982154309144703, 
0.592188885660591, 0.661057845195792, 0.937776766715946, 0.916698007208248, 
0.0701487169744913, 0.464692536229384, 0.893825143445546, 0.446259881776091
), Cat_A = c("1-5", "1-5", "1-5", "1-5", "1-5", "6-10", "6-10", 
"6-10", "6-10", "6-10"), Value_B = c(0.77108614914467, 0.707807008408563, 
0.142130428652957, 0.0790962463426558, 0.0900305847177399, 0.342793229042372, 
0.533475451123892, 0.392646249361294, 0.969809092535721, 0.773003569195185
), Cat_B = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

# A tibble: 10 x 4
   Value_A Cat_A Value_B Cat_B
     <dbl> <chr>   <dbl> <dbl>
 1  1.00   1-5    0.771      1
 2  0.982  1-5    0.708      2
 3  0.592  1-5    0.142      3
 4  0.661  1-5    0.0791     4
 5  0.938  1-5    0.0900     5
 6  0.917  6-10   0.343      6
 7  0.0701 6-10   0.533      7
 8  0.465  6-10   0.393      8
 9  0.894  6-10   0.970      9
10  0.446  6-10   0.773     10

I know I can use by.x="Cat_A" and by.y="Cat_B":

merge(DT1, DT2, by.x="Cat_A", by.y="Cat_B", all.x=TRUE)

But how do I tell merge that 1, 2, 3, 4 and 5 should be matched with 1-5, and so on?

Is recoding the only solution?

You can create a new variable in DT2 that holds the matching range label:

# Map each numeric Cat_B value to the Cat_A range label it belongs to
DT2$new_var <- ""
DT2$new_var[DT2$Cat_B %in% 1:5] <- "1-5"
DT2$new_var[DT2$Cat_B %in% 6:10] <- "6-10"
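The recoding above hard-codes the two ranges. As a minimal sketch, assuming every Cat_A label has the form "lower-upper" with numeric bounds and that the ranges do not overlap, the same new_var column can be derived from the labels themselves. The names labels, bounds, lower and upper are illustrative, and the data frames are abbreviated to their key columns.

```r
# Abbreviated copies of the example data (only the key columns matter here)
DT1 <- data.frame(Cat_A = rep(c("1-5", "6-10"), each = 5),
                  stringsAsFactors = FALSE)
DT2 <- data.frame(Cat_B = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))

# Assumption: each distinct label looks like "lower-upper"; split it into
# numeric lower and upper bounds
labels <- unique(DT1$Cat_A)
bounds <- do.call(rbind, strsplit(labels, "-", fixed = TRUE))
lower  <- as.numeric(bounds[, 1])
upper  <- as.numeric(bounds[, 2])

# For each Cat_B value, pick the label whose [lower, upper] range contains it;
# values outside every range (or in more than one) get NA
DT2$new_var <- vapply(DT2$Cat_B, function(x) {
  hit <- which(x >= lower & x <= upper)
  if (length(hit) == 1) labels[hit] else NA_character_
}, character(1))

DT2$new_var
```

Unlike the empty-string default above, a Cat_B value that falls in no range ends up as NA, which merge() will not match against any Cat_A label.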

DT2 will then look like this:

# A tibble: 10 x 3
   Value_B Cat_B new_var
     <dbl> <dbl> <chr>
 1  0.771      1 1-5
 2  0.708      2 1-5
 3  0.142      3 1-5
 4  0.0791     4 1-5
 5  0.0900     5 1-5
 6  0.343      6 6-10
 7  0.533      7 6-10
 8  0.393      8 6-10
 9  0.970      9 6-10
10  0.773     10 6-10

Now you can merge the two tables on the common variable (Cat_A in DT1, new_var in DT2).
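One caveat when merging on the shared label: the key is not unique, so merge() returns every pairing within a range (5 x 5 rows per range, 50 in total) rather than the 10 rows of DT_result. A minimal sketch, assuming both tables are already in the intended row order, adds a hypothetical within-range position column pos so that each row is paired with exactly one partner. The data are rounded copies of the example values.

```r
# Rounded copies of the example data; DT2 already carries new_var
DT1 <- data.frame(
  Value_A = c(1.00, 0.982, 0.592, 0.661, 0.938, 0.917, 0.0701, 0.465, 0.894, 0.446),
  Cat_A   = rep(c("1-5", "6-10"), each = 5),
  stringsAsFactors = FALSE
)
DT2 <- data.frame(
  Value_B = c(0.771, 0.708, 0.142, 0.0791, 0.0900, 0.343, 0.533, 0.393, 0.970, 0.773),
  Cat_B   = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  new_var = rep(c("1-5", "6-10"), each = 5),
  stringsAsFactors = FALSE
)

# Merging on the range label alone is many-to-many: each "1-5" row of DT1
# matches all five "1-5" rows of DT2, and likewise for "6-10"
m_all <- merge(DT1, DT2, by.x = "Cat_A", by.y = "new_var")
nrow(m_all)  # 50

# Hypothetical helper column: position of each row within its range,
# assuming the rows are already in the intended order in both tables
DT1$pos <- ave(seq_len(nrow(DT1)), DT1$Cat_A, FUN = seq_along)
DT2$pos <- ave(seq_len(nrow(DT2)), DT2$new_var, FUN = seq_along)

# Merge on label and position, then restore the layout of DT_result
res <- merge(DT1, DT2, by.x = c("Cat_A", "pos"), by.y = c("new_var", "pos"))
res <- res[order(res$Cat_B), c("Value_A", "Cat_A", "Value_B", "Cat_B")]
nrow(res)  # 10
```

If the rows are not in matching order, the pos column pairs them arbitrarily within each range, so the position key only makes sense when that order is meaningful.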

Hope it helps!
