如何使用R对范围数据进行Vlookup



我有两个excel文件,我想使用python甚至excel执行vlookup并查找成本差异。

我的文件看起来像这个

source_data.xlsx包含覆盖的距离及其价格,例如,从1到100的距离应收取4800,从101到120的距离应收费5100。

DISTANCE     COST    
1-100        4800    
101-120      5100   
121-140      5500   
141-160      5900 
161-180      6200    
181-200      6600    
210-220      6900    
221-240      7200

分析.xlsx

loading_station  distance_travel     total_cost    status    
PUGU                  40                4000       PAID       
PUGU                  80                3200       PAID  
MOROGORO              50                5000       PAID
MOROGORO              220               30400      PAID
DODOMA                150               5100       PAID
KIGOMA                90                2345       PAID
DODOMA                230               6000       PAID
DODOMA                180               16500      PAID
KIGOMA                32                3000       PAID
DODOMA                45                6000       PAID
DODOMA                65                5000       PAID
KIGOMA                77                1000       PAID
KIGOMA                90                4000       PAID

距离的实际成本在source_data.xlsx中给出,我想在Analysis.xlsx检查成本是否与实际值相对应,我想检测少付和多付。

所需的输出应该是这样的,添加了两列,source_cost是通过使用vlookup从source_xlsx中获取的,difference是total_cost和source_cost 之间的差

loading_station distance_travel total_cost  status  source_cost  Difference
PUGU               40                4000     PAID     4800        -800
PUGU               80                3200     PAID     4800        -1600
MOROGORO           50                5000     PAID     4800        200
MOROGORO           220               30400    PAID     6900        23500
DODOMA             150               5100     PAID     5900        -800
KIGOMA             90                2345     PAID     4800        -2455
DODOMA             230               6000     PAID     7200        -1200
DODOMA             180               16500    PAID     6200        10300
KIGOMA             32                3000     PAID     4800        -1800
DODOMA             45                6000     PAID     4800        1200
DODOMA             65                5000     PAID     4800        200
KIGOMA             77                1000     PAID     4800        -3800
KIGOMA             90                4000     PAID     4800        -800

到目前为止我的代码

library(readxl)
analysis <- read_excel("~/analysis.xlsx")
View(analysis)
source_data <- read_excel("~/source_data.xlsx")
View(source_data)

这在一定程度上取决于从Excel读取数据后的格式,但如果它是您的问题所暗示的格式,那么您可以执行:

analysis$source_cost <- source_data$COST[
findInterval(analysis$distance_travel,
sapply(strsplit(source_data$DISTANCE, '-'), 
function(x) as.numeric(x[1])))]
analysis$Difference <- analysis$total_cost - analysis$source_cost
analysis
#>    loading_station distance_travel total_cost status source_cost Difference
#> 1             PUGU              40       4000   PAID        4800       -800
#> 2             PUGU              80       3200   PAID        4800      -1600
#> 3         MOROGORO              50       5000   PAID        4800        200
#> 4         MOROGORO             220      30400   PAID        6900      23500
#> 5           DODOMA             150       5100   PAID        5900       -800
#> 6           KIGOMA              90       2345   PAID        4800      -2455
#> 7           DODOMA             230       6000   PAID        7200      -1200
#> 8           DODOMA             180      16500   PAID        6200      10300
#> 9           KIGOMA              32       3000   PAID        4800      -1800
#> 10          DODOMA              45       6000   PAID        4800       1200
#> 11          DODOMA              65       5000   PAID        4800        200
#> 12          KIGOMA              77       1000   PAID        4800      -3800
#> 13          KIGOMA              90       4000   PAID        4800       -800

问题数据以可复制格式

source_data <- structure(list(DISTANCE = c("1-100", "101-120", "121-140", 
"141-160", "161-180", "181-200", "210-220", "221-240"), COST = c(4800L, 
5100L, 5500L, 5900L, 6200L, 6600L, 6900L, 7200L)), 
class = "data.frame", row.names = c(NA, -8L))
analysis <- structure(list(loading_station = c("PUGU", "PUGU", "MOROGORO", 
"MOROGORO", "DODOMA", "KIGOMA", "DODOMA", "DODOMA", "KIGOMA", 
"DODOMA", "DODOMA", "KIGOMA", "KIGOMA"), distance_travel = c(40L, 
80L, 50L, 220L, 150L, 90L, 230L, 180L, 32L, 45L, 65L, 77L, 90L
), total_cost = c(4000L, 3200L, 5000L, 30400L, 5100L, 2345L, 
6000L, 16500L, 3000L, 6000L, 5000L, 1000L, 4000L), status = c("PAID", 
"PAID", "PAID", "PAID", "PAID", "PAID", "PAID", "PAID", "PAID", 
"PAID", "PAID", "PAID", "PAID")), class = "data.frame", row.names = c(NA, 
-13L))

创建于2022-08-14由reprex包(v2.0.1(

最新更新