我有两个excel文件,我想使用python甚至excel执行vlookup并查找成本差异。
我的文件看起来像这个
source_data.xlsx包含覆盖的距离及其价格,例如,从1到100的距离应收取4800,从101到120的距离应收费5100。
DISTANCE COST
1-100 4800
101-120 5100
121-140 5500
141-160 5900
161-180 6200
181-200 6600
210-220 6900
221-240 7200
分析.xlsx
loading_station distance_travel total_cost status
PUGU 40 4000 PAID
PUGU 80 3200 PAID
MOROGORO 50 5000 PAID
MOROGORO 220 30400 PAID
DODOMA 150 5100 PAID
KIGOMA 90 2345 PAID
DODOMA 230 6000 PAID
DODOMA 180 16500 PAID
KIGOMA 32 3000 PAID
DODOMA 45 6000 PAID
DODOMA 65 5000 PAID
KIGOMA 77 1000 PAID
KIGOMA 90 4000 PAID
距离的实际成本在source_data.xlsx中给出,我想在Analysis.xlsx检查成本是否与实际值相对应,我想检测少付和多付。
所需的输出应该是这样的,添加了两列,source_cost是通过使用vlookup从source_xlsx中获取的,difference是total_cost和source_cost 之间的差
loading_station distance_travel total_cost status source_cost Difference
PUGU 40 4000 PAID 4800 -800
PUGU 80 3200 PAID 4800 -1600
MOROGORO 50 5000 PAID 4800 200
MOROGORO 220 30400 PAID 6900 23500
DODOMA 150 5100 PAID 5900 -800
KIGOMA 90 2345 PAID 4800 -2455
DODOMA 230 6000 PAID 7200 -1200
DODOMA 180 16500 PAID 6200 10300
KIGOMA 32 3000 PAID 4800 -1800
DODOMA 45 6000 PAID 4800 1200
DODOMA 65 5000 PAID 4800 200
KIGOMA 77 1000 PAID 4800 -3800
KIGOMA 90 4000 PAID 4800 -800
到目前为止我的代码
library(readxl)
analysis <- read_excel("~/analysis.xlsx")
View(analysis)
source_data <- read_excel("~/source_data.xlsx")
View(source_data)
这在一定程度上取决于从Excel读取数据后的格式,但如果它是您的问题所暗示的格式,那么您可以执行:
analysis$source_cost <- source_data$COST[
findInterval(analysis$distance_travel,
sapply(strsplit(source_data$DISTANCE, '-'),
function(x) as.numeric(x[1])))]
analysis$Difference <- analysis$total_cost - analysis$source_cost
analysis
#> loading_station distance_travel total_cost status source_cost Difference
#> 1 PUGU 40 4000 PAID 4800 -800
#> 2 PUGU 80 3200 PAID 4800 -1600
#> 3 MOROGORO 50 5000 PAID 4800 200
#> 4 MOROGORO 220 30400 PAID 6900 23500
#> 5 DODOMA 150 5100 PAID 5900 -800
#> 6 KIGOMA 90 2345 PAID 4800 -2455
#> 7 DODOMA 230 6000 PAID 7200 -1200
#> 8 DODOMA 180 16500 PAID 6200 10300
#> 9 KIGOMA 32 3000 PAID 4800 -1800
#> 10 DODOMA 45 6000 PAID 4800 1200
#> 11 DODOMA 65 5000 PAID 4800 200
#> 12 KIGOMA 77 1000 PAID 4800 -3800
#> 13 KIGOMA 90 4000 PAID 4800 -800
问题数据以可复制格式
source_data <- structure(list(DISTANCE = c("1-100", "101-120", "121-140",
"141-160", "161-180", "181-200", "210-220", "221-240"), COST = c(4800L,
5100L, 5500L, 5900L, 6200L, 6600L, 6900L, 7200L)),
class = "data.frame", row.names = c(NA, -8L))
analysis <- structure(list(loading_station = c("PUGU", "PUGU", "MOROGORO",
"MOROGORO", "DODOMA", "KIGOMA", "DODOMA", "DODOMA", "KIGOMA",
"DODOMA", "DODOMA", "KIGOMA", "KIGOMA"), distance_travel = c(40L,
80L, 50L, 220L, 150L, 90L, 230L, 180L, 32L, 45L, 65L, 77L, 90L
), total_cost = c(4000L, 3200L, 5000L, 30400L, 5100L, 2345L,
6000L, 16500L, 3000L, 6000L, 5000L, 1000L, 4000L), status = c("PAID",
"PAID", "PAID", "PAID", "PAID", "PAID", "PAID", "PAID", "PAID",
"PAID", "PAID", "PAID", "PAID")), class = "data.frame", row.names = c(NA,
-13L))
创建于2022-08-14由reprex包(v2.0.1(