Question:
How can I efficiently handle multiple location requests to the HERE API?
Generally speaking, I'm new to GET requests and REST, but I need to fetch location data and I'm trying out the HERE API. I'm doing this in R, but that fact is incidental to my question.
This works:
library(httr)
library(jsonlite)
HERE_API_KEY <- "#REMOVED#"
url <- "https://geocode.search.hereapi.com/v1/"
zip <- 18615
country <- "United+States"
theRequest <- paste0(url,"geocode?qq=postalCode=",zip,";country=",country,"&apiKey=",HERE_API_KEY)
theResponse <- GET(theRequest)
I get back a status 200 message and content with the data, so no problem there.
What I want:
The example above is for a single location, but I have a list of thousands of locations to look up, with the ultimate goal of determining routing distances between pairs of points in the location dataset.
As mentioned above, I could build a loop and submit one request per location, but since I have so many locations, I'd like to know whether there's a preferred way to submit a list of locations (or group them?) in a single call, which would be kinder to the HERE API and fetch the data efficiently. As a shot in the dark, I tried a test with 3 locations:
theRequest <- "https://geocode.search.hereapi.com/v1/geocode?qq=postalCode=18615;country=United+States&qq=postalCode=L4T1G3;country=Canada&qq=postalCode=62521;country=United+States&apiKey=#REMOVED#"
but it didn't work. Maybe that just isn't possible and I simply don't understand REST, but I'd like to handle multiple requests as efficiently as I can, both for my own sake and for the HERE API service. Thanks in advance.
If you want to use the HERE Geocoding and Search API, then looping through your data and sending a separate GET request for each address is a perfectly valid approach. Just make sure you don't exceed the maximum allowed requests per second (RPS) for the plan you have (e.g., 5 RPS for the Geocoding and Search API on the Freemium plan); otherwise your queries will fail with error code 429 "Too Many Requests" and you will have to send them again.
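If you go the loop route, a minimal sketch of such a throttled loop in R might look like the following (the helper name, the 0.25 s delay, and the single retry on 429 are my assumptions, not part of the original answer):

```r
# Build one geocode request URL per row (same construction as in the question)
build_request <- function(zip, country, api_key,
                          base = "https://geocode.search.hereapi.com/v1/") {
  paste0(base, "geocode?qq=postalCode=", zip, ";country=", country,
         "&apiKey=", api_key)
}

# Throttled loop: 0.25 s between requests keeps us under the 5 RPS limit
geocode_all <- function(zips, countries, api_key) {
  lapply(seq_along(zips), function(i) {
    Sys.sleep(0.25)
    resp <- httr::GET(build_request(zips[i], countries[i], api_key))
    if (httr::status_code(resp) == 429) { # Too Many Requests: back off, retry once
      Sys.sleep(2)
      resp <- httr::GET(build_request(zips[i], countries[i], api_key))
    }
    httr::content(resp)
  })
}
```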
Alternatively, you can use the HERE Batch Geocoder API, which is designed to process large datasets (up to 1 million records for geocoding or reverse geocoding) in a single API call. Using this service involves 3 steps:
- Send a POST request containing all the data you want geocoded or reverse geocoded
- Call the status endpoint to monitor the status of the job you submitted. You may want to make this type of call periodically until the response indicates that your job is completed and your output is ready to download
- Download your results
Here is an example of how to use this service; note that this API expects POST requests, not GET.
Answer with example:
astro.comma's answer pointed me to where I needed to go to get batching from the HERE API, so strictly speaking that is the answer and why it's marked as such. For anyone passing through here later, this is the test script I used to work out how to implement the request in R, based on the help I got from astro.comma.
Sample data:
Console:
df_locations[1:5,] # Show a sample of the data in the data frame
># A tibble: 5 x 3
> recID country postalCode
> <int> <fct> <chr>
>1 1 CAN L4T1G3
>2 2 USA 62521
>3 3 CAN H9P1K2
>4 4 CAN L6S4K6
>5 5 USA 52632
dput(df_locations[1:5,]) # For ease of reproducibility, here's dput():
structure(list(recID = 1:5, country = structure(c(1L, 2L, 1L,
1L, 2L), .Label = c("CAN", "USA", "MEX"), class = "factor"),
postalCode = c("L4T1G3", "62521", "H9P1K2", "L6S4K6", "52632"
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))
Script:
library(httr)
library(tidyverse)
HERE_API_KEY <- "YOU-CANT-HAVE-THIS-BECAUSE-ITS-MINE"
url <- "https://batch.geocoder.ls.hereapi.com/6.2/jobs"
# Write df_locations to pipe-delimited text file to prep for POST
write.table(
df_locations,
file = "locations.txt",
quote = FALSE,
sep = "|",
row.names = FALSE
)
# Assemble the POST request url to start the job
theRequest <-
paste0(
url,
"?&apiKey=",
HERE_API_KEY,
"&action=run&header=true",
"&indelim=%7C&outdelim=%7C",
"&outcols=recId%2CseqNumber%2CseqLength%2CdisplayLatitude",
"%2CdisplayLongitude%2Ccity%2CpostalCode%2Ccountry",
"&outputCombined=true"
)
# Now submit the POST request along with the location file
theResponse <-
POST(url = theRequest, body = upload_file("locations.txt"))
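As an aside, the percent-encoded pieces in the request above (%7C for the pipe delimiter, %2C for the commas in outcols) don't have to be typed by hand; base R's URLencode can generate them:

```r
# Percent-encode the delimiter and the output-column list instead of
# hard-coding %7C and %2C
delim <- URLencode("|", reserved = TRUE)   # "%7C"
outcols <- URLencode(
  paste(c("recId", "seqNumber", "seqLength", "displayLatitude",
          "displayLongitude", "city", "postalCode", "country"),
        collapse = ","),
  reserved = TRUE
)
# outcols now reads "recId%2CseqNumber%2C..." and can be pasted into the URL
```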
Console:
>theResponse
Response [https://batch.geocoder.ls.hereapi.com/6.2/jobs?&apiKey=YOU-CANT-HAVE-THIS-BECAUSE-ITS-MINE&action=run&header=true&indelim=%7C&outdelim=%7C&outcols=recId%2CseqNumber%2CseqLength%2CdisplayLatitude%2CdisplayLongitude%2Ccity%2CpostalCode%2Ccountry&outputCombined=true]
Date: 2021-12-27 00:45
Status: 200
Content-Type: application/json;charset=utf-8
Size: 209 B
Script:
# Extract the Request ID so we can check for completion status of the job, and
# use it to identify / download the zip file when complete.
reqID <- content(theResponse)$Response$MetaInfo$RequestId
Console:
>reqID
[1] "XS9wSVt3y0Dch1Q48gX1xohewUKIw595" # or looks like this -- I changed it here.
Script:
# After letting some time pass (about a minute for my test file), I check
# status of the job with a GET request:
JOB_status <-
GET(paste0(url, "/", reqID, "?action=status&apiKey=", HERE_API_KEY))
Console:
>content(JOB_status)
$Response
$Response$MetaInfo
$Response$MetaInfo$RequestId
[1] "XS9wSVt3y0Dch1Q48gX1xohewUKIw595"
$Response$Status
[1] "completed" # There are other statuses (statii?), but this one we care about.
$Response$JobStarted
[1] "2021-12-27T00:46:36.000+0000"
$Response$JobFinished
[1] "2021-12-27T00:46:49.000+0000"
$Response$TotalCount
[1] 2080 # Ignore this -- I only provided you with first 5 rows
$Response$ValidCount
[1] 2080
$Response$InvalidCount
[1] 0
$Response$ProcessedCount
[1] 2080
$Response$PendingCount
[1] 0
$Response$SuccessCount
[1] 2076
$Response$ErrorCount
[1] 4
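Rather than eyeballing the status by hand, the check can be wrapped in a simple polling loop. A sketch (the set of terminal statuses here is my assumption; check the Batch Geocoder docs for the full list):

```r
# Statuses at which polling should stop (assumed terminal set)
is_terminal <- function(status) status %in% c("completed", "failed", "cancelled")

# Poll the status endpoint every poll_seconds until the job is done
wait_for_job <- function(url, reqID, api_key, poll_seconds = 15) {
  repeat {
    st <- httr::GET(paste0(url, "/", reqID, "?action=status&apiKey=", api_key))
    status <- httr::content(st)$Response$Status
    message("Job status: ", status)
    if (is_terminal(status)) return(status)
    Sys.sleep(poll_seconds)
  }
}
```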
Script:
# I stayed with GET request via httr, but no reason you can't switch to some other
# method for download like cURL
COMPLETED_JOB <-
GET(paste0(url, "/", reqID, "/result?apiKey=", HERE_API_KEY))
job_content <- content(x = COMPLETED_JOB, as = "raw") # This extracts the raw binary data, which is the zipped content -- it has to be extracted to be useful.
writeBin(job_content, con = "Processed_locations.zip") # Writes the binary data to file.
unzip(zipfile = "Processed_locations.zip") # Extracts the zip file as its own text file.
Resulting output file:
recId|SeqNumber|seqLength|recId|seqNumber|seqLength|displayLatitude|displayLongitude|city|postalCode|country
1|1|1|1|1|1|43.70924|-79.658|Mississauga|L4T 1G3|CAN
2|1|1|2|1|1|39.83972|-88.92881|Decatur|62521|USA
3|1|1|3|1|1|45.47659|-73.78061|Dorval|H9P 1K2|CAN
4|1|1|4|1|1|43.75666|-79.71021|Brampton|L6S 4K6|CAN
5|1|1|5|1|1|40.4013|-91.3848|Keokuk|52632|USA
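For completeness, the extracted pipe-delimited file reads straight back into R with read.table. Using the first rows shown above (written to a temp file only to keep the snippet self-contained):

```r
# Recreate the first lines of the result file so the example stands alone
sample_lines <- c(
  "recId|SeqNumber|seqLength|recId|seqNumber|seqLength|displayLatitude|displayLongitude|city|postalCode|country",
  "1|1|1|1|1|1|43.70924|-79.658|Mississauga|L4T 1G3|CAN",
  "2|1|1|2|1|1|39.83972|-88.92881|Decatur|62521|USA"
)
f <- tempfile(fileext = ".txt")
writeLines(sample_lines, f)

# sep = "|" matches the outdelim requested in the job; check.names = TRUE
# deduplicates the repeated recId/seqLength headers (recId, recId.1, ...)
results <- read.table(f, header = TRUE, sep = "|",
                      check.names = TRUE, stringsAsFactors = FALSE)
```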