What is the most efficient way to query multiple values via HERE API REST requests?



Question:

How do I efficiently handle multiple location requests to the HERE API?

I'm new to GET requests and REST in general, but I need to fetch location data and I'm trying out the HERE API. I'm doing this in R, but that fact isn't really relevant to my question.

This works:

library(httr)
library(jsonlite)
HERE_API_KEY <- "#REMOVED#"
url <- "https://geocode.search.hereapi.com/v1/"
zip <- 18615
country <- "United+States"
theRequest <- paste0(url,"geocode?qq=postalCode=",zip,";country=",country,"&apiKey=",HERE_API_KEY)
theResponse <- GET(theRequest)

I get a Status 200 message back along with the data content, so no problems there.
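
For reference, pulling the coordinates out of that response can be done roughly like this (a minimal sketch, assuming the usual v1 geocode response shape with an items list that carries a position element):

parsed <- content(theResponse, as = "parsed")  # httr parses the JSON body into an R list
parsed$items[[1]]$position$lat  # latitude of the first match (assumes at least one item was returned)
parsed$items[[1]]$position$lng  # longitude of the first match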

What I want:

The example above is a single location, but I have a list of thousands of locations that I need to look up, ultimately to determine the routed distance between pairs of points in the location data set.

As mentioned above, I could build a loop and submit one request per location, but since I have so many locations I'm wondering whether there is a preferred way to submit a list of locations (or batches of them?) in a single call that is friendlier to the HERE API and fetches the data efficiently. As a shot in the dark, I tried a test with 3 locations:

theRequest <- "https://geocode.search.hereapi.com/v1/geocode?qq=postalCode=18615;country=United+States&qq=postalCode=L4T1G3;country=Canada&qq=postalCode=62521;country=United+States&apiKey=#REMOVED#"

But it didn't work. Maybe this simply isn't possible and I just don't understand REST well enough, but I'd like to handle multiple requests as efficiently as possible, both for my own sake and for the HERE API service. Thanks in advance.

If you want to use the HERE Geocoding and Search API, then looping over your data and sending a separate GET request for each address is a perfectly valid approach. Just make sure you don't exceed the maximum allowed requests per second (RPS) of your plan (for example, 5 RPS for the Geocoding and Search API on the Freemium plan); otherwise your queries will come back with error code 429 "Too Many Requests" and you will have to send them again.
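
As a rough illustration, a rate-limited loop in R might look like the sketch below (not production code: the 0.25 s pause assumes the 5 RPS Freemium limit mentioned above, and the df_locations columns refer to the sample data frame shown later in this thread):

results <- vector("list", nrow(df_locations))
for (i in seq_len(nrow(df_locations))) {
  req <- paste0("https://geocode.search.hereapi.com/v1/geocode?qq=postalCode=",
                df_locations$postalCode[i], ";country=", df_locations$country[i],
                "&apiKey=", HERE_API_KEY)
  resp <- GET(req)
  if (status_code(resp) == 429) {  # hit the rate limit: back off and retry once
    Sys.sleep(1)
    resp <- GET(req)
  }
  results[[i]] <- content(resp)
  Sys.sleep(0.25)  # stay at or under ~5 requests per second
}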

Alternatively, you can use the HERE Batch Geocoder API, which is designed to process large data sets (up to 1 million records for geocoding or reverse geocoding) in a single API call. Using this service involves 3 steps:

  1. Send a single POST request containing all the data you want geocoded or reverse geocoded
  2. Call the status endpoint to monitor the state of the job you submitted. You will probably want to make this kind of call periodically until the response indicates that your job is completed and your output is ready for download
  3. Download your results

Here is an example of how to use this service; note that this API takes POST requests, not GET.

Answer with example

astro.comma's answer pointed me to where I needed to go, the HERE Batch Geocoder API, and strictly speaking that is the answer and why it is marked as such. For anyone who comes by here later, below is the test script I used to work out how to implement the request in R, based on the help I got from astro.comma.

Sample data:

Console:

df_locations[1:5,]  # Show a sample of the data in the data frame
># A tibble: 5 x 3
>  recID country postalCode
>  <int> <fct>   <chr>     
>1     1 CAN     L4T1G3    
>2     2 USA     62521     
>3     3 CAN     H9P1K2    
>4     4 CAN     L6S4K6    
>5     5 USA     52632     
dput(df_locations[1:5,])  # For ease of reproducibility, here's dput():

structure(list(recID = 1:5, country = structure(c(1L, 2L, 1L, 
1L, 2L), .Label = c("CAN", "USA", "MEX"), class = "factor"), 
postalCode = c("L4T1G3", "62521", "H9P1K2", "L6S4K6", "52632"
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))

Script:

library(httr)
library(tidyverse)

HERE_API_KEY <- "YOU-CANT-HAVE-THIS-BECAUSE-ITS-MINE"
url <- "https://batch.geocoder.ls.hereapi.com/6.2/jobs"
# Write df_locations to pipe-delimited text file to prep for POST
write.table(
  df_locations,
  file = "locations.txt",
  quote = FALSE,
  sep = "|",
  row.names = FALSE
)

# Assemble the POST request url to start the job
theRequest <-
  paste0(
    url,
    "?&apiKey=",
    HERE_API_KEY,
    "&action=run&header=true",
    "&indelim=%7C&outdelim=%7C",
    "&outcols=recId%2CseqNumber%2CseqLength%2CdisplayLatitude",
    "%2CdisplayLongitude%2Ccity%2CpostalCode%2Ccountry",
    "&outputCombined=true"
  )

# Now submit the POST request along with the location file
theResponse <-
  POST(url = theRequest, body = upload_file("locations.txt"))

Console:

>theResponse
Response [https://batch.geocoder.ls.hereapi.com/6.2/jobs?&apiKey=YOU-CANT-HAVE-THIS-BECAUSE-ITS-MINE&action=run&header=true&indelim=%7C&outdelim=%7C&outcols=recId%2CseqNumber%2CseqLength%2CdisplayLatitude%2CdisplayLongitude%2Ccity%2CpostalCode%2Ccountry&outputCombined=true]
Date: 2021-12-27 00:45
Status: 200
Content-Type: application/json;charset=utf-8
Size: 209 B

Script:

# Extract the Request ID so we can check the completion status of the job, and
# use it to identify / download the zip file when complete.
reqID <- content(theResponse)$Response$MetaInfo$RequestId

Console:

>reqID
[1] "XS9wSVt3y0Dch1Q48gX1xohewUKIw595"  # or looks like this -- I changed it here.

Script:

# After letting some time pass (about a minute for my test file), I check the
# status of the job with a GET request:
JOB_status <-
  GET(paste0(url, "/", reqID, "?action=status&apiKey=", HERE_API_KEY))

Console:

>content(JOB_status)
$Response
$Response$MetaInfo
$Response$MetaInfo$RequestId
[1] "XS9wSVt3y0Dch1Q48gX1xohewUKIw595"

$Response$Status
[1] "completed"         #  There are other statuses (statii?), but this one we care about.
$Response$JobStarted
[1] "2021-12-27T00:46:36.000+0000"
$Response$JobFinished
[1] "2021-12-27T00:46:49.000+0000"
$Response$TotalCount
[1] 2080                # Ignore this count -- I only showed the first 5 rows above
$Response$ValidCount
[1] 2080
$Response$InvalidCount
[1] 0
$Response$ProcessedCount
[1] 2080
$Response$PendingCount
[1] 0
$Response$SuccessCount
[1] 2076
$Response$ErrorCount
[1] 4
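
Rather than re-running that status check by hand, you could poll in a loop until the job reports "completed" (a small sketch; the 10-second interval is arbitrary, and a real script should also stop on whatever failure statuses the API can return):

repeat {
  JOB_status <- GET(paste0(url, "/", reqID, "?action=status&apiKey=", HERE_API_KEY))
  if (content(JOB_status)$Response$Status == "completed") break
  Sys.sleep(10)  # wait a bit before checking again
}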

Script:

# I stayed with a GET request via httr, but there's no reason you can't switch to
# some other method for the download, like cURL
COMPLETED_JOB <-
  GET(paste0(url, "/", reqID, "/result?apiKey=", HERE_API_KEY))

job_content <- content(x = COMPLETED_JOB, as = "raw")  # This extracts the raw binary data, which is the zipped content -- it has to be unzipped to be useful.
writeBin(job_content, con = "Processed_locations.zip")  # Writes the binary data to file.
unzip(zipfile = "Processed_locations.zip")  # Extracts the contents of the zip as a text file.

Final result file:

recId|SeqNumber|seqLength|recId|seqNumber|seqLength|displayLatitude|displayLongitude|city|postalCode|country
1|1|1|1|1|1|43.70924|-79.658|Mississauga|L4T 1G3|CAN
2|1|1|2|1|1|39.83972|-88.92881|Decatur|62521|USA
3|1|1|3|1|1|45.47659|-73.78061|Dorval|H9P 1K2|CAN
4|1|1|4|1|1|43.75666|-79.71021|Brampton|L6S 4K6|CAN
5|1|1|5|1|1|40.4013|-91.3848|Keokuk|52632|USA
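
To get that pipe-delimited output back into R for the routing step, something along these lines works (a sketch; the name of the text file inside the zip varies, so it is looked up with unzip(list = TRUE) rather than hard-coded):

outfile <- unzip("Processed_locations.zip", list = TRUE)$Name[1]  # name of the file inside the zip
geocoded <- read.table(outfile, header = TRUE, sep = "|",
                       quote = "", stringsAsFactors = FALSE)
head(geocoded)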
