How to geocode faster with Google Maps



This is my code for extracting latitude and longitude from the location addresses in a CSV file.

import time

import pandas as pd
import requests

GOOGLE_MAPS_API_URL = 'https://maps.googleapis.com/maps/api/geocode/json'
API_key = 'the-key'


def gmaps_geoencoder(address):
    # Let requests URL-encode the address and the key
    req = requests.get(GOOGLE_MAPS_API_URL,
                       params={'address': address, 'key': API_key})
    res = req.json()
    # Take the first (best) match returned for the address
    result = res['results'][0]
    lat = result['geometry']['location']['lat']
    lon = result['geometry']['location']['lng']
    return lat, lon


input_csv_file = r'pathtolocation_list_100.csv'
output_csv_file = r'pathtolocation_list_100_new.csv'
df = pd.read_csv(input_csv_file)
# number of rows written to the csv at a time
chunksize = 10
t = time.time()
for i in range(len(df)):
    place = df['ADDRESS'][i]
    lat, lon = gmaps_geoencoder(place)
    # .loc avoids chained assignment and creates the column if it is missing
    df.loc[i, 'Lat'] = lat
    df.loc[i, 'Lon'] = lon
# write the file once, after all rows have been geocoded
df.to_csv(output_csv_file,
          index=False,
          chunksize=chunksize)
print('Time taken: ' + str(time.time() - t) + 's')

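Processing 100 records took 47.75818920135498 s, i.e. about 0.5 s per record. How can I make this faster? I have about 1 million records to convert, and at this rate the job would take almost 6 days! What is taking the time here: iterating over the DataFrame, or fetching the data from the Google Maps API? If it is the former, I assume there is a way to speed it up. But if it is the latter, is there any workaround?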

Instead of

for i in range(len(df)):
    place = df['ADDRESS'][i]
    lat, lon = gmaps_geoencoder(place)
    df.loc[i, 'Lat'] = lat
    df.loc[i, 'Lon'] = lon
df.to_csv(output_csv_file,
          index=False,
          chunksize=chunksize)

use this

# gmaps_geoencoder returns (lat, lon), so the result fills two columns
df[['Lat', 'Lon']] = pd.DataFrame(df['ADDRESS'].apply(gmaps_geoencoder).tolist(),
                                  index=df.index)
df.to_csv(output_csv_file,
          index=False,
          chunksize=chunksize)

See this link for more information.
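
On the question of where the time goes: roughly 0.5 s per record looks more like waiting on an HTTP round trip than like DataFrame iteration, so issuing several requests at once may help more than changing how the rows are looped over. Below is a minimal sketch of that idea using a thread pool from the standard library. The helper name geocode_concurrently and the max_workers value are assumptions for illustration, not part of the original code, and the geocoding function is passed in as an argument so that the question's gmaps_geoencoder (or any replacement) can be used. The Geocoding API enforces usage limits, so check your quota before raising the worker count.

```python
from concurrent.futures import ThreadPoolExecutor


def geocode_concurrently(addresses, geocode, max_workers=8):
    """Return geocode(address) for every address, in input order.

    `geocode` is any callable taking one address and returning (lat, lon).
    `max_workers` is an illustrative default, not a recommended value.
    """
    # Geocode each distinct address only once (keeps first-seen order).
    unique_addresses = list(dict.fromkeys(addresses))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs the calls in worker threads and preserves input order.
        coords = list(pool.map(geocode, unique_addresses))
    lookup = dict(zip(unique_addresses, coords))
    # Expand back to one result per input row, duplicates included.
    return [lookup[address] for address in addresses]
```

With the question's function this could be called as coords = geocode_concurrently(df['ADDRESS'].tolist(), gmaps_geoencoder), and the resulting list of (lat, lon) pairs assigned with df[['Lat', 'Lon']] = pd.DataFrame(coords, index=df.index). The sketch assumes there are no missing addresses and does no retrying or error handling, which a run over a million records would likely need.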
