使用 geopy 返回经度/纬度的邮政编码 - 避免地理编码器超时:("服务超时"、"在索引处发生...")



这个问题并不新鲜,已经讨论过很多次了,但我是Python的新手。

Geopy 太慢 - 一直超时

Python geopy 地理编码器中的超时错误

我有一个包含 11000 个地理位置的数据集,并希望拥有它们的邮政编码。

我的数据如下所示:

Longitude   Latitude
0 -87.548627  41.728184
1 -87.737227  41.749111
2 -87.743974  41.924143
3 -87.659294  41.869314
4 -87.727808  41.877007

使用这个问题,我编写了一个函数,它适用于前 10-20 行,但给出了超时错误。

# Create a function for zip codes extraction
def get_zipcode(df, geolocator, lat_field, lon_field):
location = geolocator.reverse((df[lat_field], df[lon_field]))
return location.raw['address']['postcode']
geolocator = geopy.Nominatim(user_agent = 'my-application')
# Test a sample with 20 rows
test = bus_stops_geo.head(20)
# Extract zip codes for the sample
zipcodes = test.apply(get_zipcode, axis = 1, geolocator = geolocator, 
lat_field = 'Latitude', lon_field = 'Longitude')
print(zipcodes)
0     60617
1     60652
2     60639
3     60607
4     60644
5     60659
6     60620
7     60626
8     60610
9     60660
10    60625
11    60645
12    60628
13    60620
14    60629
15    60628
16    60644
17    60638
18    60657
19    60631
dtype: object

我尝试更改超时时间,但到目前为止失败了。

我的问题:

  • 如何为 11000 行实现此目的?
  • 如何重写此函数并不仅返回zips,还返回初始long和lat?
  • 在编程语言(如R(或使用专有软件(付费选项对我有用(中,是否有任何简单的替代解决方案?

非常感谢任何帮助!

geopy 与熊猫的用法在文档中描述:https://geopy.readthedocs.io/en/1.22.0/#usage-with-pandas

使用geopy的解决方案:

In [1]: import pandas as pd
...:
...: df = pd.DataFrame([
...:     [-87.548627, 41.728184],
...:     [-87.737227, 41.749111],
...:     [-87.743974, 41.924143],
...:     [-87.659294, 41.869314],
...:     [-87.727808, 41.877007],
...: ], columns=["Longitude", "Latitude"])
In [2]: from tqdm import tqdm
...: tqdm.pandas()
...:
...: from geopy.geocoders import Nominatim
...: geolocator = Nominatim(user_agent="specify_your_app_name_here")
...:
...: from geopy.extra.rate_limiter import RateLimiter
...: reverse = RateLimiter(geolocator.reverse, min_delay_seconds=1)
In [3]: df['Location'] = df.progress_apply(
...:     lambda row: reverse((row['Latitude'], row['Longitude'])),
...:     axis=1
...: )
100%|█████████████████████████████| 5/5 [00:06<00:00,  1.24s/it]
In [4]: def parse_zipcode(location):
...:     if location and location.raw.get('address') and location.raw['address'].get('postcode'):
...:         return location.raw['address']['postcode']
...:     else:
...:         return None
...: df['Zipcode'] = df['Location'].apply(parse_zipcode)
In [5]: df
Out[5]:
Longitude   Latitude                                           Location Zipcode
0 -87.548627  41.728184  (Olive Harvey College South Chicago Learning C...   60617
1 -87.737227  41.749111  (7900, South Kilpatrick Avenue, Chicago, Cook ...   60652
2 -87.743974  41.924143  (4701, West Fullerton Avenue, Beat 2522, Belmo...   60639
3 -87.659294  41.869314  (1301-1307, West Taylor Street, Near West Side...   60607
4 -87.727808  41.877007  (4053, West Jackson Boulevard, West Garfield P...   60644

如果付费选项适合您,请考虑使用免费 Nominatim 以外的任何其他地理编码服务,例如 MapQuest 或 PickPoint。

这不是一个 Python 解决方案和 shapefile 的不同方法,但为了 SO 社区的好处,我将发布一个带有 R 的答案(来自 @Jaap 和这个答案的提示(。

首先,您需要美国邮政编码形状文件(用于美国位置(。

# Install packages
library(sp)
library(rgdal)
library(tidyverse)
# Import zips shapefile and transform CRS 
zips <- readOGR("cb_2015_us_zcta510_500k.shp")
zips <- spTransform(zips, CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))
# Read bus stops data
bus_stops <- read_csv("CTA_BusStops.csv")
# A tibble: 6 x 14
SYSTEMSTOP OBJECTID the_geom                            STREET     CROSS_ST          
DIR   POS   ROUTESSTPG OWLROUTES CITY   STATUS PUBLIC_NAM          POINT_X POINT_Y
<dbl>    <dbl> <chr>                               <chr>      <chr>             
<chr> <chr> <chr>      <chr>     <chr>   <dbl> <chr>                 <dbl>   <dbl>
1      11953      193 POINT (-87.54862703700002 41.72818~ 92ND STRE~ BALTIMORE         
EB    NS    95         NA        CHICA~      1 92nd Street & Balt~   -87.5    41.7
2       2723      194 POINT (-87.737227163 41.7491110710~ 79TH STRE~ KILPATRICK (east~ 
EB    NS    79         NA        CHICA~      1 79th Street & Kilp~   -87.7    41.7
3       1307      195 POINT (-87.74397362600001 41.92414~ FULLERTON  KILPATRICK        
EB    NS    74         NA        CHICA~      1 Fullerton & Kilpat~   -87.7    41.9
4       6696      196 POINT (-87.65929365400001 41.86931~ TAYLOR     THROOP            
EB    NS    157        NA        CHICA~      1 Taylor & Throop       -87.7    41.9
5         22      197 POINT (-87.72780787099998 41.87700~ JACKSON    KARLOV            
EB    FS    126        NA        CHICA~      1 Jackson & Karlov      -87.7    41.9
6       4767      198 POINT (-87.71978000000001 41.97552~ FOSTER     MONTICELLO        
EB    NS    92         NA        CHICA~      1 Foster & Monticello   -87.7    42.0
# Rename two columns for long and lat
bus_stops <- rename(bus_stops, longitude = POINT_X, latitude = POINT_Y)
# Extract long and lat columns only
test_bus_stops <- bus_stops[ , c(13, 14)]
# Transform coordinates into a SpatialPointsDataFrame
bus_stops_spatial <- SpatialPointsDataFrame(coords = test_bus_stops, data = bus_stops, 
proj4string = CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))
class(bus_stops_spatial) # [1] "SpatialPointsDataFrame"
# Subset only the zipcodes in which points are found
zips_subset <- zips[bus_stops_spatial, ]
# NOTE: the column in zips_subset containing zipcodes is ZCTA5CE10
# Use over() to overlay points in polygons and then add that to the original dataframe
# Create a separate column
bus_stops_zip <- over(bus_stops_spatial, zips_subset[, "ZCTA5CE10"]) 
bus_stops_zip 

我通过此链接随机仔细检查了结果。它匹配。

# Merge bus stops with bus stop zips
bus_stops <- cbind(bus_stops, bus_stops_zip)
# Rename zip_code column
bus_stops <- rename(bus_stops, zip_code = ZCTA5CE10)
head(bus_stops)
# Display zip
head(bus_stops[, 13:15])
longitude latitude zip_code
1 -87.54863 41.72818    60617
2 -87.73723 41.74911    60652
3 -87.74397 41.92414    60639
4 -87.65929 41.86931    60607
5 -87.72781 41.87701    60624
6 -87.71978 41.97553    60625
# Extract as csv
write.csv(bus_stops,'bus_stops_zips.csv')

最新更新