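I have one set of location coordinates for individual people and another set of coordinates for ballot drop boxes. I'm trying to find the distance between each person's home and the nearest drop box. Below is a copy of the code I currently have to work with, which I copied from another Stack Overflow example. However, it isn't very efficient: the dataset I'm using has millions of rows, and the code relies on generating every possible combination of coordinates and then pulling out the smallest distance. Is there a more efficient way to approach this?
What I have now: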
# Made-Up Data
library(geosphere)
library(tidyverse)
geo_voters <- data.frame(voter_id = c(12345, 45678, 89011),
long=c(-43.17536, -43.17411, -43.36605),
lat=c(-22.95414, -22.9302, -23.00133))
geo_dropoff_boxes <- data.frame(long=c(-43.19155, -43.33636, -67.45666),
lat=c(-22.90353, -22.87253, -26.78901))
# Code to find the distance between voters, and the dropoff boxes
# First, rearrange the inputs into new data frames as needed.
# First, the voters:
voter_addresses <- data.frame(voter_id = as.character(geo_voters$voter_id),
lon_address = geo_voters$long,
lat_address = geo_voters$lat
)
# Second, the polling locations:
polling_address <- data.frame(place_number = 1:nrow(geo_dropoff_boxes),
lon_place = geo_dropoff_boxes$long,
lat_place = geo_dropoff_boxes$lat
)
# Create nested dfs:
# (tidyr >= 1.0 syntax; the older `.key =` argument is deprecated)
voter_nest <- nest(voter_addresses, voter_coords = -voter_id)
polling_nest <- nest(polling_address, polling_coords = -place_number)
# Combine for combinations:
data_master <- crossing(voter_nest, polling_nest)
# Calculate shortest distance:
shortest_dist <- data_master %>%
mutate(dist = map2_dbl(voter_coords, polling_coords, distm)) %>%
group_by(voter_id) %>%
filter(dist == min(dist)) %>%
mutate(dist_km = dist/1000,
voter_id = as.character(voter_id)) %>%
select(voter_id, dist_km)
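For comparison, the same nearest-box result can be computed without materialising every voter/box pair: loop over voters, compute one voter's distances to all boxes in a single vectorised call, and keep only the minimum. This is a minimal base-R sketch under stated assumptions: `haversine_km()` is a hypothetical helper defined here (it is not part of `geosphere`), and it assumes a spherical Earth with a mean radius of 6371 km, so results will differ slightly from `distm()`'s default ellipsoidal (`distGeo`) distances. It still performs n × m distance calculations, but memory stays proportional to the number of boxes rather than to voters × boxes.

```r
# Hypothetical helper: haversine great-circle distance (km) from one point
# to a vector of points. Assumes a spherical Earth with radius 6371 km.
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

geo_voters <- data.frame(voter_id = c(12345, 45678, 89011),
                         long = c(-43.17536, -43.17411, -43.36605),
                         lat  = c(-22.95414, -22.9302, -23.00133))
geo_dropoff_boxes <- data.frame(long = c(-43.19155, -43.33636, -67.45666),
                                lat  = c(-22.90353, -22.87253, -26.78901))

# For each voter, compute distances to all boxes at once and keep only the
# index and distance of the closest box (no voter x box table is built).
nearest <- t(vapply(seq_len(nrow(geo_voters)), function(i) {
  d <- haversine_km(geo_voters$long[i], geo_voters$lat[i],
                    geo_dropoff_boxes$long, geo_dropoff_boxes$lat)
  c(box = which.min(d), dist_km = min(d))
}, numeric(2)))

shortest_dist <- data.frame(voter_id = as.character(geo_voters$voter_id),
                            nearest_box = as.integer(nearest[, "box"]),
                            dist_km = nearest[, "dist_km"])
print(shortest_dist)
```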
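The `sf` package makes this simple. The `st_as_sf()` function converts a data frame of lat/long values into georeferenced points, and the `st_distance()` function calculates the distances between them. When running `st_as_sf()`, you need to specify a coordinate reference system. It looks like you're using latitude and longitude, so I specified `crs="epsg:4326"`, which is the most commonly used latitude/longitude reference system.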
library( sf )
geo_voters <- data.frame(voter_id = c(12345, 45678, 89011),
long=c(-43.17536, -43.17411, -43.36605),
lat=c(-22.95414, -22.9302, -23.00133))
geo_dropoff_boxes <- data.frame(long=c(-43.19155, -43.33636, -67.45666),
lat=c(-22.90353, -22.87253, -26.78901))
# convert the data to sf features
geo_voters = st_as_sf( geo_voters, coords=c('long', 'lat'), crs="epsg:4326" )
geo_dropoff_boxes = st_as_sf( geo_dropoff_boxes, coords=c('long', 'lat'), crs="epsg:4326" )
# calculate the distances between voters and drop boxes
dist = st_distance( geo_voters, geo_dropoff_boxes )
print(dist)
Now each row represents a voter, and each column gives that voter's distance (in meters) to one of the drop boxes:
Units: [m]
[,1] [,2] [,3]
[1,] 5866.745 18821.87 2482400
[2,] 3461.945 17813.57 2483210
[3,] 20916.618 14641.09 2462186
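The full matrix has one entry per voter/box pair, so with millions of voters it can become very large. If only the nearest box is needed, `sf` also provides `st_nearest_feature()`, which returns, for each feature in the first set, the index of the nearest feature in the second set; `st_distance(..., by_element = TRUE)` then measures just those matched pairs. A sketch, assuming sf >= 1.0 (where geographic coordinates are handled by the s2 library by default); `crs = 4326` is an equivalent way of writing the `"epsg:4326"` used above.

```r
library(sf)

geo_voters <- data.frame(voter_id = c(12345, 45678, 89011),
                         long = c(-43.17536, -43.17411, -43.36605),
                         lat  = c(-22.95414, -22.9302, -23.00133))
geo_dropoff_boxes <- data.frame(long = c(-43.19155, -43.33636, -67.45666),
                                lat  = c(-22.90353, -22.87253, -26.78901))

voters_sf <- st_as_sf(geo_voters, coords = c("long", "lat"), crs = 4326)
boxes_sf  <- st_as_sf(geo_dropoff_boxes, coords = c("long", "lat"), crs = 4326)

# Row index in boxes_sf of the nearest drop box, one entry per voter
nearest_idx <- st_nearest_feature(voters_sf, boxes_sf)

# Distance from each voter to its matched box only (element-wise pairs,
# not a full voter x box matrix); result is in meters with units
nearest_dist <- st_distance(voters_sf, boxes_sf[nearest_idx, ], by_element = TRUE)

result <- data.frame(voter_id = voters_sf$voter_id,
                     nearest_box = nearest_idx,
                     dist_km = as.numeric(nearest_dist) / 1000)
print(result)
```

For the sample data this picks box 1 for the first two voters and box 2 for the third, which matches the row minima of the matrix above.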