Calculating the geographic distance between cities for every possible combination of cities



So I have a CSV file with 3 columns (city, latitude, longitude). I created a DataFrame from this CSV file in Python with this code:

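import pandas as pd

data = pd.read_csv("lat_long.csv", nrows=10)
# Build a DataFrame indexed by suburb name that holds only the coordinates
# (renamed from "dict" so the built-in dict type is not shadowed)
coords = {'Latitude': data.lat.tolist(), 'Longitude': data.lon.tolist()}
df = pd.DataFrame(coords, index=data.suburb.tolist())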

and the output is this:

                                 Latitude   Longitude
AUSTRALIAN NATIONAL UNIVERSITY -35.277272  149.117136
BARTON                         -35.201372  149.095065
DARWIN                         -12.801028  130.955789
DARWIN                         -12.801028  130.955789
PARAP                          -12.432181  130.843310
ALAWA                          -12.378451  130.877014
BRINKIN                        -12.367769  130.869808
CASUARINA                      -12.376597  130.850489
JINGILI                        -12.385761  130.873726
LEE POINT                      -12.360865  130.891349
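If the CSV header really is `suburb,lat,lon`, as the attribute names in the code above suggest, `read_csv` can set the index and the column names in one step. A sketch under that assumption; the inline `csv_text` sample is hypothetical and stands in for `lat_long.csv`:

```python
import io

import pandas as pd

# Hypothetical sample standing in for lat_long.csv; the column names
# suburb, lat and lon are taken from the question's code.
csv_text = """suburb,lat,lon
AUSTRALIAN NATIONAL UNIVERSITY,-35.277272,149.117136
BARTON,-35.201372,149.095065
DARWIN,-12.801028,130.955789
"""

# index_col turns the suburb column into the index; rename gives the
# Latitude/Longitude headers used in the question's DataFrame.
df = (
    pd.read_csv(io.StringIO(csv_text), index_col="suburb", nrows=10)
      .rename(columns={"lat": "Latitude", "lon": "Longitude"})
)
df.index.name = None  # match the unnamed index in the printed output
print(df)
```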

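Now what I want is the distance for every possible combination, i.e. from each city to each of the other 9 cities. It should look like this: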

                                              DISTANCE
AUSTRALIAN NATIONAL UNIVERSITY - BARTON
AUSTRALIAN NATIONAL UNIVERSITY - DARWIN
AUSTRALIAN NATIONAL UNIVERSITY - DARWIN
AUSTRALIAN NATIONAL UNIVERSITY - PARAP

I tried doing this with nested for loops, but I want something faster.
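Since the goal is speed, one common alternative to nested loops is to compute all pairwise distances at once with NumPy broadcasting. A minimal sketch, assuming a mean Earth radius of 6371.0088 km and using only the first three coordinates from the table above; `haversine_matrix` is a hypothetical helper name:

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_matrix(lat_deg, lon_deg):
    """Return an (n, n) array of great-circle distances in km."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # A column vector minus a row vector broadcasts to every (i, j) pair
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    # clip guards against rounding pushing a slightly outside [0, 1]
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


lat = [-35.277272, -35.201372, -12.801028]
lon = [149.117136, 149.095065, 130.955789]
dist = haversine_matrix(lat, lon)

# The strict upper triangle (k=1) lists each unordered pair once, no self-pairs
i, j = np.triu_indices(len(lat), k=1)
for a_idx, b_idx in zip(i, j):
    print(a_idx, b_idx, round(float(dist[a_idx, b_idx]), 3))
```

The full n × n matrix costs O(n²) memory, which is fine for a few thousand points but not for very large inputs.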

I start from this DataFrame:
    city         Latitude   Longitude
0   AUSTRAL.    -35.277272  149.117136
1   BARTON      -35.201372  149.095065
2   DARWIN      -12.801028  130.955789
3   DARWIN      -12.801028  130.955789
4   PARAP       -12.432181  130.843310
5   ALAWA       -12.378451  130.877014
6   BRINKIN     -12.367769  130.869808
7   CASUARINA   -12.376597  130.850489
8   JINGILI     -12.385761  130.873726
9   LEE_POINT   -12.360865  130.891349

and add a new column, join, which only serves as a helper for building the Cartesian product we get by merging the DataFrame with itself:

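import pandas as pd
from haversine import haversine

df['join'] = 1  # constant key, so merging df with itself gives the Cartesian product
df_joined = pd.merge(df, df, on='join')
df_joined['haversine_dist'] = df_joined.apply(
    lambda x: haversine((x.Latitude_x, x.Longitude_x), (x.Latitude_y, x.Longitude_y)), axis=1)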

Result (only the first few rows shown):

    city_x      Latitude_x  Longitude_x  join  city_y      Latitude_y  Longitude_y  haversine_dist
0   AUSTRAL.    -35.277272  149.117136   1     AUSTRAL.    -35.277272  149.117136         0.000000
1   AUSTRAL.    -35.277272  149.117136   1     BARTON      -35.201372  149.095065         8.674473
2   AUSTRAL.    -35.277272  149.117136   1     DARWIN      -12.801028  130.955789      3093.972598
3   AUSTRAL.    -35.277272  149.117136   1     DARWIN      -12.801028  130.955789      3093.972598
4   AUSTRAL.    -35.277272  149.117136   1     PARAP       -12.432181  130.843310      3135.034018
5   AUSTRAL.    -35.277272  149.117136   1     ALAWA       -12.378451  130.877014      3138.077950
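On pandas 1.2 or newer, `merge(..., how="cross")` produces the same Cartesian product without the helper `join` column, and the distance can be computed on whole columns instead of with a row-wise `apply`. A sketch under those assumptions; `haversine_km` is a hypothetical NumPy re-implementation of the formula (not the `haversine` package's function), and the 6371.0088 km radius is assumed:

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Hypothetical vectorised haversine; array arguments in degrees, result in km."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


df = pd.DataFrame({
    "city": ["AUSTRAL.", "BARTON", "PARAP"],
    "Latitude": [-35.277272, -35.201372, -12.432181],
    "Longitude": [149.117136, 149.095065, 130.843310],
})

# how="cross" (pandas >= 1.2) pairs every row with every row, no helper column
pairs = df.merge(df, how="cross", suffixes=("_x", "_y"))
# Keep each unordered pair once and drop self-pairs; assumes unique city names
pairs = pairs[pairs["city_x"] < pairs["city_y"]].reset_index(drop=True)
# One vectorised call over whole columns instead of a row-wise apply
pairs["haversine_dist"] = haversine_km(
    pairs["Latitude_x"].to_numpy(), pairs["Longitude_x"].to_numpy(),
    pairs["Latitude_y"].to_numpy(), pairs["Longitude_y"].to_numpy(),
)
print(pairs[["city_x", "city_y", "haversine_dist"]])
```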

For testing, I built the original DataFrame by hand:

import pandas as pd 
import itertools
from haversine import haversine
x = {'city':['AUSTRALIAN NATIONAL UNIVERSITY', 'BARTON', 'DARWIN', 'DARWIN', 'PARAP', 'ALAWA', 'BRINKIN', 'CASUARINA', 'JINGILI', 'LEE_POINT' ]}
la = {'Latitude':[-35.277272,-35.201372, -12.801028 , -12.801028, -12.432181, -12.378451, -12.367769, -12.376597, -12.385761, -12.360865]}
lo = {'Longitude':[149.117136,149.095065, 130.955789 , 130.955789, 130.843310,  130.877014, 130.869808, 130.850489, 130.873726, 130.891349]}
data = {**x, **la, **lo}
df = pd.DataFrame(data)

Drop the duplicates:

df = df.drop_duplicates()

List all the cities:

city = list(df["city"])

Combine them two at a time:

TwoCity = list(itertools.combinations(city, 2))
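`itertools.combinations(city, 2)` yields every unordered pair exactly once, in input order, and never pairs a city with itself, so n cities give n(n-1)/2 pairs. A small standalone check with four of the cities:

```python
import itertools

cities = ["AUSTRALIAN NATIONAL UNIVERSITY", "BARTON", "DARWIN", "PARAP"]

# combinations() yields each unordered pair once, in input order, with no self-pairs
pairs = list(itertools.combinations(cities, 2))
n = len(cities)
assert len(pairs) == n * (n - 1) // 2  # 4 cities -> 6 pairs
print(pairs[:3])
```

With the 9 unique cities left after dropping the duplicate DARWIN row, this gives 36 pairs, matching the number of rows in the final table.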

Build the new DataFrame:

# Look up each city's (lat, lon) once, so haversine receives plain floats
# rather than one-element Series from boolean filtering
coords = dict(zip(df['city'], zip(df['Latitude'], df['Longitude'])))

df1 = pd.DataFrame({'TwoCity': TwoCity})
df1['Distance(km)'] = df1['TwoCity'].apply(
    lambda pair: haversine(coords[pair[0]], coords[pair[1]]))
print(df1.to_string(index=False))

The final result in df1 is (slightly adjusted by hand):

   TwoCity                                     Distance(km)
   (AUSTRALIAN NATIONAL UNIVERSITY, BARTON)      8.674473
   (AUSTRALIAN NATIONAL UNIVERSITY, DARWIN)   3093.972598
    (AUSTRALIAN NATIONAL UNIVERSITY, PARAP)   3135.034018
    (AUSTRALIAN NATIONAL UNIVERSITY, ALAWA)   3138.077950
  (AUSTRALIAN NATIONAL UNIVERSITY, BRINKIN)   3139.500311
(AUSTRALIAN NATIONAL UNIVERSITY, CASUARINA)   3139.808790
  (AUSTRALIAN NATIONAL UNIVERSITY, JINGILI)   3137.587038
(AUSTRALIAN NATIONAL UNIVERSITY, LEE_POINT)   3138.882795
                           (BARTON, DARWIN)   3086.264122
                            (BARTON, PARAP)   3127.309536
                            (BARTON, ALAWA)   3130.345201
                          (BARTON, BRINKIN)   3131.767583
                        (BARTON, CASUARINA)   3132.079061
                          (BARTON, JINGILI)   3129.855257
                        (BARTON, LEE_POINT)   3131.146957
                            (DARWIN, PARAP)     42.791471
                            (DARWIN, ALAWA)     47.759804
                          (DARWIN, BRINKIN)     49.071577
                        (DARWIN, CASUARINA)     48.558395
                          (DARWIN, JINGILI)     47.026561
                        (DARWIN, LEE_POINT)     49.441057
                             (PARAP, ALAWA)      7.006568
                           (PARAP, BRINKIN)      7.718791
                         (PARAP, CASUARINA)      6.229645
                           (PARAP, JINGILI)      6.128079
                         (PARAP, LEE_POINT)      9.492285
                           (ALAWA, BRINKIN)      1.422460
                         (ALAWA, CASUARINA)      2.888261
                           (ALAWA, JINGILI)      0.887821
                         (ALAWA, LEE_POINT)      2.499614
                       (BRINKIN, CASUARINA)      2.316553
                         (BRINKIN, JINGILI)      2.045378
                       (BRINKIN, LEE_POINT)      2.462424
                       (CASUARINA, JINGILI)      2.721699
                     (CASUARINA, LEE_POINT)      4.770298
                       (JINGILI, LEE_POINT)      3.365596
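The question asked for rows labelled like `AUSTRALIAN NATIONAL UNIVERSITY - BARTON` with a `DISTANCE` column. A minimal sketch of that layout, assuming the coordinates are already in a plain dict keyed by city name; `haversine_km` is a hypothetical pure-Python version of the formula with an assumed 6371.0088 km radius, standing in for the `haversine` package:

```python
import itertools
import math

import pandas as pd

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_km(p1, p2):
    """Hypothetical helper: great-circle distance in km between (lat, lon) tuples in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p1, *p2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(a, 1.0)))


coords = {
    "AUSTRALIAN NATIONAL UNIVERSITY": (-35.277272, 149.117136),
    "BARTON": (-35.201372, 149.095065),
    "DARWIN": (-12.801028, 130.955789),
}

# Iterating a dict yields its keys, so combinations() pairs the city names;
# each pair is labelled "A - B" and the labels become the index
distances = {
    f"{a} - {b}": haversine_km(coords[a], coords[b])
    for a, b in itertools.combinations(coords, 2)
}
result = pd.Series(distances, name="DISTANCE").to_frame()
print(result)
```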
