I have a CSV file with 3 columns (city, latitude, longitude). I created a DataFrame in Python from this CSV file with the following code:
import pandas as pd

data = pd.read_csv("lat_long.csv", nrows=10)
Lat = data.lat.tolist()
Lon = data.lon.tolist()
suburb = data.suburb.tolist()
coords = {'Latitude': Lat, 'Longitude': Lon}  # renamed from `dict` to avoid shadowing the built-in
df = pd.DataFrame(coords, index=suburb)
The output is:
                                  Latitude   Longitude
AUSTRALIAN NATIONAL UNIVERSITY  -35.277272  149.117136
BARTON                          -35.201372  149.095065
DARWIN                          -12.801028  130.955789
DARWIN                          -12.801028  130.955789
PARAP                           -12.432181  130.843310
ALAWA                           -12.378451  130.877014
BRINKIN                         -12.367769  130.869808
CASUARINA                       -12.376597  130.850489
JINGILI                         -12.385761  130.873726
LEE POINT                       -12.360865  130.891349
What I want now is the distance for every possible combination of one city with each of the other 9 cities. It should look like this:
DISTANCE
AUSTRALIAN NATIONAL UNIVERSITY - BARTON
AUSTRALIAN NATIONAL UNIVERSITY - DARWIN
AUSTRALIAN NATIONAL UNIVERSITY - DARWIN
AUSTRALIAN NATIONAL UNIVERSITY - PARAP
I tried doing this with nested for loops, but I want something faster.
I start from this DataFrame:
        city   Latitude   Longitude
0   AUSTRAL. -35.277272  149.117136
1     BARTON -35.201372  149.095065
2     DARWIN -12.801028  130.955789
3     DARWIN -12.801028  130.955789
4      PARAP -12.432181  130.843310
5      ALAWA -12.378451  130.877014
6    BRINKIN -12.367769  130.869808
7  CASUARINA -12.376597  130.850489
8    JINGILI -12.385761  130.873726
9  LEE_POINT -12.360865  130.891349
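Then I create a new column. It is only a helper: merging the DataFrame with itself on this constant column produces the Cartesian product of all rows.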
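from haversine import haversine

# Constant key: merging on it pairs every row with every row
df['join'] = 1
df_joined = pd.merge(df, df, on='join')
df_joined['haversine_dist'] = df_joined.apply(lambda x: haversine((x.Latitude_x, x.Longitude_x), (x.Latitude_y, x.Longitude_y)), axis=1)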
Result (first few rows only):
     city_x  Latitude_x  Longitude_x  join    city_y  Latitude_y  Longitude_y  haversine_dist
0  AUSTRAL.  -35.277272   149.117136     1  AUSTRAL.  -35.277272   149.117136        0.000000
1  AUSTRAL.  -35.277272   149.117136     1    BARTON  -35.201372   149.095065        8.674473
2  AUSTRAL.  -35.277272   149.117136     1    DARWIN  -12.801028   130.955789     3093.972598
3  AUSTRAL.  -35.277272   149.117136     1    DARWIN  -12.801028   130.955789     3093.972598
4  AUSTRAL.  -35.277272   149.117136     1     PARAP  -12.432181   130.843310     3135.034018
5  AUSTRAL.  -35.277272   149.117136     1     ALAWA  -12.378451   130.877014     3138.077950
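Since the question asks for something faster, here is a minimal sketch of the same cross-join idea without the row-wise `apply`. It assumes pandas 1.2 or later, where `merge` accepts `how="cross"` so the helper column is not needed. `haversine_km` and `EARTH_RADIUS_KM` are hypothetical names written for this illustration (they are not part of the `haversine` package), and the radius value is an assumption chosen to be close to the mean Earth radius.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius in km


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; works on scalars or whole columns."""
    lat1, lon1, lat2, lon2 = (
        np.radians(np.asarray(v, dtype=float)) for v in (lat1, lon1, lat2, lon2)
    )
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Three rows from the data above, just to keep the example short
cities = pd.DataFrame({
    "city": ["AUSTRAL.", "BARTON", "DARWIN"],
    "Latitude": [-35.277272, -35.201372, -12.801028],
    "Longitude": [149.117136, 149.095065, 130.955789],
})

# how="cross" pairs every row with every row (Cartesian product)
pairs = cities.merge(cities, how="cross", suffixes=("_x", "_y"))

# One vectorized call over whole columns instead of one Python call per row
pairs["haversine_dist"] = haversine_km(
    pairs["Latitude_x"], pairs["Longitude_x"],
    pairs["Latitude_y"], pairs["Longitude_y"],
)
print(pairs[["city_x", "city_y", "haversine_dist"]])
```

The cross join still contains every pair twice plus each city paired with itself, so it is probably worth filtering those rows out if only unique pairs are needed.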
For testing, I constructed the original DataFrame by hand:
import pandas as pd
import itertools
from haversine import haversine
x = {'city':['AUSTRALIAN NATIONAL UNIVERSITY', 'BARTON', 'DARWIN', 'DARWIN', 'PARAP', 'ALAWA', 'BRINKIN', 'CASUARINA', 'JINGILI', 'LEE_POINT' ]}
la = {'Latitude':[-35.277272,-35.201372, -12.801028 , -12.801028, -12.432181, -12.378451, -12.367769, -12.376597, -12.385761, -12.360865]}
lo = {'Longitude':[149.117136,149.095065, 130.955789 , 130.955789, 130.843310, 130.877014, 130.869808, 130.850489, 130.873726, 130.891349]}
data = {**x, **la, **lo}
df = pd.DataFrame(data)
Drop the duplicate rows.
df = df.drop_duplicates()
List all the cities.
city = list(df["city"])
Form every combination of two cities.
TwoCity = list(itertools.combinations(city, 2))
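As a small illustration of what `itertools.combinations` produces, the sketch below uses a hypothetical three-city list (`sample_cities` is a name made up for this example). Each unordered pair appears exactly once, so n cities give n(n-1)/2 pairs; for the 9 cities left after dropping duplicates that is 36.

```python
import itertools

# Hypothetical short list, only to show the shape of the output
sample_cities = ["BARTON", "DARWIN", "PARAP"]

# Every unordered pair, in the order the cities appear in the list
sample_pairs = list(itertools.combinations(sample_cities, 2))
print(sample_pairs)
# [('BARTON', 'DARWIN'), ('BARTON', 'PARAP'), ('DARWIN', 'PARAP')]
```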
Build the new DataFrame.
df1 = pd.DataFrame({'TwoCity':TwoCity})
# Look up coordinates by city name (assumes names are unique after drop_duplicates)
coords = df.set_index('city')[['Latitude', 'Longitude']]
df1['Distance(km)'] = df1['TwoCity'].apply(
    lambda pair: haversine(tuple(coords.loc[pair[0]]), tuple(coords.loc[pair[1]])))
print(df1.to_string(index=False))
The final result in df1 is (tidied up a little by hand):
TwoCity Distance(km)
(AUSTRALIAN NATIONAL UNIVERSITY, BARTON) 8.674473
(AUSTRALIAN NATIONAL UNIVERSITY, DARWIN) 3093.972598
(AUSTRALIAN NATIONAL UNIVERSITY, PARAP) 3135.034018
(AUSTRALIAN NATIONAL UNIVERSITY, ALAWA) 3138.077950
(AUSTRALIAN NATIONAL UNIVERSITY, BRINKIN) 3139.500311
(AUSTRALIAN NATIONAL UNIVERSITY, CASUARINA) 3139.808790
(AUSTRALIAN NATIONAL UNIVERSITY, JINGILI) 3137.587038
(AUSTRALIAN NATIONAL UNIVERSITY, LEE_POINT) 3138.882795
(BARTON, DARWIN) 3086.264122
(BARTON, PARAP) 3127.309536
(BARTON, ALAWA) 3130.345201
(BARTON, BRINKIN) 3131.767583
(BARTON, CASUARINA) 3132.079061
(BARTON, JINGILI) 3129.855257
(BARTON, LEE_POINT) 3131.146957
(DARWIN, PARAP) 42.791471
(DARWIN, ALAWA) 47.759804
(DARWIN, BRINKIN) 49.071577
(DARWIN, CASUARINA) 48.558395
(DARWIN, JINGILI) 47.026561
(DARWIN, LEE_POINT) 49.441057
(PARAP, ALAWA) 7.006568
(PARAP, BRINKIN) 7.718791
(PARAP, CASUARINA) 6.229645
(PARAP, JINGILI) 6.128079
(PARAP, LEE_POINT) 9.492285
(ALAWA, BRINKIN) 1.422460
(ALAWA, CASUARINA) 2.888261
(ALAWA, JINGILI) 0.887821
(ALAWA, LEE_POINT) 2.499614
(BRINKIN, CASUARINA) 2.316553
(BRINKIN, JINGILI) 2.045378
(BRINKIN, LEE_POINT) 2.462424
(CASUARINA, JINGILI) 2.721699
(CASUARINA, LEE_POINT) 4.770298
(JINGILI, LEE_POINT) 3.365596
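If the per-pair `apply` becomes slow on larger inputs, one possible alternative is to compute the whole distance matrix with NumPy broadcasting and keep only the upper triangle, which yields the same unique pairs as `itertools.combinations`. This is a sketch under assumptions: the haversine formula is written out by hand rather than taken from the `haversine` package, `EARTH_RADIUS_KM` is an assumed constant, and `towns` and `unique_pairs` are hypothetical names using four of the rows above.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius in km

# Four rows from the data above, to keep the example short
towns = pd.DataFrame({
    "city": ["PARAP", "ALAWA", "BRINKIN", "JINGILI"],
    "Latitude": [-12.432181, -12.378451, -12.367769, -12.385761],
    "Longitude": [130.843310, 130.877014, 130.869808, 130.873726],
})

lat = np.radians(towns["Latitude"].to_numpy())
lon = np.radians(towns["Longitude"].to_numpy())

# Broadcasting a column against a row gives all pairwise differences at once
dlat = lat[:, None] - lat[None, :]
dlon = lon[:, None] - lon[None, :]
a = (np.sin(dlat / 2) ** 2
     + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
dist_matrix = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# k=1 skips the diagonal, so each unordered pair appears exactly once
i, j = np.triu_indices(len(towns), k=1)
names = towns["city"].to_numpy()
unique_pairs = pd.DataFrame({
    "city_a": names[i],
    "city_b": names[j],
    "distance_km": dist_matrix[i, j],
})
print(unique_pairs.to_string(index=False))
```

The matrix needs memory proportional to the square of the number of cities, so this trade-off suits a few thousand rows better than millions.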