I have a dataset like this:
    Customer_id  Lat     Lon
0.  A            40      12
1.  A            np.nan  np.nan
2.  A            np.nan  np.nan
3.  A            43      12
4.  A            45      13
5.  B            43      14
6.  B            np.nan  np.nan
7.  B            43      16
where the coordinates (40,12), (43,12), (45,13), (43,14) and (43,16) are the cell towers of some network.
I then apply an interpolation function, and the result is:
    Customer_id  Lat  Lon
0.  A            40   12
1.  A            41   12
2.  A            42   12
3.  A            43   12
4.  A            45   13
5.  B            43   14
6.  B            43   15
7.  B            43   16
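But these new coordinates are only estimates, not actual towers. I then want to assign each estimate to the nearest actual tower, so that, for example, record 1 is assigned to tower (40,12).
I used this code: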
import numpy as np

def haversine_closest_changed(towers, row):
    all_points = towers
    lat2 = all_points[:, 0]  # the actual latitudes of the towers
    lon2 = all_points[:, 1]  # the actual longitudes of the towers
    l = len(lat2)  # how many towers there are
    # the point I'm looking at; it is compared against every tower
    lat1 = row['Expected_Lat']
    lon1 = row['Expected_Lon']
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2
    c = 2 * np.arcsin(np.sqrt(a))
    km = 6367 * c
    # find the minimum distance and return the corresponding tower
    idx = np.argmin(km)
    closest_point = towers[idx]
    return closest_point
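where towers is a pandas dataset containing all the towers in the network (one column for latitude, another for longitude), and the Expected_Lat and Expected_Lon columns are the ones I described above, produced by the interpolation.
This code returns only one latitude value and one longitude value, repeated down the whole column. How can I change it so that only the points I interpolated (the ones that were NaN before) are replaced with the nearest tower?
First we flag the rows that are going to be interpolated, then we interpolate, and finally, using the closest-distance calculation from this SO answer, we find the nearest actual tower for every interpolated entry: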
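import io
from math import cos, asin, sqrt

import pandas as pd

s = """    Customer_id  Lat     Lon
0.  A            40      12
1.  A            np.nan  np.nan
2.  A            np.nan  np.nan
3.  A            43      12
4.  A            45      13
5.  B            43      14
6.  B            np.nan  np.nan
7.  B            43      16
"""

# Columns are separated by two or more whitespace characters
df = pd.read_csv(io.StringIO(s), na_values='np.nan', sep=r'\s\s+', engine='python')

# Flag the rows that are about to be filled in, before interpolating
df['Interpolated'] = df.Lat.isnull()
# Interpolate only the numeric coordinate columns
df[['Lat', 'Lon']] = df[['Lat', 'Lon']].interpolate()
# The actual towers are the coordinates that were present originally
towers = df.loc[~df.Interpolated, ['Lat', 'Lon']].drop_duplicates().values

def distance(lat1, lon1, lat2, lon2):
    # Haversine distance in km; p converts degrees to radians (pi / 180)
    p = 0.017453292519943295
    a = 0.5 - cos((lat2-lat1)*p)/2 + cos(lat1*p)*cos(lat2*p) * (1-cos((lon2-lon1)*p)) / 2
    return 12742 * asin(sqrt(a))  # 12742 km is twice the Earth's radius

def closest_tower(row):
    # Return a labelled Series so that apply() yields Lat/Lon columns
    lat, lon = min(towers, key=lambda p: distance(row.Lat, row.Lon, p[0], p[1]))
    return pd.Series({'Lat': lat, 'Lon': lon})

# Replace only the interpolated rows with their nearest actual tower
df.loc[df.Interpolated, ['Lat', 'Lon']] = df.loc[df.Interpolated, ['Lat', 'Lon']].apply(closest_tower, axis=1)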
Result:
     Customer_id  Lat   Lon   Interpolated
0.0  A            40.0  12.0  False
1.0  A            40.0  12.0  True
2.0  A            43.0  12.0  True
3.0  A            43.0  12.0  False
4.0  A            45.0  13.0  False
5.0  B            43.0  14.0  False
6.0  B            43.0  14.0  True
7.0  B            43.0  16.0  False
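If there are many rows or many towers, the row-by-row apply can get slow. Below is a minimal sketch of the same idea done with NumPy broadcasting instead. The helper name snap_to_nearest_tower is hypothetical (it is not from the question or from any library), and the frame is rebuilt by hand here only so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

def snap_to_nearest_tower(points, towers):
    """For each (lat, lon) row in points, return the nearest row of towers.

    Hypothetical helper: computes the haversine distance between every
    point and every tower in one shot via broadcasting.
    """
    # Differences are taken in degrees first, then converted to radians
    dlat = np.radians(towers[None, :, 0] - points[:, None, 0])
    dlon = np.radians(towers[None, :, 1] - points[:, None, 1])
    lat1 = np.radians(points[:, None, 0])
    lat2 = np.radians(towers[None, :, 0])
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    km = 12742 * np.arcsin(np.sqrt(a))  # shape: (n_points, n_towers)
    return towers[km.argmin(axis=1)]

# Same data as in the question, built directly instead of parsed from text
df = pd.DataFrame({
    'Customer_id': list('AAAAABBB'),
    'Lat': [40, np.nan, np.nan, 43, 45, 43, np.nan, 43],
    'Lon': [12, np.nan, np.nan, 12, 13, 14, np.nan, 16],
})

# Flag first, then interpolate only the coordinate columns
df['Interpolated'] = df['Lat'].isnull()
df[['Lat', 'Lon']] = df[['Lat', 'Lon']].interpolate()

# Actual towers are the coordinates that were present before interpolation
towers = df.loc[~df['Interpolated'], ['Lat', 'Lon']].drop_duplicates().to_numpy()

# Overwrite only the interpolated rows with their nearest tower
mask = df['Interpolated'].to_numpy()
df.loc[mask, ['Lat', 'Lon']] = snap_to_nearest_tower(
    df.loc[mask, ['Lat', 'Lon']].to_numpy(), towers
)
print(df)
```

Note that record 6 at (43,15) is equally far from towers (43,14) and (43,16), so which one it lands on depends on tie-breaking; if that matters for your data, you would need an explicit rule for it.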