Select the highest-valued member among close coordinates stored in a pandas DataFrame



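I have a DataFrame with the following columns: X and Y are Cartesian coordinates, and Value is the value of the element at those coordinates. What I want to achieve is to keep only one coordinate out of every n coordinates that are close to each other, where two coordinates count as close if the distance between them is below some value m. The initial DF looks like this (example):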

import pandas as pd

data = {'X':[0,0,0,1,1,5,6,7,8],'Y':[0,1,4,2,6,5,6,4,8],'Value':[6,7,4,5,6,5,6,4,8]}
df = pd.DataFrame(data)

    X  Y  Value
0   0  0      6
1   0  1      7
2   0  4      4
3   1  2      5
4   1  6      6
5   5  5      5
6   6  6      6
7   7  4      4
8   8  8      8

The distance is computed by the following function:

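import numpy as np

def countDistance(lat1, lon1, lat2, lon2):
    # use basic knowledge about triangles (Pythagoras) - values are in meters
    # np.sqrt is an assumption (the original import is not shown); it also accepts pandas Series
    distance = np.sqrt(pow(lat1 - lat2, 2) + pow(lon1 - lon2, 2))
    return distance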
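
To illustrate how such a function can be applied to a whole column at once, here is a minimal sketch. The name count_distance and the three-row frame are hypothetical and only serve the example; np.hypot is swapped in for the explicit square root and computes the same Euclidean distance.

```python
import numpy as np
import pandas as pd


def count_distance(x1, y1, x2, y2):
    # Euclidean distance; works for scalars and element-wise for pandas Series
    return np.hypot(x1 - x2, y1 - y2)


# Hypothetical three-row frame, taken from the first rows of the example
df = pd.DataFrame({'X': [0, 0, 1], 'Y': [0, 1, 2]})

# Distance from the first row to every row, computed in one vectorized call
dist = count_distance(df.loc[0, 'X'], df.loc[0, 'Y'], df['X'], df['Y'])
print(dist)
```

Passing a scalar for the first point and a column for the second lets pandas broadcast the subtraction, so no explicit Python loop over the rows is needed for this step.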

For example, with m <= 3 the output DataFrame would look like this:

    X  Y  Value
1   0  1      7
4   1  6      6
8   8  8      8

What needs to be done:

rows 0, 1 and 3 are close; the highest value is in row 1, so keep row 1 and continue
rows 2 and 4 (from the original df) are close; keep row 4
rows 5, 6 and 7 are close; keep row 6
the remaining row 6 is close to row 8; keep row 8, which has the higher value
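
The steps above can be sketched as a small standalone function. This is only an illustration of the procedure as described, not an optimized solution; the name keep_highest_of_close and the parameter m are assumptions made for the example, and "close" is taken to mean distance <= m.

```python
import numpy as np
import pandas as pd


def keep_highest_of_close(df, m):
    """Repeat greedy passes until a pass merges no rows (hypothetical helper)."""
    current = df
    while True:
        remaining = current
        kept = []        # index labels of the rows that survive this pass
        merged = False   # True if any group in this pass had more than one row
        while not remaining.empty:
            first = remaining.iloc[0]
            # Distance from the first remaining row to all remaining rows
            dist = np.hypot(remaining['X'] - first['X'],
                            remaining['Y'] - first['Y'])
            close = remaining[dist <= m]
            merged = merged or len(close) > 1
            kept.append(close['Value'].idxmax())  # highest value in the group
            remaining = remaining[dist > m]       # drop the whole group
        current = current.loc[kept]
        if not merged:
            return current


data = {'X': [0, 0, 0, 1, 1, 5, 6, 7, 8],
        'Y': [0, 1, 4, 2, 6, 5, 6, 4, 8],
        'Value': [6, 7, 4, 5, 6, 5, 6, 4, 8]}
df = pd.DataFrame(data)

result = keep_highest_of_close(df, m=3)
print(result)  # rows 1, 4 and 8 of the original frame
```

Note that the result of this greedy procedure depends on the order of the rows, because each group is built around whichever row happens to come first.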

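So I need to go through the DataFrame row by row, check the rest, pick the best match, and continue. I can't think of any simple way to do this. It can't be a use case for drop_duplicates, since the rows are not duplicates, but looping over the whole DF would be very inefficient. One approach I can think of is to loop only once: for each row, find the close rows (probably by applying countDistance()), select the best-fitting row and replace the rest with its values, and finally use drop_duplicates. Another idea is to create a recursive function that builds a new DF: while the original DF still has rows, it takes the first row, finds the rows close to it, appends the best match to the new DF, removes the first row and all close rows from the original DF, and continues until it is empty; it then calls itself with the new DF to remove close points that might not have been caught.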
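
For the "find the close rows for every row" step of the first idea, a minimal sketch with NumPy broadcasting might look like the following. It builds the full n x n distance matrix, so it assumes n is small enough for that to fit in memory; the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

data = {'X': [0, 0, 0, 1, 1, 5, 6, 7, 8],
        'Y': [0, 1, 4, 2, 6, 5, 6, 4, 8],
        'Value': [6, 7, 4, 5, 6, 5, 6, 4, 8]}
df = pd.DataFrame(data)
m = 3  # assumed threshold: rows are "close" if distance <= m

xy = df[['X', 'Y']].to_numpy(dtype=float)
diff = xy[:, None, :] - xy[None, :, :]       # shape (n, n, 2)
dist = np.sqrt((diff ** 2).sum(axis=-1))     # shape (n, n), pairwise distances
close = dist <= m                            # boolean neighbourhood matrix

# For every row, the label of the highest-valued row within distance m
values = df['Value'].to_numpy()
masked = np.where(close, values[None, :], -np.inf)
best = df.index[masked.argmax(axis=1)]
print(list(best))
```

On the sample data this gives the labels 1, 1, 1, 1, 4, 6, 8, 6, 8, so a single pass of this kind would still leave rows 6 and 8 within m of each other, which is why some form of repetition seems to be needed.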

Both ideas are inefficient. Is there a nice, efficient, Pythonic way to achieve this?

For now, I have written simple recursive code that works, but it is most likely not optimal.

def recModif(self, df):
    # columns = ['', 'X', 'Y', 'Value']
    new_df = df.iloc[0:0].copy()  # empty frame with the same columns; collects the kept rows
    changed = False
    while not df.empty:  # for all the data
        df = df.reset_index(drop=True)  # need to reset so label 0 is always accessible
        x = df.loc[0, 'X']  # x and y of the first row
        y = df.loc[0, 'Y']
        df['dist'] = self.countDistance(x, y, df['X'], df['Y'])  # add column with distances
        select = df[df['dist'] < 10]  # rows closer than 10 meters to the first row
        if len(select.index) > 1:  # more than one element is close
            changed = True
        # print(select, select['Value'].idxmax())
        select = select.loc[[select['Value'].idxmax()]]  # get the highest one
        # DataFrame.append was removed in pandas 2.0, so use pd.concat instead
        new_df = pd.concat([new_df, select.iloc[:, :3]], ignore_index=True)  # add it to new df
        df = df[df['dist'] >= 10]  # drop the processed elements
    if changed:
        return self.recModif(new_df)  # use recursion if overlaps are still possible
    else:
        return new_df  # return new df if nothing was merged
