Manipulating a pandas DataFrame based on adjacent coordinates



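I have the following DataFrame showing latitude (y), longitude (x) and elevation (z). A negative number means there is no elevation, i.e. no land.

I want to change the elevation of any point that currently has no elevation if it sits next to a point at the same y that does have an elevation, and that neighbor's elevation is greater than the elevation of the point at the same x one y step up.

Apologies for the explanation, I'm not sure this is the best way to describe it. But in this example: the point (-2, 5) has elevation -999999. At that y value it is right next to (-1, 5), which has elevation 67, and that is greater than the point at (x, y+1) = (-1, 6), which has elevation 8. In this case I want to change the elevation of (-2, 5) to that of (-2, 6) = 9.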

This:

index x    y       z
0  -5.0  5.0 -999999
1  -4.0  5.0 -999999
2  -3.0  5.0 -999999
3  -2.0  5.0 -999999
4  -1.0  5.0      67
5   0.0  5.0      55
6   1.0  5.0      49
7   2.0  5.0       7
8   3.0  5.0       6
9   4.0  5.0       6
10 -5.0  6.0      12
11 -4.0  6.0      12
12 -3.0  6.0      19
13 -2.0  6.0       9
14 -1.0  6.0       8
15  0.0  6.0       9
16  1.0  6.0       9
17  2.0  6.0       7
18  3.0  6.0       7
19  4.0  6.0       7

becomes:

index x    y       z adjusted
0  -5.0  5.0 -999999        0    
1  -4.0  5.0 -999999        0
2  -3.0  5.0 -999999        0
3  -2.0  5.0       9        1
4  -1.0  5.0      67        0
5   0.0  5.0      55        0
6   1.0  5.0      49        0
7   2.0  5.0       7        0
8   3.0  5.0       6        0
9   4.0  5.0       6        0
10 -5.0  6.0      12        0
11 -4.0  6.0      12        0
12 -3.0  6.0      19        0
13 -2.0  6.0       9        0
14 -1.0  6.0       8        0
15  0.0  6.0       9        0
16  1.0  6.0       9        0
17  2.0  6.0       7        0
18  3.0  6.0       7        0
19  4.0  6.0       7        0

How would you manipulate a DataFrame like this?

Here is a pandas-based solution. If I haven't understood your adjustment logic correctly, it should be easy to adapt.

import pandas as pd

df = pd.read_clipboard()
# filter table by relevant (negative) z locations;
# .copy() so the column assignments below don't write to a slice of df
df_neg = df.loc[df.z < 0].copy()
# get coordinates of relevant locations
list_x, list_y = df_neg.x, df_neg.y
# build the (x, y) keys of neighboring points relative to relevant locations
points_right = pd.MultiIndex.from_arrays([list_x + 1, list_y])
points_topright = pd.MultiIndex.from_arrays([list_x + 1, list_y + 1])
points_top = pd.MultiIndex.from_arrays([list_x, list_y + 1])
# set x, y index for convenient access and initialize adjusted col
df_idxd = df.set_index(['x', 'y']).assign(adjusted=0)
# add values of neighboring points to the df_neg table
# reindex is used instead of .loc because .loc raises a KeyError for
# missing labels in pandas >= 1.0; with reindex, a neighbor that doesn't
# exist becomes NaN and it won't bother us below
df_neg['right'] = df_idxd.z.reindex(points_right).values
df_neg['topright'] = df_idxd.z.reindex(points_topright).values
df_neg['top'] = df_idxd.z.reindex(points_top).values
# get mask which determines whether or not we update
mask = (df_neg.right >= 0) & (df_neg.right > df_neg.topright)
# update values in df_neg
df_neg['z'] = df_neg.z.where(~mask, df_neg.top)
df_neg['adjusted'] = mask.astype(int)
# use df_neg to update the full table
df_idxd.update(df_neg.set_index(['x', 'y']))
# restore original index
df_idxd.reset_index().set_index('index')

Result:

         x    y         z  adjusted
index
0.0   -5.0  5.0 -999999.0       0.0
1.0   -4.0  5.0 -999999.0       0.0
2.0   -3.0  5.0 -999999.0       0.0
3.0   -2.0  5.0       9.0       1.0
4.0   -1.0  5.0      67.0       0.0
5.0    0.0  5.0      55.0       0.0
6.0    1.0  5.0      49.0       0.0
7.0    2.0  5.0       7.0       0.0
8.0    3.0  5.0       6.0       0.0
9.0    4.0  5.0       6.0       0.0
10.0  -5.0  6.0      12.0       0.0
11.0  -4.0  6.0      12.0       0.0
12.0  -3.0  6.0      19.0       0.0
13.0  -2.0  6.0       9.0       0.0
14.0  -1.0  6.0       8.0       0.0
15.0   0.0  6.0       9.0       0.0
16.0   1.0  6.0       9.0       0.0
17.0   2.0  6.0       7.0       0.0
18.0   3.0  6.0       7.0       0.0
19.0   4.0  6.0       7.0       0.0
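For anyone who wants to try the rule without the clipboard, here is a minimal self-contained sketch of the same idea. The helper name `adjust_elevation` and the inlined sample data are my own additions for illustration, and it assumes every (x, y) pair appears at most once and that neighbors are exactly one unit apart.

```python
import pandas as pd


def adjust_elevation(df):
    """Hypothetical helper: fill a negative z from the point one y step up
    when the right-hand neighbor is land and higher than the top-right point."""
    lookup = df.set_index(['x', 'y'])['z']
    out = df.copy()

    def neighbor(dx, dy):
        # reindex returns NaN for coordinates that are not in the table
        keys = pd.MultiIndex.from_arrays([out['x'] + dx, out['y'] + dy])
        return lookup.reindex(keys).to_numpy()

    right, topright, top = neighbor(1, 0), neighbor(1, 1), neighbor(0, 1)
    # comparisons against NaN are False, so missing neighbors never trigger an update
    mask = (out['z'].to_numpy() < 0) & (right >= 0) & (right > topright)
    out['adjusted'] = mask.astype(int)
    out['z'] = out['z'].where(~mask, top)
    return out


# sample data from the question
df = pd.DataFrame({
    'x': list(range(-5, 5)) * 2,
    'y': [5] * 10 + [6] * 10,
    'z': [-999999, -999999, -999999, -999999, 67, 55, 49, 7, 6, 6,
          12, 12, 19, 9, 8, 9, 9, 7, 7, 7],
})
result = adjust_elevation(df)
print(result)
```

Only the row at (-2, 5) should change here, because it is the only no-land point whose right-hand neighbor is land.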

Here is what I managed to put together:

import numpy as np
import pandas as pd

df = pd.DataFrame({'x': np.concatenate([np.arange(-5, 5), np.arange(-5, 5)]),
                   'y': np.concatenate([np.repeat(5, 10), np.repeat(6, 10)]),
                   'z': [-999999, -999999, -999999, -999999, 67, 55, 49, 7,
                         6, 6, 12, 12, 19, 9, 8, 9, 9, 7, 7, 7]})

The first step is to convert the DataFrame into a 2D NumPy matrix and work with that, since the relationships live in the 2D plane:

# .copy() so the array is writable (some pandas versions may hand back a read-only view)
vals = df.set_index(['x', 'y']).unstack().values.copy()
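To make the layout of that matrix concrete, here is a small sketch using the sample data. Selecting the `z` column before `unstack()` and the name `grid` are my own choices for readability; the values are the same as in the line above. Rows follow the sorted x values and columns follow the sorted y values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': np.concatenate([np.arange(-5, 5), np.arange(-5, 5)]),
                   'y': np.concatenate([np.repeat(5, 10), np.repeat(6, 10)]),
                   'z': [-999999, -999999, -999999, -999999, 67, 55, 49, 7,
                         6, 6, 12, 12, 19, 9, 8, 9, 9, 7, 7, 7]})

# one row per x, one column per y
grid = df.set_index(['x', 'y'])['z'].unstack()
vals = grid.to_numpy()

print(grid)
print(vals.shape)   # (number of x values, number of y values)
print(vals[4, 0])   # elevation at x = -1, y = 5
print(vals[3, 1])   # elevation at x = -2, y = 6
```

Because the sample frame is ordered by y first and x second, flattening this matrix in column-major order gives the values back in the original row order. That is an assumption about the input ordering, not something pandas guarantees for arbitrary data.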

Then compute the mask of values to replace:

mask_is_neg = vals < 0
mask_satisfies_ineq = np.pad(vals[1:, :-1] - vals[1:, 1:] > 0, ((0, 1), (0, 1)), mode='constant', constant_values=False)
mask = np.logical_and(mask_is_neg, mask_satisfies_ineq)

Next, shift the resulting mask one step along the y direction so that it covers the values we will use as replacements:

mask_grab = np.pad(mask[:, :-1], ((0, 0), (1, 0)), mode='constant', constant_values=False)
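The slice-and-pad trick may be easier to see on a toy array. This is only an illustration with made-up values; the name `shifted` is mine.

```python
import numpy as np

mask = np.array([[True, False],
                 [False, False],
                 [False, True]])

# drop the last column, then pad one column of False on the left:
# every True moves one column to the right (one step up in y)
shifted = np.pad(mask[:, :-1], ((0, 0), (1, 0)),
                 mode='constant', constant_values=False)
print(shifted)
```

A True in the last column has no column to move into, so it is dropped. In the elevation problem that corresponds to a point on the top row, which has no point above it to copy from.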

Replace the values:

vals[mask] = vals[mask_grab]

Flatten the array and compute the adjusted column:

vals = vals.flatten('F')
adjusted = (vals != df.z).astype(int)

Finally, put these values back into the original DataFrame:

df.z = vals
df['adjusted'] = adjusted
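Put together, the steps can be run as one script. This is a sketch rather than a drop-in solution: the `.copy()` call, the `['z']` selection and the names `right_gt_topright` and `new_z` are my additions, and it assumes the frame is sorted by y and then x on a complete grid. Note that this approach only compares the right and top-right neighbors; it does not separately check that the right neighbor is non-negative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': np.concatenate([np.arange(-5, 5), np.arange(-5, 5)]),
                   'y': np.concatenate([np.repeat(5, 10), np.repeat(6, 10)]),
                   'z': [-999999, -999999, -999999, -999999, 67, 55, 49, 7,
                         6, 6, 12, 12, 19, 9, 8, 9, 9, 7, 7, 7]})

# rows are x, columns are y; copy so the array can be modified in place
vals = df.set_index(['x', 'y'])['z'].unstack().to_numpy().copy()

# points with no land
mask_is_neg = vals < 0
# True where the right neighbor is higher than the top-right neighbor
right_gt_topright = np.pad(vals[1:, :-1] > vals[1:, 1:], ((0, 1), (0, 1)),
                           mode='constant', constant_values=False)
mask = mask_is_neg & right_gt_topright

# same mask moved one step up in y: the cells to copy from
mask_grab = np.pad(mask[:, :-1], ((0, 0), (1, 0)),
                   mode='constant', constant_values=False)
vals[mask] = vals[mask_grab]

# column-major flattening matches the y-then-x row order of df
new_z = vals.flatten('F')
df['adjusted'] = (new_z != df['z'].to_numpy()).astype(int)
df['z'] = new_z
print(df)
```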