Manipulating a pandas DataFrame based on neighbouring coordinates

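I have the following dataframe showing latitude (y), longitude (x) and elevation (z). A negative number means there is no elevation, i.e. no land.

I want to change the elevation of any point that currently has no elevation if it sits next to a point at the same y that does have an elevation, and that neighbour's elevation is greater than the elevation of the point at the same x one y step up.

Apologies for the explanation; I am not sure this is the best way to describe it. In this example: point (-2, 5) has elevation -999999. At that y value it sits right next to (-1, 5), which has elevation 67, greater than the point at (x, y+1) = (-1, 6), which has elevation 8. In this case I want to change the elevation of (-2, 5) to that of (-2, 6) = 9.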

This:

index x    y       z
0  -5.0  5.0 -999999
1  -4.0  5.0 -999999
2  -3.0  5.0 -999999
3  -2.0  5.0 -999999
4  -1.0  5.0      67
5   0.0  5.0      55
6   1.0  5.0      49
7   2.0  5.0       7
8   3.0  5.0       6
9   4.0  5.0       6
10 -5.0  6.0      12
11 -4.0  6.0      12
12 -3.0  6.0      19
13 -2.0  6.0       9
14 -1.0  6.0       8
15  0.0  6.0       9
16  1.0  6.0       9
17  2.0  6.0       7
18  3.0  6.0       7
19  4.0  6.0       7

becomes:

index x    y       z adjusted
0  -5.0  5.0 -999999        0    
1  -4.0  5.0 -999999        0
2  -3.0  5.0 -999999        0
3  -2.0  5.0       9        1
4  -1.0  5.0      67        0
5   0.0  5.0      55        0
6   1.0  5.0      49        0
7   2.0  5.0       7        0
8   3.0  5.0       6        0
9   4.0  5.0       6        0
10 -5.0  6.0      12        0
11 -4.0  6.0      12        0
12 -3.0  6.0      19        0
13 -2.0  6.0       9        0
14 -1.0  6.0       8        0
15  0.0  6.0       9        0
16  1.0  6.0       9        0
17  2.0  6.0       7        0
18  3.0  6.0       7        0
19  4.0  6.0       7        0

How would you manipulate a dataframe like this?

A pandas-based solution. If I have not understood your adjustment logic correctly, this should be easy to adapt.

import pandas as pd

df = pd.read_clipboard()
# filter table by relevant (negative) z locations;
# copy so the column assignments below don't write into a slice of df
df_neg = df.loc[df.z < 0].copy()
# get coordinates of relevant locations
list_x, list_y = df_neg.x, df_neg.y
# get lists of neighboring points relative to relevant locations
points_right = list(zip(list_x + 1, list_y))
points_topright = list(zip(list_x + 1, list_y + 1))
points_top = list(zip(list_x, list_y + 1))
# set x, y index for convenient access and initialize adjusted col
df_idxd = df.set_index(['x', 'y']).assign(adjusted=0)
# add values of neighboring points to the df_neg table
# reindex is used instead of .loc because recent pandas raises a KeyError
# for missing labels; with reindex, if one of the points in the points_...
# lists doesn't exist, the values will be NaN and it won't bother us below
df_neg['right'] = df_idxd.z.reindex(points_right).values
df_neg['topright'] = df_idxd.z.reindex(points_topright).values
df_neg['top'] = df_idxd.z.reindex(points_top).values
# get mask which determines whether or not we update
mask = (df_neg.right >= 0) & (df_neg.right > df_neg.topright)
# update values in df_neg
df_neg['z'] = df_neg.z.where(~mask, df_neg.top)
df_neg['adjusted'] = mask.astype(int)
# use df_neg to update the full table
df_idxd.update(df_neg.set_index(['x', 'y']))
# restore original index
result = df_idxd.reset_index().set_index('index')

Result:

         x    y         z  adjusted
index
0.0   -5.0  5.0 -999999.0       0.0
1.0   -4.0  5.0 -999999.0       0.0
2.0   -3.0  5.0 -999999.0       0.0
3.0   -2.0  5.0       9.0       1.0
4.0   -1.0  5.0      67.0       0.0
5.0    0.0  5.0      55.0       0.0
6.0    1.0  5.0      49.0       0.0
7.0    2.0  5.0       7.0       0.0
8.0    3.0  5.0       6.0       0.0
9.0    4.0  5.0       6.0       0.0
10.0  -5.0  6.0      12.0       0.0
11.0  -4.0  6.0      12.0       0.0
12.0  -3.0  6.0      19.0       0.0
13.0  -2.0  6.0       9.0       0.0
14.0  -1.0  6.0       8.0       0.0
15.0   0.0  6.0       9.0       0.0
16.0   1.0  6.0       9.0       0.0
17.0   2.0  6.0       7.0       0.0
18.0   3.0  6.0       7.0       0.0
19.0   4.0  6.0       7.0       0.0
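As a cross-check of the rule as I read it, here is a minimal sketch that applies the same logic with plain dictionary lookups instead of index alignment. The names `adjust_by_lookup` and `z_at` are made up for illustration, and the sketch assumes a point is only adjusted when its right, top-right and top neighbours all exist.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "x": [float(v) for v in range(-5, 5)] * 2,
    "y": [5.0] * 10 + [6.0] * 10,
    "z": [-999999, -999999, -999999, -999999, 67, 55, 49, 7, 6, 6,
          12, 12, 19, 9, 8, 9, 9, 7, 7, 7],
})


def adjust_by_lookup(frame):
    """Hypothetical helper: apply the adjustment rule point by point."""
    # Map each (x, y) coordinate to its original elevation
    z_at = {(x, y): z for x, y, z in zip(frame.x, frame.y, frame.z)}
    new_z, adjusted = [], []
    for x, y, z in zip(frame.x, frame.y, frame.z):
        right = z_at.get((x + 1, y))
        topright = z_at.get((x + 1, y + 1))
        top = z_at.get((x, y + 1))
        # Assumption: skip points whose neighbours fall outside the grid
        neighbours_exist = all(v is not None for v in (right, topright, top))
        hit = bool(z < 0 and neighbours_exist and right >= 0 and right > topright)
        new_z.append(top if hit else z)
        adjusted.append(int(hit))
    return frame.assign(z=new_z, adjusted=adjusted)


result = adjust_by_lookup(df)
print(result[result.adjusted == 1])
```

A Python loop like this will be slow on a large grid, but it is easy to read and makes a handy reference when checking a vectorised version.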

Here is what I managed to put together:

import numpy as np
import pandas as pd

df = pd.DataFrame({'x': np.concatenate([np.arange(-5, 5), np.arange(-5, 5)]),
                   'y': np.concatenate([np.repeat(5, 10), np.repeat(6, 10)]),
                   'z': [-999999, -999999, -999999, -999999, 67, 55, 49, 7,
                         6, 6, 12, 12, 19, 9, 8, 9, 9, 7, 7, 7]})

The first step is to convert the dataframe to a 2D NumPy matrix and work with that, since the relationships live in a 2D plane:

vals = df.set_index(['x', 'y']).unstack().values
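To make the orientation of that matrix explicit, here is a small sketch on the sample data: rows follow x and columns follow y, so moving "right" means the next row and moving "up" means the next column. The variable name `grid` is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": np.tile(np.arange(-5, 5), 2),
    "y": np.repeat([5, 6], 10),
    "z": [-999999, -999999, -999999, -999999, 67, 55, 49, 7, 6, 6,
          12, 12, 19, 9, 8, 9, 9, 7, 7, 7],
})

grid = df.set_index(["x", "y"]).unstack()
vals = grid.values

print(vals.shape)                 # (10, 2): ten x values, two y values
print(grid.index.tolist())        # x coordinate of each row
print(grid["z"].columns.tolist()) # y coordinate of each column
print(vals[3])                    # elevations at x = -2 for y = 5 and y = 6
```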

Then compute the mask of the values to replace:

mask_is_neg = vals < 0
mask_satisfies_ineq = np.pad(vals[1:, :-1] - vals[1:, 1:] > 0, ((0, 1), (0, 1)), mode='constant', constant_values=False)
mask = np.logical_and(mask_is_neg, mask_satisfies_ineq)

Finally, shift the resulting mask by one step along the y direction so that it covers the values we will use as replacements:

mask_grab = np.pad(mask[:, :-1], ((0, 0), (1, 0)), mode='constant', constant_values=False)
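The slice-then-pad idiom may be easier to follow on a tiny array. This sketch uses a made-up 2x2 mask: dropping the last column and padding one column of `False` on the left moves every `True` from column j to column j + 1.

```python
import numpy as np

mask = np.array([[True, False],
                 [False, False]])

# Drop the last column, then pad one False column on the left,
# which shifts the mask one step along the second axis (y here)
shifted = np.pad(mask[:, :-1], ((0, 0), (1, 0)),
                 mode="constant", constant_values=False)

print(shifted)
```

The shape is unchanged, so `shifted` can be used to index the same array as `mask`.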

Replace the values:

vals[mask] = vals[mask_grab]

Flatten the array and compute the adjusted column:

vals = vals.flatten('F')
adjusted = (vals != df.z).astype(int)

Finally, put these values back into the original dataframe:

df.z = vals
df['adjusted'] = adjusted
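For convenience, the steps above can be collected into a single function. This is only a sketch: the name `adjust_elevation` is hypothetical, and it assumes the frame holds a complete rectangular grid sorted by y and then x, as in the sample data. The array is copied so it is writable regardless of how pandas hands out the underlying data.

```python
import numpy as np
import pandas as pd


def adjust_elevation(frame):
    """Hypothetical wrapper around the mask-based steps; returns a new frame."""
    # Rows follow x, columns follow y; copy so in-place assignment is safe
    vals = frame.set_index(["x", "y"])["z"].unstack().to_numpy().copy()
    is_neg = vals < 0
    # True where the right neighbour is higher than the top-right neighbour
    drops = np.pad(vals[1:, :-1] > vals[1:, 1:], ((0, 1), (0, 1)),
                   mode="constant", constant_values=False)
    mask = is_neg & drops
    # Same mask shifted one step in y: the cells to copy values from
    grab = np.pad(mask[:, :-1], ((0, 0), (1, 0)),
                  mode="constant", constant_values=False)
    vals[mask] = vals[grab]
    # Column-major flatten matches the "sorted by y, then x" row order
    flat = vals.flatten("F")
    changed = (flat != frame["z"].to_numpy()).astype(int)
    return frame.assign(z=flat, adjusted=changed)


df = pd.DataFrame({
    "x": np.tile(np.arange(-5, 5), 2),
    "y": np.repeat([5, 6], 10),
    "z": [-999999, -999999, -999999, -999999, 67, 55, 49, 7, 6, 6,
          12, 12, 19, 9, 8, 9, 9, 7, 7, 7],
})
out = adjust_elevation(df)
print(out[out.adjusted == 1])
```

Note that, unlike the first answer, this mask does not separately require the right neighbour to be non-negative; on the sample data the two give the same result.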
