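I have data from two different catalogs, and I want to match the two catalogs by coordinates. From catalog 1 I have x1, y1, z1, a1, b1, c1, etc. (about 5 million elements), and from catalog 2 I have x2, y2, z2, a2, e2, m2, n2, etc. (about a million elements). If necessary I will extend this to (x, y, z); for now I compare 2D coordinate arrays to find the elements the two catalogs have in common.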
co1 = np.vstack((x1,y1)).T
co2 = np.vstack((x2,y2)).T
idx1 = np.in1d(co1,co2) # not working for 2D arrays
idx2 = np.in1d(co2,co1)
np.savetxt('combined_data.txt',np.c_[x1[idx1],y1[idx1],a1[idx1],e2[idx2],n2[idx2]],fmt='%1.4f %1.4f %1.4f %1.4f %1.4f')
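A small sketch of why this attempt misbehaves, using the example data given below. It calls np.isin, which newer NumPy versions recommend in place of np.in1d; both test each scalar on its own, so a value counts as "found" if it appears anywhere in co2, regardless of which coordinate pair it belongs to.

```python
import numpy as np

x1 = np.array([1, 2, 3, 4, 5])
y1 = np.array([5, 4, 3, 2, 1])
x2 = np.array([1, 4, 6, 2, 6, 4, 8, 9, 3])
y2 = np.array([5, 1, 5, 3, 6, 2, 8, 3, 3])
co1 = np.vstack((x1, y1)).T
co2 = np.vstack((x2, y2)).T

# Element-wise membership: each number is looked up separately, row structure is lost
elementwise = np.isin(co1, co2)
print(elementwise.shape)  # (5, 2)
print(elementwise.all())  # True: every single value of co1 occurs somewhere in co2
```

Only three of the five rows are real matches, yet every entry comes back True, so a per-row test is needed.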
For example, I have the following data sets:
x1 = np.array([1,2,3,4,5])
y1 = np.array([5,4,3,2,1])
x2 = np.array([1,4,6,2,6,4,8,9,3])
y2 = np.array([5,1,5,3,6,2,8,3,3])
(1,5), (3,3), (4,2) are the common coordinates between the two catalogs. Therefore,
idx1 = [True, False, True, True, False] and idx2 = [True, False, False, False, False, True, False, False, True].
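These expected masks can be reproduced with a plain-Python set of coordinate tuples. This is a sketch meant as a reference definition of "common coordinate" (exact equality of the (x, y) pair); it loops in Python, so it is likely too slow to be the practical choice for millions of rows.

```python
import numpy as np

x1 = np.array([1, 2, 3, 4, 5])
y1 = np.array([5, 4, 3, 2, 1])
x2 = np.array([1, 4, 6, 2, 6, 4, 8, 9, 3])
y2 = np.array([5, 1, 5, 3, 6, 2, 8, 3, 3])

# Each catalog's coordinates as a set of (x, y) tuples for O(1) lookups
set1 = set(zip(x1.tolist(), y1.tolist()))
set2 = set(zip(x2.tolist(), y2.tolist()))

# A row matches if its exact (x, y) pair occurs in the other catalog
idx1 = np.array([p in set2 for p in zip(x1.tolist(), y1.tolist())])
idx2 = np.array([p in set1 for p in zip(x2.tolist(), y2.tolist())])

print(idx1.tolist())  # [True, False, True, True, False]
print(idx2.tolist())  # [True, False, False, False, False, True, False, False, True]
```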
But the problem is that np.in1d is a 1D routine and cannot be applied to 2D or 3D arrays. Does anyone know a NumPy routine that accomplishes this?
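One NumPy-only workaround, sketched here under the assumption that coordinates must match exactly: give every distinct coordinate row a single integer label with np.unique(..., axis=0, return_inverse=True), then run the ordinary 1D membership test np.isin on those labels. The helper name match_rows is hypothetical.

```python
import numpy as np


def match_rows(co1, co2):
    """Return (idx1, idx2): rows of co1 found in co2, and rows of co2 found in co1.

    Works for any number of columns, e.g. (x, y) or (x, y, z). Matching is exact.
    """
    both = np.concatenate((co1, co2), axis=0)
    # np.unique over axis=0 assigns each distinct row an integer label
    _, labels = np.unique(both, axis=0, return_inverse=True)
    labels = labels.ravel()  # inverse shape has varied across NumPy versions; force 1-D
    lab1, lab2 = labels[:len(co1)], labels[len(co1):]
    # Each row is now one integer, so the 1D membership test applies
    return np.isin(lab1, lab2), np.isin(lab2, lab1)


x1 = np.array([1, 2, 3, 4, 5])
y1 = np.array([5, 4, 3, 2, 1])
x2 = np.array([1, 4, 6, 2, 6, 4, 8, 9, 3])
y2 = np.array([5, 1, 5, 3, 6, 2, 8, 3, 3])

idx1, idx2 = match_rows(np.column_stack((x1, y1)), np.column_stack((x2, y2)))
print(idx1.tolist())  # [True, False, True, True, False]
print(idx2.tolist())  # [True, False, False, False, False, True, False, False, True]
```

The same function accepts three-column (x, y, z) arrays. Because the comparison is exact, float coordinates that differ only by rounding will not be matched.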
Convert both sets of arrays to pandas DataFrames:
df1 = pd.DataFrame({"x": x1, "y": y1}).reset_index()
df2 = pd.DataFrame({"x": x2, "y": y2}).reset_index()
Merge them:
result = pd.merge(df1, df2, left_on=["x","y"], right_on=["x","y"])
# index_x x y index_y
#0 0 1 5 0
#1 2 3 3 8
#2 3 4 2 5
Get the indices:
result[["index_x","index_y"]]
# index_x index_y
#0 0 0
#1 2 8
#2 3 5
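The index pairs can then drive the output the question wanted. This is a sketch that assumes a1, e2 and n2 are 1-D arrays aligned with their catalogs; the values below are hypothetical placeholders. It uses on=["x", "y"], which here is equivalent to the left_on/right_on form above.

```python
import numpy as np
import pandas as pd

x1 = np.array([1, 2, 3, 4, 5])
y1 = np.array([5, 4, 3, 2, 1])
x2 = np.array([1, 4, 6, 2, 6, 4, 8, 9, 3])
y2 = np.array([5, 1, 5, 3, 6, 2, 8, 3, 3])
# Hypothetical stand-ins for the extra columns a1 (catalog 1) and e2, n2 (catalog 2)
a1 = np.arange(len(x1), dtype=float)
e2 = np.arange(len(x2), dtype=float) * 10
n2 = np.arange(len(x2), dtype=float) * 100

df1 = pd.DataFrame({"x": x1, "y": y1}).reset_index()
df2 = pd.DataFrame({"x": x2, "y": y2}).reset_index()
result = pd.merge(df1, df2, on=["x", "y"])

# Positional indices of each matched pair: entry k of i1 pairs with entry k of i2
i1 = result["index_x"].to_numpy()
i2 = result["index_y"].to_numpy()

# Columns from both catalogs, aligned pair by pair
combined = np.c_[x1[i1], y1[i1], a1[i1], e2[i2], n2[i2]]
# np.savetxt("combined_data.txt", combined, fmt="%1.4f")

# Boolean masks in the form the question asked for
idx1 = np.zeros(len(x1), dtype=bool)
idx1[i1] = True
idx2 = np.zeros(len(x2), dtype=bool)
idx2[i2] = True
```

Pairing by index keeps each output row's catalog-1 and catalog-2 values together. Boolean masks alone would not: x1[idx1] lists the matches as (1,5), (3,3), (4,2), while x2[idx2] lists them as (1,5), (4,2), (3,3), so placing e2[idx2] beside x1[idx1] would misalign rows. If a coordinate occurs more than once in a catalog, the merge returns every pairing.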