我有两个数据帧,其中两列非常重要。其中一列由 float64 值组成,另一列由字符串组成。 数据帧的大小不同。
我想同时匹配Number
列和Item
列,然后只得到匹配的列。
df1 = pd.DataFrame({ 'Number':[1.0,3.0,4.0,5.0,8.0,12.0,32.0,58.0] , 'Item': ['Phone', 'Watch', 'Pen', 'Pencil', 'Pencil','toolkit','box','fork']})
df2 = pd.DataFrame({'Number':[3.0,4.0,8.0,12.0,15.0,32.0,54.0,58.0,72.0], 'Item':['Watch','Pen','Pencil','Eraser','bottle','box','toolkit','fork','Phone']})
df1
Number Item
0 1.0 Phone
1 3.0 Watch
2 4.0 Pen
3 5.0 Pencil
4 8.0 Pencil
5 12.0 toolkit
6 32.0 box
7 58.0 fork
df2
Number Item
0 3.0 Watch
1 4.0 Pen
2 8.0 Pencil
3 12.0 Eraser
4 15.0 bottle
5 32.0 box
6 54.0 toolkit
7 58.0 fork
8 72.0 Phone
我正在尝试使用 forloop,循环很长。这似乎是实现这一目标的非常糟糕的方法。我正在尝试使用掩码操作,但不确定如何实现这一点。感谢帮助以尽可能短的方式执行此操作。
所需的结果应如下所示:
Item Matching Number
0 Phone No Match 1.0
1 Watch Matched 3.0
2 Pen Matched 4.0
3 Pencil No Match 5.0
4 Pencil Matched 8.0
5 toolkit No Match 12.0
6 box Matched 32.0
7 fork Matched 58.0
如果合并浮点值有问题,则可能乘以1000
并转换为整数,然后使用左连接merge
,因为匹配应该存在问题,因此两列中的浮点精度应该不同:
df1['Number1'] = df1['Number'].mul(1000).astype(int)
df2['Number1'] = df2['Number'].mul(1000).astype(int)
df = pd.merge(df1, df2.drop('Number', 1), how='left', on=['Item','Number1'], indicator=True)
df['Matching'] = df['_merge'].map({'left_only':'No Match', 'both':'Match'})
df = df.drop(['Number1','_merge'], axis=1)
print (df)
Number Item Matching
0 1.0 Phone No Match
1 3.0 Watch Match
2 4.0 Pen Match
3 5.0 Pencil No Match
4 8.0 Pencil Match
5 12.0 toolkit No Match
6 32.0 box Match
7 58.0 fork Match
您正在寻找与indicator=True
的左合并:
res = pd.merge(df1, df2, how='left', indicator=True)
print(res)
Item Number _merge
0 Phone 1.0 left_only
1 Watch 3.0 both
2 Pen 4.0 both
3 Pencil 5.0 left_only
4 Pencil 8.0 both
5 toolkit 12.0 left_only
6 box 32.0 both
7 fork 58.0 both
通常,当专用方法可用时,应避免显式for
循环,因为这些方法通常针对性能进行了优化。如果您愿意,您可以通过字典映射替换字符串:
d = {'left_only': 'No Match', 'both': 'Matched'}
df['_merge'] = df['_merge'].map(d)
您可以通过一些简单的loc
和isin
来获得所需的数据帧,如下所示
df = df1.copy()
df['Matching'] = np.nan
df.loc[(df.Number.isin(df2.Number)) & (df.Item.isin(df2.Item)), 'Matching'] = 'Matched'
df.Matching.fillna('No Match', inplace=True)
Number Item Matching
1.0 Phone No Match
3.0 Watch Matched
4.0 Pen Matched
5.0 Pencil No Match
8.0 Pencil Matched
12.0 toolkit Matched
32.0 box Matched
58.0 fork Matched