Pandas: combine one DataFrame with another of a different shape based on a comparison

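I am trying to combine these two DataFrames (DF1 and DF2), but only for records that are not already in the first DataFrame (DF1). In the example below, I want the result to pick up only records 0, 1, 4 and 5 from DF2 and skip records 2 and 3, because their complex/unit combination already appears in DF1. I tried merge without luck, and also np.where:

np.where(df1[['complex','unit']] != df2[['complex','unit']])

which resulted in ValueError: Can only compare identically-labeled DataFrame objects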

DF1

company complex unit location   datetime            serial     seq  interval
3        6       10  UpMaster     2017-07-21 00:33:37  1505.0  3400.0      1554
4        6       11  UpMaster     2017-07-21 00:59:44  1505.0  3401.0      1567
5        6       10  UpMaster     2017-07-21 01:25:41  1505.0  3402.0      1557
6        6       A   UpMaster     2017-07-21 01:51:45  1505.0  3403.0      1564
7        6       13  UpMaster     2017-07-21 02:17:48  1505.0  3404.0      1563

DF2

index   complex   unit
0        7         1807
1        4         7
2        6         10
3        6         A
4       10         110A
5        6         12

Desired result

company complex unit location   datetime            serial     seq    interval 
3        6       10  UpMaster     2017-07-21 00:33:37  1505.0  3400.0      1554
4        6       11  UpMaster     2017-07-21 00:59:44  1505.0  3401.0      1567
5        6       10  Down         2017-07-21 01:25:41  1505.0  3402.0      1557
6        6       A   UpMaster     2017-07-21 01:51:45  1505.0  3403.0      1564
7        6       13  UpMaster     2017-07-21 02:17:48  1505.0  3404.0      1563
8        7       1807  NaN       NaN                   NaN     NaN         NaN
9        4       7     NaN       NaN                   NaN     NaN         NaN
10       10      110A  NaN       NaN                   NaN     NaN         NaN
11       6       12    NaN       NaN                   NaN     NaN         NaN

Edit: the append approach worked great, thanks!

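# DataFrame.append was removed in pandas 2.0, so pd.concat is used here; ~ (not -) negates a boolean mask.
df1 = pd.concat([df1, df2[~df2['unit_id'].isin(df1['unit_id'].unique())]], ignore_index=True)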

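The above is the final solution I adopted after adding a unique identifier, unit_id. Without that identifier, the suggestion to build a key from the two semi-unique fields is a clever solution: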

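# Build a composite key from the two semi-unique columns.
df1['key'] = df1['complex'].astype(str) + ' ' + df1['unit'].astype(str)
df2['key'] = df2['complex'].astype(str) + ' ' + df2['unit'].astype(str)
# Append only the df2 rows whose key is absent from df1 (pd.concat replaces DataFrame.append, removed in pandas 2.0).
df1 = pd.concat([df1, df2[~df2['key'].isin(df1['key'].unique())]], ignore_index=True)
df1 = df1.drop('key', axis=1)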

Updated answer

You can still append with a condition; you just need to create an extra key column first:

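import pandas as pd

# Build a composite key from the two semi-unique columns.
df1['key'] = df1['complex'].astype(str) + ' ' + df1['unit'].astype(str)
df2['key'] = df2['complex'].astype(str) + ' ' + df2['unit'].astype(str)
# Append only the df2 rows whose key is absent from df1 (pd.concat replaces DataFrame.append, removed in pandas 2.0).
df1 = pd.concat([df1, df2[~df2['key'].isin(df1['key'].unique())]], ignore_index=True)
df1 = df1.drop('key', axis=1)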
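For anyone who wants to try the key approach end to end, here is a minimal, self-contained sketch. The sample frames are trimmed-down stand-ins for DF1 and DF2 from the question (only complex, unit and location are kept, and unit is stored as strings), and make_key is a hypothetical helper name, not something from pandas.

```python
import pandas as pd

# Hypothetical, trimmed-down versions of DF1 and DF2 from the question.
df1 = pd.DataFrame({
    "complex": [6, 6, 6, 6, 6],
    "unit": ["10", "11", "10", "A", "13"],
    "location": ["UpMaster"] * 5,
})
df2 = pd.DataFrame({
    "complex": [7, 4, 6, 6, 10, 6],
    "unit": ["1807", "7", "10", "A", "110A", "12"],
})


def make_key(df):
    # Combine the two semi-unique fields into a single string key.
    return df["complex"].astype(str) + " " + df["unit"].astype(str)


# Rows of df2 whose (complex, unit) pair does not occur in df1.
new_rows = df2[~make_key(df2).isin(make_key(df1))]

# pd.concat stands in for DataFrame.append, which was removed in pandas 2.0.
result = pd.concat([df1, new_rows], ignore_index=True)
print(result)
```

Computing the key as a temporary Series instead of a column means there is nothing to drop afterwards. The space separator keeps pairs such as (1, "23") and (12, "3") distinct, as long as the values themselves contain no spaces.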

Previous answer

I think you can do what you want by appending with a condition:

# pd.concat replaces DataFrame.append (removed in pandas 2.0); ~ negates the boolean mask.
df1 = pd.concat([df1, df2[~df2['complex'].isin(df1['complex'].unique())]], ignore_index=True)

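This leaves the extra columns (company, location, datetime, etc.) filled with NaN values for the appended rows. You can fill in the company column with the desired values afterwards.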
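To make the difference between the two answers concrete, the sketch below runs the complex-only filter on abbreviated sample data (a hypothetical stand-in for the question's frames, with unit stored as strings) and compares it with a match on both columns. Here the two-column match is done with merge and indicator=True, which is an alternative to the string key and not what the answer itself uses.

```python
import pandas as pd

# Hypothetical, trimmed-down versions of DF1 and DF2 from the question.
df1 = pd.DataFrame({
    "complex": [6, 6, 6, 6, 6],
    "unit": ["10", "11", "10", "A", "13"],
    "location": ["UpMaster"] * 5,
})
df2 = pd.DataFrame({
    "complex": [7, 4, 6, 6, 10, 6],
    "unit": ["1807", "7", "10", "A", "110A", "12"],
})

# Filter on `complex` only, as in the previous answer.
by_complex = df2[~df2["complex"].isin(df1["complex"].unique())]
print(by_complex.index.tolist())  # [0, 1, 4]

# Anti-join on both columns: keep df2 rows with no matching pair in df1.
pairs = df1[["complex", "unit"]].drop_duplicates()
merged = df2.merge(pairs, on=["complex", "unit"], how="left", indicator=True)
by_pair = df2[(merged["_merge"] == "left_only").to_numpy()]
print(by_pair.index.tolist())  # [0, 1, 4, 5]

# Columns that exist only in df1 come out as NaN for the appended rows.
result = pd.concat([df1, by_pair], ignore_index=True)
print(result["location"].isna().sum())  # 4
```

The complex-only filter drops row 5 (complex 6, unit 12) because complex 6 already exists in DF1, even though that complex/unit pair does not. The desired result includes that row, which is why the match needs both columns.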
