When merging DataFrames, is it possible to use rows only a specific number of times?

I have data like this:

dfA:    dfB:
type    type | name | n
------  ----------------
A       A | 123  | 1
B       B | 123  | 1
A       A | 456  | 1
B       B | 789  | 1
A

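Column n gives the number of times an element of dfB may be added to dfA.

Is it possible to "merge" (or use some other pandas function) dfB onto dfA on type, so that my result does not include any named row from dfB more than n times? The order of dfA should determine which row comes first. So in this case: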

desired result:
type | name
----------------
A | 123 
B | 789 ------> the second row "123" does not get added since it is already n=1 times
A | 456         in the resulting data. The row with name="789" is added instead.
B | NO MATCH -> There are no more rows fitting the Criteria "type = B"
A | NO MATCH -> There are no more rows fitting the Criteria "type = A"

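Edit: The type column in dfA differs from the type column in dfB, so it is not possible to drop rows from dfB in advance. Consider this variant of dfA (dfB stays the same):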

dfA:     dfB:                result:            
type     type | name | n     type | name        
-----    ----------------    -----------        
B        A | 123  | 1        B | 123            
A        B | 123  | 1        A | 456            
B        A | 456  | 1        B | 789           
A        B | 789  | 1        A | NO MATCH      
B                            B | NO MATCH

It is not entirely clear what you want, but assuming you want each name to appear no more than n times, you can do the following:

dfB.assign(name=dfB['name'].where(dfB.groupby('name').cumcount().lt(dfB['n'])))[['type', 'name']]

Output:

  type   name
0    A  123.0
1    B    NaN
2    A  456.0
3    B  789.0
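
For anyone who wants to run this, here is a self-contained sketch of the same expression split into steps. The dfB construction is an assumption reconstructed from the table in the question (names stored as integers); the intermediate variable names `occurrence` and `capped` are only for illustration.

```python
import pandas as pd

# dfB rebuilt from the question's table (assumed dtypes: integer name and n)
dfB = pd.DataFrame({
    "type": ["A", "B", "A", "B"],
    "name": [123, 123, 456, 789],
    "n": [1, 1, 1, 1],
})

# Position of each row within its name group: 0 for the first row, 1 for the second, ...
occurrence = dfB.groupby("name").cumcount()

# Keep a name only while its position is below the allowed count n;
# rows beyond the limit get NaN (which turns the column into floats)
capped = dfB.assign(name=dfB["name"].where(occurrence.lt(dfB["n"])))[["type", "name"]]
print(capped)
```

The second 123 becomes NaN because it is the second occurrence of that name and n is 1.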

The merge you expect is also not clear, but once you have the DataFrame above, you can join or merge it as required.
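
Note that capping names inside dfB does not look at the order of dfA, so on its own it will not produce the result tables from the question. If the intent is "walk through dfA in order and give each row the first still-available name of the same type", a plain loop is one way to sketch it. This is only my reading of the question: `match_in_order` is a hypothetical helper, not a pandas function, and it assumes n is a per-name limit that is the same on every dfB row carrying that name.

```python
from collections import Counter

import pandas as pd


def match_in_order(dfA, dfB, no_match="NO MATCH"):
    """Hypothetical helper: assign each dfA row the first unused-up name of its type."""
    used = Counter()  # how many times each name has been handed out so far
    # Candidate (name, n) pairs per type, kept in dfB's original order
    candidates = {
        t: list(g[["name", "n"]].itertuples(index=False, name=None))
        for t, g in dfB.groupby("type", sort=False)
    }
    names = []
    for t in dfA["type"]:
        chosen = no_match
        for name, n in candidates.get(t, []):
            if used[name] < n:  # this name is still below its limit
                used[name] += 1
                chosen = name
                break
        names.append(chosen)
    return pd.DataFrame({"type": dfA["type"].tolist(), "name": names})


dfB = pd.DataFrame({
    "type": ["A", "B", "A", "B"],
    "name": [123, 123, 456, 789],
    "n": [1, 1, 1, 1],
})
dfA = pd.DataFrame({"type": ["A", "B", "A", "B", "A"]})
dfA_variant = pd.DataFrame({"type": ["B", "A", "B", "A", "B"]})

result = match_in_order(dfA, dfB)
result_variant = match_in_order(dfA_variant, dfB)
print(result)
print(result_variant)
```

The loop is sequential on purpose: whether a name is still available depends on every earlier row of dfA, which is hard to express as a single merge.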
