Compare a master DataFrame with a child DataFrame and extract new rows based only on the values of two columns



I have two DataFrames:

Master_DF:

Symbol,Strike_Price,C_BidPrice,Pecentage,Margin_Req,Underlay,C_LTP,LotSize
JETAIRWAYS,110.0,1.25,26.0,105308.9,81.05,1.2,2200
JETAIRWAYS,120.0,1.0,32.0,96156.9,81.05,1.15,2200
PCJEWELLER,77.5,0.95,27.0,171217.0,56.95,1.3,6500
PCJEWELLER,80.0,0.8,29.0,161207.0,56.95,0.95,6500
PCJEWELLER,82.5,0.55,31.0,154772.0,56.95,0.95,6500
PCJEWELLER,85.0,0.6,33.0,147882.0,56.95,0.7,6500
PCJEWELLER,90.0,0.5,37.0,138977.0,56.95,0.55,6500

and Child_DF:

Symbol,Strike_Price,C_BidPrice,Pecentage,Margin_Req,Underlay,C_LTP,LotSize
JETAIRWAYS,110.0,1.25,26.0,105308.9,81.05,1.2,2200
JETAIRWAYS,150.0,1.3,22.0,44156.9,81.05,1.05,2200
PCJEWELLER,77.5,0.95,27.0,171217.0,56.95,1.3,6500
PCJEWELLER,100.0,1.8,29.0,441207.0,46.95,4.95,6500

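I want to compare Child_DF with Master_DF on the columns (Symbol, Strike_Price). If a given Symbol and Strike_Price combination is already present in Master_DF, that row should not be treated as new data.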

The new rows should be:

Symbol,Strike_Price,C_BidPrice,Pecentage,Margin_Req,Underlay,C_LTP,LotSize
JETAIRWAYS,150.0,1.3,22.0,44156.9,81.05,1.05,2200
PCJEWELLER,100.0,1.8,29.0,441207.0,46.95,4.95,6500
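The rule above ("new" means the (Symbol, Strike_Price) pair does not occur in Master_DF) can also be expressed directly as a membership test on key pairs. The sketch below swaps in pandas MultiIndex.isin instead of a merge; it assumes the sample data shown above, and the variable names are illustrative.

```python
import io

import pandas as pd

# Sample data copied from the question (the column name "Pecentage" is kept as given).
master_csv = """\
Symbol,Strike_Price,C_BidPrice,Pecentage,Margin_Req,Underlay,C_LTP,LotSize
JETAIRWAYS,110.0,1.25,26.0,105308.9,81.05,1.2,2200
JETAIRWAYS,120.0,1.0,32.0,96156.9,81.05,1.15,2200
PCJEWELLER,77.5,0.95,27.0,171217.0,56.95,1.3,6500
PCJEWELLER,80.0,0.8,29.0,161207.0,56.95,0.95,6500
PCJEWELLER,82.5,0.55,31.0,154772.0,56.95,0.95,6500
PCJEWELLER,85.0,0.6,33.0,147882.0,56.95,0.7,6500
PCJEWELLER,90.0,0.5,37.0,138977.0,56.95,0.55,6500
"""
child_csv = """\
Symbol,Strike_Price,C_BidPrice,Pecentage,Margin_Req,Underlay,C_LTP,LotSize
JETAIRWAYS,110.0,1.25,26.0,105308.9,81.05,1.2,2200
JETAIRWAYS,150.0,1.3,22.0,44156.9,81.05,1.05,2200
PCJEWELLER,77.5,0.95,27.0,171217.0,56.95,1.3,6500
PCJEWELLER,100.0,1.8,29.0,441207.0,46.95,4.95,6500
"""

master_df = pd.read_csv(io.StringIO(master_csv))
child_df = pd.read_csv(io.StringIO(child_csv))

keys = ["Symbol", "Strike_Price"]

# Turn each frame's key columns into a MultiIndex of (Symbol, Strike_Price) tuples.
master_keys = pd.MultiIndex.from_frame(master_df[keys])
child_keys = pd.MultiIndex.from_frame(child_df[keys])

# A child row is new only if its key pair is absent from master.
# Boolean indexing keeps child_df's original index and column order.
new_rows = child_df[~child_keys.isin(master_keys)]
print(new_rows)
```

Unlike the merge-based answers, this never copies master's non-key columns, so no suffixes or reindexing are needed.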

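You can use a right merge together with indicator=True, then query for "right_only", and finally reindex() to get the columns in the child's order. The suffixes=('_','') argument leaves the child's column names unsuffixed, so reindex(child.columns, axis=1) selects exactly the child's columns: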

(master.merge(child, on=['Symbol', 'Strike_Price'], how='right',
              suffixes=('_', ''), indicator=True)
       .query('_merge == "right_only"')
       .reindex(child.columns, axis=1))

       Symbol  Strike_Price  C_BidPrice  Pecentage  Margin_Req  Underlay  
2  JETAIRWAYS         150.0         1.3       22.0     44156.9     81.05   
3  PCJEWELLER         100.0         1.8       29.0    441207.0     46.95   
   C_LTP  LotSize  
2   1.05     2200  
3   4.95     6500  
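A self-contained version of the merge/indicator approach above, run on the question's sample data, might look like the sketch below. The CSV strings are copied from the question; the row labels in the printed result may differ between pandas versions, because how='right' row ordering changed over time, so only the values are worth relying on.

```python
import io

import pandas as pd

master_csv = """\
Symbol,Strike_Price,C_BidPrice,Pecentage,Margin_Req,Underlay,C_LTP,LotSize
JETAIRWAYS,110.0,1.25,26.0,105308.9,81.05,1.2,2200
JETAIRWAYS,120.0,1.0,32.0,96156.9,81.05,1.15,2200
PCJEWELLER,77.5,0.95,27.0,171217.0,56.95,1.3,6500
PCJEWELLER,80.0,0.8,29.0,161207.0,56.95,0.95,6500
PCJEWELLER,82.5,0.55,31.0,154772.0,56.95,0.95,6500
PCJEWELLER,85.0,0.6,33.0,147882.0,56.95,0.7,6500
PCJEWELLER,90.0,0.5,37.0,138977.0,56.95,0.55,6500
"""
child_csv = """\
Symbol,Strike_Price,C_BidPrice,Pecentage,Margin_Req,Underlay,C_LTP,LotSize
JETAIRWAYS,110.0,1.25,26.0,105308.9,81.05,1.2,2200
JETAIRWAYS,150.0,1.3,22.0,44156.9,81.05,1.05,2200
PCJEWELLER,77.5,0.95,27.0,171217.0,56.95,1.3,6500
PCJEWELLER,100.0,1.8,29.0,441207.0,46.95,4.95,6500
"""

master = pd.read_csv(io.StringIO(master_csv))
child = pd.read_csv(io.StringIO(child_csv))

new_rows = (
    # Right merge keeps every child row; master's overlapping columns get a "_" suffix.
    master.merge(child, on=["Symbol", "Strike_Price"], how="right",
                 suffixes=("_", ""), indicator=True)
    # Rows whose key pair was not found in master are flagged "right_only".
    .query('_merge == "right_only"')
    # Keep only the child's columns, in the child's order (drops _merge and master_ columns).
    .reindex(child.columns, axis=1)
)
print(new_rows)
```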

  1. First, merge the DataFrames on Symbol and Strike_Price, setting indicator=True and how='right':

result = pd.merge(master_df[['Symbol','Strike_Price']],child_df,on=['Symbol','Strike_Price'],indicator=True,how='right')

  2. Then filter the _merge column for 'right_only' to get the desired result:

    result = result[result['_merge'] == 'right_only'].drop(columns='_merge')
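The two steps above can be wrapped in a reusable helper. In this sketch the function name new_rows_from and its keys parameter are hypothetical; it also drops the _merge indicator column so the output has exactly the child's columns. Merging only master's key columns means no other columns collide, so no suffixes are needed.

```python
import io

import pandas as pd


def new_rows_from(master_df, child_df, keys):
    """Hypothetical helper: rows of child_df whose key combination is absent from master_df."""
    # Step 1: right-merge master's key columns onto the child, flagging each row's origin.
    result = pd.merge(master_df[keys], child_df, on=keys,
                      indicator=True, how="right")
    # Step 2: keep child-only rows and remove the helper indicator column.
    return result[result["_merge"] == "right_only"].drop(columns="_merge")


master_csv = """\
Symbol,Strike_Price,C_BidPrice,Pecentage,Margin_Req,Underlay,C_LTP,LotSize
JETAIRWAYS,110.0,1.25,26.0,105308.9,81.05,1.2,2200
JETAIRWAYS,120.0,1.0,32.0,96156.9,81.05,1.15,2200
PCJEWELLER,77.5,0.95,27.0,171217.0,56.95,1.3,6500
PCJEWELLER,80.0,0.8,29.0,161207.0,56.95,0.95,6500
PCJEWELLER,82.5,0.55,31.0,154772.0,56.95,0.95,6500
PCJEWELLER,85.0,0.6,33.0,147882.0,56.95,0.7,6500
PCJEWELLER,90.0,0.5,37.0,138977.0,56.95,0.55,6500
"""
child_csv = """\
Symbol,Strike_Price,C_BidPrice,Pecentage,Margin_Req,Underlay,C_LTP,LotSize
JETAIRWAYS,110.0,1.25,26.0,105308.9,81.05,1.2,2200
JETAIRWAYS,150.0,1.3,22.0,44156.9,81.05,1.05,2200
PCJEWELLER,77.5,0.95,27.0,171217.0,56.95,1.3,6500
PCJEWELLER,100.0,1.8,29.0,441207.0,46.95,4.95,6500
"""

master_df = pd.read_csv(io.StringIO(master_csv))
child_df = pd.read_csv(io.StringIO(child_csv))

res = new_rows_from(master_df, child_df, ["Symbol", "Strike_Price"])
print(res)
```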
