如何使用元组列表筛选数据帧



我有一个像这样的元组列表(PRODUCT_ID,COUNTRY_CODE(:

[(1111, 'CO'),
(2222, 'CO'),
(1111, 'BR')]

和像这样的数据帧:

df = pd.DataFrame({
'COUNTRY_CODE': ['CO','CO','CO','BR','BR','BR','CO'], 
'VERTICAL_GROUP_ID': [2,2,3,2,3,3,3], 
'SUB_VERTICAL': ['SUPER','SUPER','HOME','LICOR','SPORTS','HOME','TECH'], 
'PRODUCT_ID': [1111,3333,1111,4444,1111,2222,2222], 
'SHOWN': [7,8,12,14,16,1,13], 
})

我如何过滤数据帧,以便得到这样的结果数据帧,只使用PRODUCT_ID和COUNTRY_CODE元组列表中的值进行过滤?

>2//tr>111/table>
COUNTRY_CODEVERTICAL_GROUP_IDSUB_VERTICAL
CO2超级
CO3HOME1111
BR3SPORTS
CO3TECH

让我们使用Multiindex.isin

df[df.set_index(['PRODUCT_ID', 'COUNTRY_CODE']).index.isin(tup)]

或者,您可以从元组创建一个新的数据帧,然后与给定的数据帧进行内部合并

filters = pd.DataFrame(tup, columns=['PRODUCT_ID', 'COUNTRY_CODE'])
df.merge(filters)

COUNTRY_CODE  VERTICAL_GROUP_ID SUB_VERTICAL  PRODUCT_ID  SHOWN
0           CO                  2        SUPER        1111      7
1           CO                  3         HOME        1111     12
2           BR                  3       SPORTS        1111     16
3           CO                  3         TECH        2222     13

您可以zip两列:PRODUCT_IDCOUNTRY_CODE,检查值是否在tuplelist中,并返回所需的df

lst = [(1111, 'CO'),(2222, 'CO'),(1111, 'BR')]
m = [tpl in lst for tpl in zip(df['PRODUCT_ID'], df['COUNTRY_CODE'])]
# m -> [True, False, True, False, True, False, True]
df_new = df[m]
print(df_new)

输出:

COUNTRY_CODE  VERTICAL_GROUP_ID SUB_VERTICAL  PRODUCT_ID  SHOWN
0           CO                  2        SUPER        1111      7
2           CO                  3         HOME        1111     12
4           BR                  3       SPORTS        1111     16
6           CO                  3         TECH        2222     13

说明:

>>> tuple(zip(df['PRODUCT_ID'], df['COUNTRY_CODE']))
((1111, 'CO'), # True  : in [(1111, 'CO'),(2222, 'CO'),(1111, 'BR')]
(3333, 'CO'), # False : Not in [(1111, 'CO'),(2222, 'CO'),(1111, 'BR')]
(1111, 'CO'),
(4444, 'BR'),
(1111, 'BR'),
(2222, 'BR'),
(2222, 'CO'))

最新更新