Searching multiple conditions with str.contains in a DataFrame, using And/Or to fill sparse results



Is there a way to get ranked '&' and "or" results when using str.contains on multiple columns?

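For example, in the case below, if the genre column contains rock, demo contains 35-50 and city contains Gainesville, I only get the rows that match all three strings.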

What I am trying to work out is how to get rows where the genre column contains rock and/or demo contains 35-50 and/or city contains Gainesville.

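For example, topping up sparse results this way would fill .head(5) with 5 results, made up of exact matches plus partial matches on the secondary requirements.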

Genre must contain the str rock, while demo may contain the str 35-50 and city may contain the str Gainesville.

df[(df.genre.str.contains('rock')) & (df.demo.str.contains('35-50')) & (df.city.str.contains('Gainesville'))].head(4)
name    genre   demo    price   city
0   Alex Smith  rock    18-25,25-35,35-50   100-500     Gainesville
4   Bob West    rock    18-25,25-35,35-50   100-500     Gainesville

The desired result is as follows

name    genre   demo    price   city
0   Alex Smith  rock    18-25,25-35,35-50   100-500     Gainesville
4   Bob West    rock    18-25,25-35,35-50   100-500     Gainesville
1   Mike Stevens    pop, rock   18-25,25-35,35-50, 50+  100-500     Somerville
6   Mary Porter     jazz, rock  35-50   100-500, 500-100    Hendersonville

This is as far as I have got

import pandas as pd

DataFrame to test with

df = pd.DataFrame({'name':['Alex Smith','Mike Stevens','Brenda West','Holy Kent','Bob West','Sally May','Mary Porter','John Keys'], 
'genre': ['rock','pop, rock','jazz',"dj",'rock','pop','jazz, rock',"dj"],'demo':['18-25,25-35,35-50','18-25,25-35,35-50, 50+','35-50','18-25','18-25,25-35,35-50','18-25,35-50, 50+','35-50','18-25'],
'price':['100-500','100-500','100-500, 500-100','1000+','100-500','100-500', '100-500, 500-100','1000+'],
'city':['Gainesville','Somerville','Hendersonville','Pluto','Gainesville','Somerville','Hendersonville','Pluto']})
df.head(10)
name   genre   demo    price   city
0   Alex Smith  rock    18-25,25-35,35-50   100-500     Gainesville
1   Mike Stevens    pop, rock   18-25,25-35,35-50, 50+  100-500     Somerville
2   Brenda West     jazz    35-50   100-500, 500-100    Hendersonville
3   Holy Kent   dj  18-25   1000+   Pluto
4   Bob West    rock    18-25,25-35,35-50   100-500     Gainesville
5   Sally May   pop     18-25,35-50, 50+    100-500     Somerville
6   Mary Porter     jazz, rock  35-50   100-500, 500-100    Hendersonville
7   John Keys   dj  18-25   1000+   Pluto

This gives me the results for the keyword rock in genre and 35-50 in demo

# This gives me the result of the keyword rock in genre and demo of 35-50
df[(df.genre.str.contains('rock')) & (df.demo.str.contains('35-50'))]
name   genre   demo    price   city
0   Alex Smith  rock    18-25,25-35,35-50   100-500     Gainesville
1   Mike Stevens    pop, rock   18-25,25-35,35-50, 50+  100-500     Somerville
4   Bob West    rock    18-25,25-35,35-50   100-500     Gainesville
6   Mary Porter     jazz, rock  35-50   100-500, 500-100    Hendersonville
**This limits those results to the first 3 rows**
# This limits the results to 3 of the keyword in genre and demo of 35-50
df[(df.genre.str.contains('rock')) & (df.demo.str.contains('35-50'))].head(3)

name    genre   demo    price   city
0   Alex Smith  rock    18-25,25-35,35-50   100-500     Gainesville
1   Mike Stevens    pop, rock   18-25,25-35,35-50, 50+  100-500     Somerville
4   Bob West    rock    18-25,25-35,35-50   100-500     Gainesville

This gives me the results for rock with demo 35-50, narrowed down by city

# best results for rock with demo 35-50, filtered by city, limited to 3
df[(df.genre.str.contains('rock')) & (df.demo.str.contains('35-50')) & (df.city.str.contains('Gainesville'))].head(3)

name   genre   demo    price   city
0   Alex Smith  rock    18-25,25-35,35-50   100-500     Gainesville
4   Bob West    rock    18-25,25-35,35-50   100-500     Gainesville
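
The strict filter above returns only 2 rows even though 3 were requested. One possible way to top up a sparse result like this is to build tiers from strictest to loosest and keep the first occurrence of each row. The sketch below is only an illustration under that assumption; the names `required`, `demo_ok`, `city_ok`, `tiers` and `topped_up` are made up for the example, and `regex=False` is used so the search strings are treated literally.

```python
import pandas as pd

df = pd.DataFrame({'name':['Alex Smith','Mike Stevens','Brenda West','Holy Kent','Bob West','Sally May','Mary Porter','John Keys'],
'genre': ['rock','pop, rock','jazz',"dj",'rock','pop','jazz, rock',"dj"],'demo':['18-25,25-35,35-50','18-25,25-35,35-50, 50+','35-50','18-25','18-25,25-35,35-50','18-25,35-50, 50+','35-50','18-25'],
'price':['100-500','100-500','100-500, 500-100','1000+','100-500','100-500', '100-500, 500-100','1000+'],
'city':['Gainesville','Somerville','Hendersonville','Pluto','Gainesville','Somerville','Hendersonville','Pluto']})

# Required condition: genre must mention rock (literal substring match).
required = df.genre.str.contains('rock', regex=False)
# Optional conditions (hypothetical names, used only in this sketch).
demo_ok = df.demo.str.contains('35-50', regex=False)
city_ok = df.city.str.contains('Gainesville', regex=False)

# Tiers ordered from strictest to loosest; every tier keeps the required condition.
tiers = [
    df[required & demo_ok & city_ok],  # exact matches on everything
    df[required & demo_ok],            # partial: city not required
    df[required],                      # required condition only
]

# Stack the tiers, keep the first occurrence of each row, then cap the count.
topped_up = pd.concat(tiers)
topped_up = topped_up[~topped_up.index.duplicated(keep='first')].head(4)
print(topped_up)
```

With the test data this keeps the two Gainesville rows first and then fills the remaining slots with the other rock rows.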

Checking whether using | (or) works instead

df[(df.genre.str.contains('rock')) | (df.demo.str.contains('35-50')) | (df.city.str.contains('Gainesville'))].head(4)
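
A plain | filter like the one above also lets through rows that do not contain rock at all (Brenda West, for instance, matches on demo alone). An alternative that may be closer to the goal is to keep rock as a hard filter and rank the remaining rows by how many optional conditions they satisfy. This is a minimal sketch, assuming a simple count is an acceptable ranking; the `score` column and the names `required`, `optional` and `ranked` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'name':['Alex Smith','Mike Stevens','Brenda West','Holy Kent','Bob West','Sally May','Mary Porter','John Keys'],
'genre': ['rock','pop, rock','jazz',"dj",'rock','pop','jazz, rock',"dj"],'demo':['18-25,25-35,35-50','18-25,25-35,35-50, 50+','35-50','18-25','18-25,25-35,35-50','18-25,35-50, 50+','35-50','18-25'],
'price':['100-500','100-500','100-500, 500-100','1000+','100-500','100-500', '100-500, 500-100','1000+'],
'city':['Gainesville','Somerville','Hendersonville','Pluto','Gainesville','Somerville','Hendersonville','Pluto']})

# Hard requirement: genre must mention rock.
required = df.genre.str.contains('rock', regex=False)

# Optional conditions; each one satisfied adds 1 to the row's score.
optional = [
    df.demo.str.contains('35-50', regex=False),
    df.city.str.contains('Gainesville', regex=False),
]
score = sum(cond.astype(int) for cond in optional)

ranked = (
    df.assign(score=score)[required]
    # mergesort is stable, so rows with equal scores keep their original order
    .sort_values('score', ascending=False, kind='mergesort')
    .head(4)
)
print(ranked)
```

Adding another optional condition only means appending one more boolean Series to `optional`; weighting some conditions more heavily would need a different scoring rule.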
