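Is there a way to get '&' or 'or' results when filtering on multiple columns with str.contains?
For example, in the example below, if the genre column contains rock, demo contains 35-50, and city contains Gainesville, I get the rows that match those strings in the DataFrame.
What I am trying to work out is how to get results where genre contains rock and/or demo contains 35-50 and/or city contains Gainesville, so that I also get the rows that match only some of those strings.
For example, this would be a way of padding out sparse results, so that .head(5) is filled with 5 results made up of exact matches plus partial matches on the secondary requirements.
genre must contain the str rock; demo may contain the str 35-50, and city may contain the str Gainesville.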
df[(df.genre.str.contains('rock')) & (df.demo.str.contains('35-50')) & (df.city.str.contains('Gainesville'))].head(4)
name genre demo price city
0 Alex Smith rock 18-25,25-35,35-50 100-500 Gainesville
4 Bob West rock 18-25,25-35,35-50 100-500 Gainesville
The desired result is as follows:
name genre demo price city
0 Alex Smith rock 18-25,25-35,35-50 100-500 Gainesville
4 Bob West rock 18-25,25-35,35-50 100-500 Gainesville
1 Mike Stevens pop, rock 18-25,25-35,35-50, 50+ 100-500 Somerville
6 Mary Porter jazz, rock 35-50 100-500, 500-100 Hendersonville
This is what I have been able to do so far:
import pandas as pd
# DataFrame to test with
df = pd.DataFrame({'name':['Alex Smith','Mike Stevens','Brenda West','Holy Kent','Bob West','Sally May','Mary Porter','John Keys'],
'genre': ['rock','pop, rock','jazz',"dj",'rock','pop','jazz, rock',"dj"],'demo':['18-25,25-35,35-50','18-25,25-35,35-50, 50+','35-50','18-25','18-25,25-35,35-50','18-25,35-50, 50+','35-50','18-25'],
'price':['100-500','100-500','100-500, 500-100','1000+','100-500','100-500', '100-500, 500-100','1000+'],
'city':['Gainesville','Somerville','Hendersonville','Pluto','Gainesville','Somerville','Hendersonville','Pluto']})
df.head(10)
name genre demo price city
0 Alex Smith rock 18-25,25-35,35-50 100-500 Gainesville
1 Mike Stevens pop, rock 18-25,25-35,35-50, 50+ 100-500 Somerville
2 Brenda West jazz 35-50 100-500, 500-100 Hendersonville
3 Holy Kent dj 18-25 1000+ Pluto
4 Bob West rock 18-25,25-35,35-50 100-500 Gainesville
5 Sally May pop 18-25,35-50, 50+ 100-500 Somerville
6 Mary Porter jazz, rock 35-50 100-500, 500-100 Hendersonville
7 John Keys dj 18-25 1000+ Pluto
This gives me the results for the keyword rock in genre and 35-50 in demo:
# This gives me the result of the keyword rock in genre and demo of 35-50
df[(df.genre.str.contains('rock')) & (df.demo.str.contains('35-50'))]
name genre demo price city
0 Alex Smith rock 18-25,25-35,35-50 100-500 Gainesville
1 Mike Stevens pop, rock 18-25,25-35,35-50, 50+ 100-500 Somerville
4 Bob West rock 18-25,25-35,35-50 100-500 Gainesville
6 Mary Porter jazz, rock 35-50 100-500, 500-100 Hendersonville
**This limits the results to 3 for the keyword rock in genre and 35-50 in demo**
# This limits the results to 3 of the keyword in genre and demo of 35-50
df[(df.genre.str.contains('rock')) & (df.demo.str.contains('35-50'))].head(3)
name genre demo price city
0 Alex Smith rock 18-25,25-35,35-50 100-500 Gainesville
1 Mike Stevens pop, rock 18-25,25-35,35-50, 50+ 100-500 Somerville
4 Bob West rock 18-25,25-35,35-50 100-500 Gainesville
This gives me the rock results with demo 35-50, narrowed down by city:
# best results for rock with demo 35-50, filtered by city, limited to 3
df[(df.genre.str.contains('rock')) & (df.demo.str.contains('35-50')) & (df.city.str.contains('Gainesville'))].head(3)
name genre demo price city
0 Alex Smith rock 18-25,25-35,35-50 100-500 Gainesville
4 Bob West rock 18-25,25-35,35-50 100-500 Gainesville
Checking whether using or (|) works:
df[(df.genre.str.contains('rock')) | (df.demo.str.contains('35-50')) | (df.city.str.contains('Gainesville'))].head(4)
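Note that joining all three conditions with | returns every row that matches at least one of them, so row 2 (Brenda West, jazz) is included because its demo contains 35-50, even though its genre does not contain rock. One possible way to get the desired ordering instead, as a minimal sketch only: treat the genre match as required, count how many of the optional conditions (demo, city) each row satisfies, and sort by that count before calling .head(). The temporary score column and the names required, ranked and result are illustrative assumptions, not part of the original code.

```python
import pandas as pd

# Same test data as in the question.
df = pd.DataFrame({
    'name': ['Alex Smith', 'Mike Stevens', 'Brenda West', 'Holy Kent',
             'Bob West', 'Sally May', 'Mary Porter', 'John Keys'],
    'genre': ['rock', 'pop, rock', 'jazz', 'dj', 'rock', 'pop', 'jazz, rock', 'dj'],
    'demo': ['18-25,25-35,35-50', '18-25,25-35,35-50, 50+', '35-50', '18-25',
             '18-25,25-35,35-50', '18-25,35-50, 50+', '35-50', '18-25'],
    'price': ['100-500', '100-500', '100-500, 500-100', '1000+',
              '100-500', '100-500', '100-500, 500-100', '1000+'],
    'city': ['Gainesville', 'Somerville', 'Hendersonville', 'Pluto',
             'Gainesville', 'Somerville', 'Hendersonville', 'Pluto'],
})

# Required condition: genre must contain "rock".
required = df.genre.str.contains('rock', regex=False)

# Optional conditions: each one that matches adds 1 to the row's score.
# regex=False treats the search text literally rather than as a regular expression.
score = (df.demo.str.contains('35-50', regex=False).astype(int)
         + df.city.str.contains('Gainesville', regex=False).astype(int))

# Keep only the required rows, then put the rows with the most optional
# matches first. mergesort is a stable sort, so rows with equal scores
# keep their original relative order. "score" is a temporary helper column.
ranked = (df[required]
          .assign(score=score)
          .sort_values('score', ascending=False, kind='mergesort')
          .drop(columns='score'))

# Full matches come first, partial matches fill the remaining slots.
result = ranked.head(4)
print(result)
```

With this test data, the rows matching both optional conditions (index 0 and 4) come first, followed by the rock rows that match only the demo condition (index 1 and 6). If the optional conditions should not be weighted equally, the two terms of score could be multiplied by different weights.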