I have a string like this:
s = "CITY_NAME == 'Pune' & GENRE in ['$SPORTS$','$CLASSICAL$']$#$CITY_NAME == 'Pune' & GENRE == 'ROMANCE' & QUANTITY >= 25$#$CITY_NAME == 'Pune' & GENRE in ['$ACTION$','$DRAMA$'] & LANGUAGE == 'Hindi'$#$CITY_NAME == 'Pune' & GENRE in ['$MUSICAL$','$Music$'] & EVENT_NAME == 'Dhoom-3'"
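This string is built by joining several conditions with "$#$". So "CITY_NAME == 'Pune' & GENRE in ['$SPORTS$','$CLASSICAL$']" is one condition, and so on.
Now I need to extract the field names from this string. My output should be: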
fields = ['CITY_NAME', 'GENRE', 'QUANTITY', 'LANGUAGE', 'EVENT_NAME']  # only the list of field names
I tried this:
s1 = s.split('$#$')
# To handle one condition: splitting by '$#$' gives a list of all conditions, and I take the condition at index 0.
# Then I split that condition at '&'.
# Then I split each part at '==', '>=' or 'in' and take the item at index 0, which gives one field name.
q = s1[0]                    # "CITY_NAME == 'Pune' & GENRE in ['$SPORTS$','$CLASSICAL$']"
qq = q.split('&')            # ["CITY_NAME == 'Pune' ", " GENRE in ['$SPORTS$','$CLASSICAL$']"]
qqq = qq[0]                  # "CITY_NAME == 'Pune' "
qqq.split('==')[0].strip()   # 'CITY_NAME'
I managed to split a single condition this way, but I couldn't put everything into one list comprehension.
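One way to fold that step-by-step splitting into a single list comprehension is sketched below. It is a minimal sketch that assumes every clause starts with the field name and that field names contain no whitespace, so the first whitespace-separated token of each clause is the field. It also assumes '&' and '$#$' never appear inside quoted values (a value such as 'Rock & Roll' would break it).

```python
# The question's input string: conditions joined by '$#$'.
s = "CITY_NAME == 'Pune' & GENRE in ['$SPORTS$','$CLASSICAL$']$#$CITY_NAME == 'Pune' & GENRE == 'ROMANCE' & QUANTITY >= 25$#$CITY_NAME == 'Pune' & GENRE in ['$ACTION$','$DRAMA$'] & LANGUAGE == 'Hindi'$#$CITY_NAME == 'Pune' & GENRE in ['$MUSICAL$','$Music$'] & EVENT_NAME == 'Dhoom-3'"

# Assumption: the first token of every clause is the field name.
all_fields = [
    clause.split()[0]
    for condition in s.split('$#$')       # one item per condition
    for clause in condition.split('&')    # one item per clause in that condition
]

# dict.fromkeys drops duplicates while keeping first-seen order.
fields = list(dict.fromkeys(all_fields))
print(fields)  # ['CITY_NAME', 'GENRE', 'QUANTITY', 'LANGUAGE', 'EVENT_NAME']
```

Using `dict.fromkeys` instead of `set` keeps the fields in the order they first appear, which matches the expected output exactly.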
Also, I believe there are simpler ways, such as using regular expressions (but I'm not very good with regex).
Need some help with the code... Thanks
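A regex-based sketch: treat a field name as the identifier that sits immediately before a comparison operator, and collect all of them with `re.findall`. The operator set (`==`, `!=`, `>=`, `<=`, `in`) is an assumption based on the sample string; a quoted value that itself contains an operator (for example 'Drama in Pune') would produce a false match.

```python
import re

s = "CITY_NAME == 'Pune' & GENRE in ['$SPORTS$','$CLASSICAL$']$#$CITY_NAME == 'Pune' & GENRE == 'ROMANCE' & QUANTITY >= 25$#$CITY_NAME == 'Pune' & GENRE in ['$ACTION$','$DRAMA$'] & LANGUAGE == 'Hindi'$#$CITY_NAME == 'Pune' & GENRE in ['$MUSICAL$','$Music$'] & EVENT_NAME == 'Dhoom-3'"

# Capture the word right before an operator. '\bin\b' requires 'in' to be a
# whole word, so values such as 'Hindi' do not count as an 'in' operator.
FIELD_RE = re.compile(r"(\w+)\s*(?:==|!=|>=|<=|\bin\b)")

# findall returns the captured group for each match; dedupe in first-seen order.
fields = list(dict.fromkeys(FIELD_RE.findall(s)))
print(fields)  # ['CITY_NAME', 'GENRE', 'QUANTITY', 'LANGUAGE', 'EVENT_NAME']
```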
import re

s = ...  # the string from the question
fields = set()
for word in s.split():
    # keep words made only of uppercase letters and underscores
    if re.match(r'^[A-Z_]+$', word):
        fields.add(word)
fields = list(fields)
print(fields)
This produces (the order may vary, because a set is used): ['CITY_NAME', 'GENRE', 'EVENT_NAME', 'LANGUAGE', 'QUANTITY']
This picks out the fields because they are the only all-uppercase words that contain no special characters other than _.
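One caveat with splitting only on whitespace: because '$#$' joins conditions without spaces, a token such as "25$#$CITY_NAME" contains a field name glued to the previous value. In the question's string this happens to be harmless, since CITY_NAME also appears at the very start, but a field that only ever follows '$#$' would be missed. A minimal variant, assuming the same "uppercase letters and underscores only" rule, also splits on the separator and sorts the result for a stable order (`uppercase_fields` is a hypothetical name):

```python
import re

s = "CITY_NAME == 'Pune' & GENRE in ['$SPORTS$','$CLASSICAL$']$#$CITY_NAME == 'Pune' & GENRE == 'ROMANCE' & QUANTITY >= 25$#$CITY_NAME == 'Pune' & GENRE in ['$ACTION$','$DRAMA$'] & LANGUAGE == 'Hindi'$#$CITY_NAME == 'Pune' & GENRE in ['$MUSICAL$','$Music$'] & EVENT_NAME == 'Dhoom-3'"

def uppercase_fields(text):
    # Split on runs of whitespace and on the '$#$' separator, so a field name
    # that directly follows '$#$' becomes a token of its own.
    tokens = re.split(r'\s+|\$#\$', text)
    # Keep tokens made only of uppercase letters and underscores. Quoted values
    # such as "'ROMANCE'" fail the test because of their quotes; empty tokens fail too.
    return sorted({t for t in tokens if re.fullmatch(r'[A-Z_]+', t)})

print(uppercase_fields(s))
# ['CITY_NAME', 'EVENT_NAME', 'GENRE', 'LANGUAGE', 'QUANTITY']

# A field that appears only right after '$#$' is still found:
print(uppercase_fields("CITY_NAME == 'Pune'$#$EVENT_NAME == 'Dhoom-3'"))
# ['CITY_NAME', 'EVENT_NAME']
```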
Yes, a parser makes sense here. Here is what I tried (it can be wrapped in a function definition later):
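import re

lst = list()  # empty list to store the parsed field names
for i in s.split('$#$'):        # each condition
    for j in i.split('&'):      # each clause inside the condition
        # Take the text before the first operator. An alternation is needed here:
        # a character class like [(==)(>=)...] would split on any single one of
        # those characters (including the letters i, n, l, k, e) instead.
        word = re.split(r'==|>=|<=|\bin\b|\blike\b', j)[0].strip()
        lst.append(word)
print(list(set(lst)))  # order may vary: ['CITY_NAME', 'GENRE', 'QUANTITY', 'LANGUAGE', 'EVENT_NAME']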
I also tried to cover some additional strings that can act as separators (in, like).
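Since the loop above is meant to end up in a function, here is one possible shape for it, with the operator list as a parameter. The function name `extract_fields` and its default operator tuple are hypothetical choices for illustration, not part of any library; it still assumes '&' and '$#$' never occur inside quoted values.

```python
import re

s = "CITY_NAME == 'Pune' & GENRE in ['$SPORTS$','$CLASSICAL$']$#$CITY_NAME == 'Pune' & GENRE == 'ROMANCE' & QUANTITY >= 25$#$CITY_NAME == 'Pune' & GENRE in ['$ACTION$','$DRAMA$'] & LANGUAGE == 'Hindi'$#$CITY_NAME == 'Pune' & GENRE in ['$MUSICAL$','$Music$'] & EVENT_NAME == 'Dhoom-3'"

def extract_fields(text, operators=('==', '!=', '>=', '<=', '>', '<', 'in', 'like')):
    """Return field names in first-seen order (hypothetical helper)."""
    # Try longer operators first so that '>=' wins over '>' at the same position.
    ops = sorted(operators, key=len, reverse=True)
    # Word operators (in, like) must be whole words; symbol operators are escaped.
    parts = [rf'\b{re.escape(op)}\b' if op.isalpha() else re.escape(op) for op in ops]
    op_re = re.compile('|'.join(parts))

    fields = []
    for condition in text.split('$#$'):       # each condition
        for clause in condition.split('&'):   # each clause in the condition
            # The field is whatever precedes the first operator.
            field = op_re.split(clause, maxsplit=1)[0].strip()
            if field and field not in fields:
                fields.append(field)
    return fields

print(extract_fields(s))
# ['CITY_NAME', 'GENRE', 'QUANTITY', 'LANGUAGE', 'EVENT_NAME']
```

Keeping the operators as a parameter means new ones (for example `not in`) can be added without touching the parsing loop, though multi-word operators would need their own handling of the space inside them.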
I think @Bryce's answer is more Pythonic.