How to store a string between quotes as a single list item



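I wrote some search code, and I want to store whatever is between double quotes ("...") as a single item in a list. How can I do that?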

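I'm using regular expressions. The list should contain everything outside the parentheses () (except terms starting with -), stored in a list named must.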

import re
message = 'all all "exact exact" (any OR "anyone") -none -"none"'
others = ' '.join(re.split(r'\(.*\)', message))
others_split = others.split()
to_compile = re.compile(r'.*\((.*)\).*')
to_match = to_compile.match(message)
ors_string = to_match.group(1)
must = list(filter(lambda word: not word.startswith('-'), others_split))
print(f'must: {must}')

Output:

must: ['all', 'all', '"exact', 'exact"']

Desired result:

must: ['all', 'all', '"exact exact"']

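To parse a search string correctly when it carries state (negation, parentheses, quoting), you really need a proper stateful parser. Here is one built on re.Scanner, an undocumented but long-standing helper class in the re module: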

import re

scanner = re.Scanner(
    [
        (r'"', lambda scanner, token: ("QUOTE", token)),
        (r"\(", lambda scanner, token: ("OPEN_PAR", token)),
        (r"\)", lambda scanner, token: ("CLOSE_PAR", token)),
        (r"-", lambda scanner, token: ("NOT", token)),
        (r"\s+", lambda scanner, token: ("WS", token)),
        (r"\w+", lambda scanner, token: ("TEXT", token)),
    ]
)

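def parse_search(search):
    """Yield (par_depth, is_negated, phrase) for each word or quoted phrase."""

    def emit_current():
        nonlocal par_depth, is_negated, curr_phrase
        yield (par_depth, is_negated, curr_phrase)
        # Reset state once the caller resumes this generator
        curr_phrase = ""
        is_negated = False

    result, rest = scanner.scan(search)
    if rest:
        # re.Scanner stops at the first character no rule matches (e.g. "," or ".")
        raise ValueError(f"Unexpected input at: {rest!r}")
    is_negated = False
    in_quotes = False
    curr_phrase = ""
    par_depth = 0
    for kind, value in result:
        if in_quotes and kind != "QUOTE":
            # Inside quotes every token, even a parenthesis, is literal text
            curr_phrase += value
            continue
        if kind == "OPEN_PAR":
            par_depth += 1
            continue
        if kind == "CLOSE_PAR":
            if par_depth == 0:
                raise ValueError("Unbalanced parentheses")
            par_depth -= 1
            continue
        if kind == "QUOTE":
            curr_phrase += value  # keep quote in phrase
            in_quotes = not in_quotes
            if not in_quotes:
                yield from emit_current()
            continue
        if kind == "NOT":
            is_negated = True
            continue
        if kind == "TEXT":
            curr_phrase += value
            yield from emit_current()
            continue
        # WS tokens outside quotes fall through and are ignored
    if in_quotes:
        raise ValueError("Unbalanced quotes")
    if par_depth:
        raise ValueError("Unbalanced parentheses")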

def main():
    message = 'all "((( oh no )))" alley "exact text" (any OR "anyone") -no_no_no -"none"'
    must = []
    must_not = []
    for par_depth, negated, phrase in parse_search(message):
        if par_depth > 0:
            # This implementation ignores all (nested) parenthesized segments
            continue
        (must_not if negated else must).append(phrase)
    print(f"{must=}", f"{must_not=}", sep="\n")

if __name__ == "__main__":
    main()

Output:

must=['all', '"((( oh no )))"', 'alley', '"exact text"']
must_not=['no_no_no', '"none"']
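To see what the scanner actually hands to the parser, here is a small stand-alone sketch using a reduced version of the same lexicon; the input string is made up for illustration. Passing None as an action (instead of a lambda) makes re.Scanner drop the match, and the second return value of scan() holds whatever text no rule could match:

```python
import re

# Reduced lexicon for illustration only; a None action discards the match.
scanner = re.Scanner([
    (r'"', lambda s, tok: ("QUOTE", tok)),
    (r"-", lambda s, tok: ("NOT", tok)),
    (r"\s+", None),
    (r"\w+", lambda s, tok: ("TEXT", tok)),
])

# Hypothetical input: scanning stops at "," because no rule matches it.
tokens, rest = scanner.scan('-"none" all, ok')
print(tokens)      # [('NOT', '-'), ('QUOTE', '"'), ('TEXT', 'none'), ('QUOTE', '"'), ('TEXT', 'all')]
print(repr(rest))  # ', ok'
```

Everything from the first unmatched character onward ends up in rest and is never tokenized, so it is worth checking rest rather than discarding it.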

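Instead of splitting, a simpler approach is to use re.findall with an alternation pattern that matches a parenthesized segment, a quoted string, or a word, but captures only the latter two (each with an optional leading -, so negated terms can be filtered out):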

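import re

message = 'all all "exact exact" (any OR "anyone") -none -"none"'
must = [
    term
    for term in re.findall(r'\(.*?\)|(-?(?:".*?"|\w+))', message)
    if term and not term.startswith('-')  # "" comes from parenthesized matches
]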

Demo: https://replit.com/@blhsing/CylindricalTalkativeNetbsd
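If you also want the negated terms (a must_not list, like the one the re.Scanner approach produces), the same findall pattern can be reused. A minimal sketch, assuming a negation is always a single leading - attached directly to a word or quoted phrase, and that parentheses are not nested (the lazy \(.*?\) stops at the first closing parenthesis):

```python
import re

message = 'all all "exact exact" (any OR "anyone") -none -"none"'

# Parenthesized segments match the first alternative and capture "", so drop empties.
terms = [t for t in re.findall(r'\(.*?\)|(-?(?:".*?"|\w+))', message) if t]

must = [t for t in terms if not t.startswith("-")]
must_not = [t[1:] for t in terms if t.startswith("-")]  # strip the leading "-"

print(must)      # ['all', 'all', '"exact exact"']
print(must_not)  # ['none', '"none"']
```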
