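Problem: split txt into multiple sentences using the [nn] pattern. The final result should be 10 sentences. A lookbehind gave me 10 sentences, but each one lost its [nn] prefix. A lookahead gives me 9 sentences, and I lose the last one. I need the [nn] pattern to be included in each sentence.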
txt="[01] Final Step - Protonica [02] Liquid Frequencies (Liquid Soul Mix) - Liquid Soul [03] Global Illumination - Liquid Soul [04] Devotion - Liquid Soul [05] Black Rock City - Quantize [06] Plazza Del Trripy - Andromeda [07] Private Guide - Suntree [08] Stereo Gun - Vibrasphere [09] The Cycle - Ritree [10] Atmonizer - Andromed"
I used a lookahead to find the matches. I lose the last sentence.
import re

print(".+? is the non-greedy (lazy) character match")
#print(r"(?<=\[\d{2}\]) is the lookbehind character match")
print(r"(?=\[\d{2}\]) is the lookahead character match")
#pattern = r"(?<=\[\d{2}\]).+?(?=\[\d{2}\])"
pattern = r".+?(?=\[\d{2}\])"  # each match must be followed by another [nn]
matches = re.findall(pattern, txt)
for match in matches:
    print("output", match)
Output:
output [01] Final Step - Protonica
output [02] Liquid Frequencies (Liquid Soul Mix) - Liquid Soul
output [03] Global Illumination - Liquid Soul
output [04] Devotion - Liquid Soul
output [05] Black Rock City - Quantize
output [06] Plazza Del Trripy - Andromeda
output [07] Private Guide - Suntree
output [08] Stereo Gun - Vibrasphere
output [09] The Cycle - Ritree
Missing:
output [10] Atmonizer - Andromed
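I then used a lookahead with the |$ alternative, so that the end of the string also terminates a match and the last sentence is found.
import re

print(".+? is the non-greedy (lazy) character match")
print(r"(?=\[\d{2}\]) is the lookahead character match")
pattern = r"\[\d{2}\].+?(?=\[\d{2}\]|$)"  # stop before the next [nn] or at end of string
matches = re.findall(pattern, txt)
for match in matches:
    print("output", match)
Output: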
output [01] Final Step - Protonica
output [02] Liquid Frequencies (Liquid Soul Mix) - Liquid Soul
output [03] Global Illumination - Liquid Soul
output [04] Devotion - Liquid Soul
output [05] Black Rock City - Quantize
output [06] Plazza Del Trripy - Andromeda
output [07] Private Guide - Suntree
output [08] Stereo Gun - Vibrasphere
output [09] The Cycle - Ritree
output [10] Atmonizer - Andromed
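To make the effect of the |$ alternative concrete, here is a minimal sketch on a shortened copy of txt (three tracks only, to keep it readable). It also adds \s* inside the lookahead, which is my own addition and not part of the pattern above, so that the matches do not carry a trailing space.

```python
import re

# Shortened copy of the original txt, for illustration only.
txt = ("[01] Final Step - Protonica "
       "[02] Liquid Frequencies (Liquid Soul Mix) - Liquid Soul "
       "[03] Global Illumination - Liquid Soul")

# Without "|$": every match must be followed by another [nn] marker,
# so the last track is never matched.
without_end = re.findall(r"\[\d{2}\].+?(?=\[\d{2}\])", txt)

# With "|$": the end of the string also terminates a match.
# The extra \s* (an addition for this sketch) keeps trailing spaces out.
with_end = re.findall(r"\[\d{2}\].+?(?=\s*\[\d{2}\]|$)", txt)

print(len(without_end))  # 2
for match in with_end:
    print("output", match)
```

An alternative would be to keep the original pattern and call .strip() on each match.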
In general, you can avoid the lookahead entirely and just match characters up to, but not including, the next [
>>> re.findall(r"\[[^[]+", txt)
['[01] Final Step - Protonica ', '[02] Liquid Frequencies (Liquid Soul Mix) - Liquid Soul ', '[03] Global Illumination - Liquid Soul ', '[04] Devotion - Liquid Soul ', '[05] Black Rock City - Quantize ', '[06] Plazza Del Trripy - Andromeda ', '[07] Private Guide - Suntree ', '[08] Stereo Gun - Vibrasphere ', '[09] The Cycle - Ritree ', '[10] Atmonizer - Andromed']
This works by finding a [ and then greedily matching any characters that are not a [ (which would start the next block).
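This approach works for your current input, but it breaks if a block can itself contain further [ characters; in that case, you should search for the whole block instead, or clean up the input so that those characters are stripped out first.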
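As a rough illustration of that limitation, the sketch below uses a made-up input (the track names are hypothetical, not from the question) in which one title contains an extra [. The negated character class splits that title in two, while a pattern anchored on the full [nn] marker keeps it whole. This is one possible way to "search for the whole block", not necessarily the only one.

```python
import re

# Hypothetical input: the first title contains an extra "[...]" group.
sample = "[01] Intro [Live] - Artist A [02] Outro - Artist B"

# Negated character class: every "[" starts a new block, so "[Live]" splits the title.
naive = re.findall(r"\[[^[]+", sample)

# Anchored on the two-digit marker: only "[nn]" (or end of string) ends a block.
robust = re.findall(r"\[\d{2}\].+?(?=\s*\[\d{2}\]|$)", sample)

print(naive)   # 3 pieces, the first title is broken up
print(robust)  # 2 pieces, one per track
```

The second pattern still assumes that a two-digit number in brackets never appears inside a title; if it can, the input itself has to be disambiguated first.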