Get a substring between two strings

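I have a very, very large string that contains logs from some system.
I only want the part that starts with <status> and ends with </status>.
I've heard that regular expressions are a good way to do this, but I really don't know how to use them.
Any ideas?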

s = "Hello I am a very long string <status>I've got a lovely bunch of coconuts</status> here they are standing in a row"
excerpt = s.partition("<status>")[2].rpartition("</status>")[0]
print(excerpt)

Result:

I've got a lovely bunch of coconuts

If you want to try a regex, here is one way:

import re
regex = re.compile(r"<status>(.*?)</status>", re.IGNORECASE)
s = """This is some long random text <status>This is the first status block</status> 
and some more text <status>and another block</status> 
and yet more <status>This is the last status block</status>"""
print(re.findall(regex, s))

This produces:

['This is the first status block', 'and another block', 'This is the last status block']

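The main advantage of this approach is that it extracts every <status>...</status> block in the string, not just the first one. Note that "." does not match a newline by default, so in a triple-quoted (multi-line) string each <status> and its matching </status> need to be on the same line, unless you pass re.DOTALL.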
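To illustrate the same-line caveat, here is a minimal sketch comparing the pattern with and without re.DOTALL. The sample log text is made up for the example; it is not from the original question.

```python
import re

# Hypothetical sample log in which the first status block spans two lines.
log = """start <status>first
block</status> middle <status>second block</status> end"""

# Without re.DOTALL, "." stops at a newline, so only the single-line block matches.
single_line = re.findall(r"<status>(.*?)</status>", log)

# With re.DOTALL, "." also matches newlines, so both blocks are captured.
multi_line = re.findall(r"<status>(.*?)</status>", log, flags=re.DOTALL)

print(single_line)
print(multi_line)
```

If status blocks in your logs can wrap across lines, the re.DOTALL variant is probably the safer default; the non-greedy ".*?" still keeps each match from running into the next block.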

If <status> and </status> each appear only once, you can use string_name[string_name.index("<status>") + 8: string_name.index("</status>")], where 8 is the length of "<status>".

s = "test<status>test2</status>"
print(s[s.index("<status>") + 8: s.index("</status>")])

Output:

test2
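The index-based approach hard-codes the marker length and raises ValueError from str.index when a marker is missing. A small helper can avoid both; the sketch below is one possible way to write it, and the function name between is hypothetical, not part of any library.

```python
def between(text, start, end):
    """Return the text between the first `start` and the next `end`, or None."""
    begin = text.find(start)          # str.find returns -1 instead of raising
    if begin == -1:
        return None
    begin += len(start)               # skip past the opening marker itself
    stop = text.find(end, begin)      # search for the end marker after the start
    if stop == -1:
        return None
    return text[begin:stop]


print(between("test<status>test2</status>", "<status>", "</status>"))  # test2
print(between("no markers here", "<status>", "</status>"))             # None
```

Using len(start) instead of a literal 8 means the same helper works for other tag names without further changes.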
