Collecting substrings into lists when only the start and end characters are known



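The title is a bit awkward, but here is what I mean: I have one large string called text. It contains messages from different users, and I want to isolate certain parts of it. It looks like this: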

text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'

What I am trying to isolate are user1's messages and the responses to them, ending up with two lists like these:

messages = ['hey how are you']
responses = ['im doing good'] # and so on, as many messages and responses there are

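Of course, other users sometimes talk to each other, so I only want the first response after each user1 message. I think a regular expression could be used here, but I am having trouble working out exactly what to match and how.
Let me know if anything needs clarifying. Thanks.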

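You don't actually need a regular expression. Instead, scan the string and use what you already know about its layout (a user name line followed by a message line) to pick out the pieces you want.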

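text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'
target = 'user1:'
messages = []
responses = []
lastUser = None      # name line seen most recently, or None when a name line is expected
isResponse = False   # True while waiting for the first reply to a user1 message
for v in text.split('\n'):
    if lastUser is None:
        # This line is a user name such as 'user1:'
        lastUser = v
    else:
        # This line is the message body belonging to lastUser
        if lastUser == target:
            messages.append(v)
            isResponse = True
        elif isResponse:
            responses.append(v)
            isResponse = False
        lastUser = None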

If your format is always like this (user1 and randomuser1234…), you can use the code below:

import re
text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'
messages = []
responses = []
# Capture the line that follows each 'user1:' header
user1_msg = re.compile(r'user1:\n(.+)\n')
# Capture the line that follows each 'randomuser<digits>:' header
randomusers_msg = re.compile(r'randomuser\d*:\n(.+)\n')
messages.extend(user1_msg.findall(text))
responses.extend(randomusers_msg.findall(text))
print(messages)
print(responses)

Output:

['hey how are you']
['im doing good', 'oh hey user1']

Hope that helps.
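Note that the output above contains every randomuser message, while the question asks only for the first response after each user1 message. One possible way to get that with a single pattern is sketched below. It assumes each header line ends with a colon and each message fits on one line; the name pair is purely illustrative. A user1 message that never receives a reply would be skipped by this pattern.

```python
import re

text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'

# A 'user1:' header at the start of a line, its message, then the next
# header that is not 'user1:' followed by that user's message.
pair = re.compile(r'^user1:\n(.+)\n(?!user1:\n).+:\n(.+)$', re.MULTILINE)

pairs = pair.findall(text)                 # list of (message, response) tuples
messages = [msg for msg, _ in pairs]
responses = [reply for _, reply in pairs]

print(messages)
print(responses)
```

Anchoring on the start of the line also keeps a name such as randomuser1 from being mistaken for user1.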

text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'
tokens = text.splitlines()
print(tokens)

['user1:', 'hey how are you', 'randomuser:', 'im doing good', 'randomuser2:', 'oh hey user1']

I think you can do the rest.
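For completeness, here is a minimal sketch of what "the rest" might look like. It assumes the tokens strictly alternate between a name line and a message line; the names turns and awaiting_reply are illustrative and do not come from the answer.

```python
text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'
tokens = text.splitlines()

# Assumption: even positions hold 'name:' headers, odd positions hold message bodies.
turns = list(zip(tokens[0::2], tokens[1::2]))

messages = []
responses = []
awaiting_reply = False  # True after a user1 message until someone else replies
for speaker, body in turns:
    if speaker == 'user1:':
        messages.append(body)
        awaiting_reply = True
    elif awaiting_reply:
        responses.append(body)
        awaiting_reply = False

print(messages)
print(responses)
```

Only the first message from another user after each user1 message is kept, so later chatter between other users is ignored.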
