Collecting substrings into lists when only the start and end characters are known

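The wording of this question is a bit awkward, but here is what I mean: I have a large string called text. It contains messages from different users, and I want to isolate certain parts of it. It looks like this: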

text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'

What I'm trying to isolate are user1's messages and the responses to them, producing two lists like this:

messages = ['hey how are you']
responses = ['im doing good'] # and so on, as many messages and responses there are

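Of course, other users sometimes talk to each other, so I only want the first response after each user1 message. I think a regular expression could work here, but I'm having trouble figuring out exactly what to use and how.
Let me know if anything needs clarifying. Thanks.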

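You don't actually need a regular expression. Instead, scan the string and use what you already know about its layout to pull out the pieces you want.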

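text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'
target = 'user1:'
messages = []
responses = []
lastUser = None      # label line of the entry currently being read
isResponse = False   # True while waiting for the first reply to user1
for v in text.split('\n'):
    if lastUser is None:
        # This line is a user label such as 'user1:'
        lastUser = v
    else:
        # This line is the message body belonging to lastUser
        if lastUser == target:
            messages.append(v)
            isResponse = True
        elif isResponse:
            responses.append(v)
            isResponse = False
        lastUser = None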

If your format is always like this (user1 and randomuser1234…), you can use the code below:

import re
text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'
messages = []
responses = []
# Capture the line that follows each 'user1:' label
user1_msg = re.compile(r'user1:\n(.+)\n')
# Capture the line that follows each 'randomuser<digits>:' label
randomusers_msg = re.compile(r'randomuser\d*:\n(.+)\n')
messages.extend(user1_msg.findall(text))
responses.extend(randomusers_msg.findall(text))
print(messages)
print(responses)

Output:

['hey how are you']
['im doing good', 'oh hey user1']

Hope that helps.
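The output above also contains 'oh hey user1', which is not a reply to a user1 message. What follows is a minimal sketch of one way to keep only the reply that comes directly after each user1 message, assuming every entry is a label line ending in a colon followed by exactly one message line. The names `pair_pattern` and `pairs` are made up for illustration.

```python
import re

text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'

# Match a user1 entry followed immediately by an entry from any other user.
# (?!user1:) rejects the case where user1 posts twice in a row.
pair_pattern = re.compile(r'^user1:\n(.+)\n(?!user1:)\S+:\n(.+)$', re.MULTILINE)

# Each match yields a (message, response) tuple
pairs = pair_pattern.findall(text)
messages = [message for message, _ in pairs]
responses = [response for _, response in pairs]

print(messages)
print(responses)
```

With this pattern, a user1 message that is not followed directly by another user's message is skipped entirely, which may or may not be what you want.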

text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'
tokens = text.splitlines()
print(tokens)

['user1:', 'hey how are you', 'randomuser:', 'im doing good', 'randomuser2:', 'oh hey user1']

I think you can do the rest.
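For completeness, here is one possible way to do the rest, as a sketch rather than the only approach. It assumes the tokens strictly alternate between a label line and a message line. The function name `collect_exchanges` and its `target` parameter are hypothetical.

```python
def collect_exchanges(text, target='user1:'):
    """Return (messages, responses) for the target user."""
    tokens = text.splitlines()
    messages, responses = [], []
    awaiting_response = False
    # Even indices are user labels, odd indices are message bodies
    for user, body in zip(tokens[::2], tokens[1::2]):
        if user == target:
            messages.append(body)
            awaiting_response = True
        elif awaiting_response:
            # Keep only the first reply after a target message
            responses.append(body)
            awaiting_response = False
    return messages, responses


text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'
messages, responses = collect_exchanges(text)
print(messages)   # ['hey how are you']
print(responses)  # ['im doing good']
```

Pairing the slices `tokens[::2]` and `tokens[1::2]` with `zip` avoids manual index bookkeeping, and a trailing label with no message is simply dropped.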
