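This question is a bit awkwardly worded, but here's what I mean: I have a large string called text. It contains messages from different users, and I want to isolate certain parts of it. It looks like this: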
text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'
What I'm trying to isolate is user1's messages and the responses to them, so that I end up with two lists like these:
messages = ['hey how are you']
responses = ['im doing good'] # and so on, as many messages and responses there are
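Of course, other users sometimes talk to each other, so I only want the first response after each user1 message. I think regex could work here, but I'm struggling to figure out exactly what to match and how.
Let me know if anything needs clarifying. Thanks!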
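You don't actually need regex here. Instead, scan the string and use what you already know about its layout (a speaker line followed by a message line) to pick out the pieces you want.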
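text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'
target = 'user1:'
messages = []
responses = []
lastUser = None      # speaker line still waiting for its message
isResponse = False   # True right after a target message
for v in text.split('\n'):
    if lastUser is None:
        # This line is a speaker name.
        lastUser = v
    else:
        # This line is the message belonging to lastUser.
        if lastUser == target:
            messages.append(v)
            isResponse = True
        elif isResponse:
            # The first message after the target's message is the response.
            responses.append(v)
            isResponse = False
        lastUser = None
print(messages)   # ['hey how are you']
print(responses)  # ['im doing good']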
If your format is always like this (user1 and randomuser1234…), you can use the following code:
import re
text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'
messages = []
responses = []
# Capture the line after "user1:" and the line after "randomuser<digits>:"
user1_msg = re.compile(r'user1:\n(.+)\n')
randomusers_msg = re.compile(r'randomuser\d*:\n(.+)\n')
messages.extend(user1_msg.findall(text))
responses.extend(randomusers_msg.findall(text))
print(messages)
print(responses)
Output:
['hey how are you']
['im doing good', 'oh hey user1']
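Because the two patterns above run independently, responses also picks up randomuser2's message, which is not a reply to user1. If you only want the first reply after each user1 message, one option is a single pattern that captures a user1 message together with the speaker/message pair that follows it. This is a sketch assuming every speaker line ends with ':' and every message fits on one line; pair_re is just an illustrative name.

```python
import re

text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'

# Assumption: speaker lines end with ':' and each message is a single line.
# ^user1:\n(.*)         -> a user1 message (re.MULTILINE makes ^ match at line starts)
# \n(?!user1:\n)        -> the next speaker must not be user1 again
# [^\n]*:\n(.*)$        -> that speaker's line, then capture their message
pair_re = re.compile(r'^user1:\n(.*)\n(?!user1:\n)[^\n]*:\n(.*)$', re.MULTILINE)

pairs = pair_re.findall(text)          # list of (message, response) tuples
messages = [m for m, _ in pairs]
responses = [r for _, r in pairs]

print(messages)   # ['hey how are you']
print(responses)  # ['im doing good']
```

Keeping each message and its reply in one match means the two lists can never drift out of step, which two independent findall calls cannot guarantee.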
Hope this helps.
text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'
tokens = text.splitlines()
print(tokens)
['user1:', 'hey how are you', 'randomuser:', 'im doing good', 'randomuser2:', 'oh hey user1']
I think you can do the rest from here.
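One way to do "the rest" is to pair each speaker token with the message token after it and keep only the first reply that follows a user1 message. This sketch assumes the tokens strictly alternate speaker, message, speaker, message; waiting_for_reply is an illustrative name, not something from the question.

```python
text = 'user1:\nhey how are you\nrandomuser:\nim doing good\nrandomuser2:\noh hey user1\n'
tokens = text.splitlines()

messages = []
responses = []
waiting_for_reply = False  # True right after a user1 message

# Assumes tokens alternate speaker, message, speaker, message, ...
# tokens[0::2] are the speaker lines, tokens[1::2] the matching messages.
for speaker, message in zip(tokens[0::2], tokens[1::2]):
    if speaker == 'user1:':
        messages.append(message)
        waiting_for_reply = True
    elif waiting_for_reply:
        # Keep only the first message that follows a user1 message.
        responses.append(message)
        waiting_for_reply = False

print(messages)   # ['hey how are you']
print(responses)  # ['im doing good']
```

zip stops at the shorter slice, so a dangling speaker line with no message after it is simply ignored.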