How do I split a string on multiple delimiters, but only once per delimiter? (Python)



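I am trying to split a string on all of the delimiters below, but only once for each delimiter.

string = 'it; seems; like\ta good\tday to watch\va\vmovie.'

delimiters = '\t \v ;'

In this case, the output would be:

['it', ' seems; like', 'a good\tday to watch', 'a\vmovie.']

Obviously the example above is a nonsense one, but I am trying to find out whether this is possible. Would a fairly involved regular expression be the right tool?

Apologies if this has been asked before. I did a fair amount of searching and could not find anything like my example. Thanks for your time!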

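This should do it:

import re

def split_once_by(s, delims):
    delims = set(delims)
    parts = []
    while delims:
        # match whichever remaining delimiter comes first; the group keeps it in the result
        delim_re = '({})'.format('|'.join(re.escape(d) for d in delims))
        result = re.split(delim_re, s, maxsplit=1)
        if len(result) == 3:
            first, delim, s = result
            parts.append(first)
            # each delimiter may only be used once
            delims.remove(delim)
        else:
            # none of the remaining delimiters occurs in the rest of the string
            break

    parts.append(s)
    return parts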

Example:

>>> split_once_by('it; seems; like\ta good\tday to watch\va\vmovie.', '\t\v;')
['it', ' seems; like', 'a good\tday to watch', 'a\x0bmovie.']

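Burning Alcohol's answer inspired me to write this (IMO better) function:

def split_once_by(s, delims):
    # first occurrence of each delimiter; ties are resolved longest delimiter first
    split_points = sorted((s.find(d), -len(d), d) for d in delims)
    start = 0
    for stop, _longest_first, d in split_points:
        # skip delimiters that are absent (find returns -1) or lie in text already consumed
        if stop < start:
            continue
        yield s[start:stop]
        start = stop + len(d)
    yield s[start:]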

Usage:

>>> list(split_once_by('it; seems; like\ta good\tday to watch\va\vmovie.', '\t\v;'))
['it', ' seems; like', 'a good\tday to watch', 'a\x0bmovie.']
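The `-len(d)` term in the sort key only matters when two delimiters first occur at the same index, in which case the longer one is tried first. A small sketch of that case follows; it restates the generator so the snippet runs on its own, and the sample string and delimiters are made up for illustration.

```python
def split_once_by(s, delims):
    # Same generator as in the answer, repeated so this snippet is self-contained.
    split_points = sorted((s.find(d), -len(d), d) for d in delims)
    start = 0
    for stop, _longest_first, d in split_points:
        if stop < start:
            # Delimiter is absent (find returned -1) or lies in consumed text.
            continue
        yield s[start:stop]
        start = stop + len(d)
    yield s[start:]


# '--' and '-' both first occur at index 1; '--' sorts first because -2 < -1.
parts = list(split_once_by('a--b-c', ['-', '--']))
print(parts)  # ['a', 'b-c']
```

Note that `'-'` is then skipped altogether, even though another `-` appears later, because only first occurrences in the original string are considered.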

A simple algorithm will do:

test_string = r'it; seems; like\ta good\tday to watch\va\vmovie.'
delimiters = [r'\t', r'\v', ';']
# find the index of each first occurrence and sort by it
delimiters = sorted(delimiters, key=lambda delimiter: test_string.find(delimiter))
splitted_string = [test_string]
# split the last piece once per delimiter, using maxsplit
for delimiter in delimiters:
    if delimiter in splitted_string[-1]:
        # replace the last piece with its two halves; unlike pop(index),
        # this stays correct when some delimiter is missing from the string
        splitted_string[-1:] = splitted_string[-1].split(delimiter, maxsplit=1)
print(splitted_string)
# the raw strings hold a literal backslash, so the list repr doubles it:
# ['it', ' seems; like', 'a good\\tday to watch', 'a\\vmovie.']

Just create a list of patterns and apply each one once:

string = 'it; seems; like\ta good\tday to watch\va\vmovie.'
patterns = ['\t', '\v', ';']
for pattern in patterns:
    # swap the first occurrence of the pattern for the marker
    string = '*****'.join(string.split(pattern, maxsplit=1))
print(string.split('*****'))

Output:

['it', ' seems; like', 'a good\tday to watch', 'a\x0bmovie.']

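So, what is "*****"?

On each iteration, calling split gives you a list. That means you cannot call .split() again on the next iteration (because you now have a list), so you have to join the pieces of that list back together with some unusual marker, such as "****", "@@@", "^^^^^^^" or whatever you like, in order to apply split() again on the next iteration. In the end, every "*****" in the string marks a place where one of the patterns was split, so you can use it for the final split. The marker must not already occur in the input, or the final split will cut in the wrong places.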
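The marker idea can be wrapped in a function that refuses to run when the marker already occurs in the input. This is only a sketch: the name `split_once_each` and the `sentinel` parameter are hypothetical, and it uses `str.replace` with a count of 1 in place of the split-and-join step.

```python
def split_once_each(text, delimiters, sentinel="\x00"):
    """Split text at the first occurrence of each delimiter, via a sentinel."""
    # Assumption: the sentinel must not occur in the text or in any delimiter,
    # otherwise the final split would cut in the wrong places.
    if sentinel in text or any(sentinel in d for d in delimiters):
        raise ValueError("sentinel occurs in the input; choose another one")
    for delimiter in delimiters:
        # Replace only the first occurrence of this delimiter with the sentinel.
        text = text.replace(delimiter, sentinel, 1)
    return text.split(sentinel)


result = split_once_each('it; seems; like\ta good\tday to watch\va\vmovie.',
                         ['\t', '\v', ';'])
print(result)
```

A control character such as `"\x00"` is less likely to collide with real text than `"*****"`, but the check is still worth keeping for arbitrary input.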
