python字符串通过分隔符拆分所有可能的排列

这可能与Python 3.3类似的问题密切相关：拆分字符串并创建所有组合，但我无法从中推断出pythonic解决方案。

问题是：

假设有一个像'hi|guys|whats|app'这样的 str，我需要通过分隔符拆分该 str 的所有排列。例：

#splitting only once
['hi','guys|whats|app']
['hi|guys','whats|app']
['hi|guys|whats','app']
#splitting only twice
['hi','guys','whats|app']
['hi','guys|whats','app']
#splitting only three times
...
etc

我可以编写一个回溯算法，但是python(例如itertools)是否提供了一个简化该算法的库？

提前感谢！！

一种方法是，一旦你分割了字符串，就是使用itertools.combinations来定义列表中的分割点，其他位置应该再次融合。

def lst_merge(lst, positions, sep='|'):
'''merges a list on points other than positions'''
'''A, B, C, D and 0, 1 -> A, B, C|D'''
a = -1
out = []
for b in list(positions)+[len(lst)-1]:
out.append('|'.join(lst[a+1:b+1]))
a = b
return out
def split_comb(s, split=1, sep='|'):
from itertools import combinations
l = s.split(sep)
return [lst_merge(l, pos, sep=sep)
for pos in combinations(range(len(l)-1), split)]

例子

>>> split_comb('hi|guys|whats|app', 0)
[['hi|guys|whats|app']]
>>> split_comb('hi|guys|whats|app', 1)
[['hi', 'guys|whats|app'],
['hi|guys', 'whats|app'],
['hi|guys|whats', 'app']]
>>> split_comb('hi|guys|whats|app', 2)
[['hi', 'guys', 'whats|app'],
['hi', 'guys|whats', 'app'],
['hi|guys', 'whats', 'app']]
>>> split_comb('hi|guys|whats|app', 3)
[['hi', 'guys', 'whats', 'app']]
>>> split_comb('hi|guys|whats|app', 4)
[] ## impossible

理由

ABCD -> A B C D
0 1 2
combinations of split points: 0/1 or 0/2 or 1/2
0/1 -> merge on 2 -> A B CD
0/2 -> merge on 1 -> A BC D
1/2 -> merge on 0 -> AB C D

泛型函数

这是一个通用版本，工作原理与上面类似，但也将-1作为split的参数，在这种情况下，它将输出所有组合

def lst_merge(lst, positions, sep='|'):
a = -1
out = []
for b in list(positions)+[len(lst)-1]:
out.append('|'.join(lst[a+1:b+1]))
a = b
return out
def split_comb(s, split=1, sep='|'):
from itertools import combinations, chain

l = s.split(sep)

if split == -1:
pos = chain.from_iterable(combinations(range(len(l)-1), r)
for r in range(len(l)+1))
else:
pos = combinations(range(len(l)-1), split)

return [lst_merge(l, pos, sep=sep)
for pos in pos]

例：

>>> split_comb('hi|guys|whats|app', -1)
[['hi|guys|whats|app'],
['hi', 'guys|whats|app'],
['hi|guys', 'whats|app'],
['hi|guys|whats', 'app'],
['hi', 'guys', 'whats|app'],
['hi', 'guys|whats', 'app'],
['hi|guys', 'whats', 'app'],
['hi', 'guys', 'whats', 'app']]

这是我想出的一个递归函数：

def splitperms(string, i=0):
if len(string) == i:
return [[string]]
elif string[i] == "|":
return [*[[string[:i]] + split for split in splitperms(string[i + 1:])], *splitperms(string, i + 1)]
else:
return splitperms(string, i + 1)

输出：

>>> splitperms('hi|guys|whats|app')
[['hi', 'guys', 'whats', 'app'], ['hi', 'guys', 'whats|app'], ['hi', 'guys|whats', 'app'], ['hi', 'guys|whats|app'], ['hi|guys', 'whats', 'app'], ['hi|guys', 'whats|app'], ['hi|guys|whats', 'app'], ['hi|guys|whats|app']]
>>>

一种使用combinations和chain的方法

from itertools import combinations, chain

def partition(alist, indices):
# https://stackoverflow.com/a/1198876/4001592
pairs = zip(chain([0], indices), chain(indices, [None]))
return (alist[i:j] for i, j in pairs)

s = 'hi|guys|whats|app'
delimiter_count = s.count("|")
splits = s.split("|")

for i in range(1, delimiter_count + 1):
print("split", i)
for combination in combinations(range(1, delimiter_count + 1), i):
res = ["|".join(part) for part in partition(splits, combination)]
print(res)

输出

split 1
['hi', 'guys|whats|app']
['hi|guys', 'whats|app']
['hi|guys|whats', 'app']
split 2
['hi', 'guys', 'whats|app']
['hi', 'guys|whats', 'app']
['hi|guys', 'whats', 'app']
split 3
['hi', 'guys', 'whats', 'app']

这个想法是生成所有方法来选择(或删除)分隔符 1、2、3 次并从那里生成分区。

如果你想要所有分区，请尝试从更多迭代工具partitions：

from more_itertools import partitions
s = 'hi|guys|whats|app'
for p in partitions(s.split('|')):
print(list(map('|'.join, p)))

输出：

['hi|guys|whats|app']
['hi', 'guys|whats|app']
['hi|guys', 'whats|app']
['hi|guys|whats', 'app']
['hi', 'guys', 'whats|app']
['hi', 'guys|whats', 'app']
['hi|guys', 'whats', 'app']
['hi', 'guys', 'whats', 'app']

如果您只需要一定数量的拆分，那么与其在所有分隔符上拆分然后重新连接部分，不如获取分隔符索引的组合并相应地获取子字符串：

from itertools import combinations
s = 'hi|guys|whats|app'
splits = 2
indexes = [i for i, c in enumerate(s) if c == '|']
for I in combinations(indexes, splits):
print([s[i+1:j] for i, j in zip([-1, *I], [*I, None])])

输出：

['hi', 'guys', 'whats|app']
['hi', 'guys|whats', 'app']
['hi|guys', 'whats', 'app']

我很惊讶大多数答案都使用combinations，这显然是一个二进制幂序列(即多个二元笛卡尔乘积连接起来)。

让我详细说明一下：如果我们有n分隔符，我们就2**n可能的字符串，其中每个分隔符要么是on的，要么是off的。因此，如果我们将整数序列的每个位从0映射到2**n到每个分隔符(0意味着我们不拆分，1意味着我们拆分)，我们可以非常有效地生成整个事情(不会遇到堆栈深度限制，并且能够暂停和恢复生成器 - 甚至并行运行它！- 仅使用一个简单的整数来跟踪进度)。

def partition(index, tokens, separator):
def helper():
n = index
for token in tokens:
yield token
if n % 2:
yield separator
n //= 2
return ''.join(helper())
def all_partitions(txt, separator):
tokens = txt.split(separator)
for i in range(2**(len(tokens)-1)):
yield partition(i, tokens, separator)
for x in all_partitions('hi|guys|whats|app', '|'):
print(x)

解释：

hi|guys|whats|app
^    ^     ^
bit  0    1     2 (big endian representation)

hi guys whats up
^    ^     ^
0 =  0    0     0

hi|guys whats up
^    ^     ^
1 =  1    0     0

hi guys|whats up
^    ^     ^
2 =  0    1     0

hi|guys|whats up
^    ^     ^
3 =  1    1     0

hi guys whats|up
^    ^     ^
4 =  0    0     1

hi|guys whats|up
^    ^     ^
5 =  1    0     1

hi guys|whats|up
^    ^     ^
6 =  0    1     1

hi|guys|whats|up
^    ^     ^
7 =  1    1     1

例子

理由

泛型函数

相关内容

最新更新

热门标签：