Binary string to list in Python



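I have a binary string such as '1100011101'. I want to parse it into a list in which each block of consecutive 1s or 0s is a separate value.

For example, '1100011101' becomes ['11', '000', '111', '0', '1'].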

You can gain a (small) amount of performance by using a regex instead of groupby() + join(). The pattern simply finds runs of 1s or runs of 0s:

import re
s = '1100011101'
l = re.findall(r"0+|1+", s)
# ['11', '000', '111', '0', '1']
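
The pattern 0+|1+ is tied to the two characters 0 and 1. As a rough sketch, assuming you wanted the same split for arbitrary characters, a backreference can match any run. The name split_runs is a hypothetical helper, not part of the answer above:

```python
import re

# Hypothetical helper: split a string into runs of identical characters.
# (.)\1* matches one character followed by zero or more repeats of it;
# re.DOTALL lets "." match newlines as well.
_RUN = re.compile(r"(.)\1*", re.DOTALL)

def split_runs(s):
    # finditer is used because findall would return only the captured group
    return [m.group(0) for m in _RUN.finditer(s)]

print(split_runs("1100011101"))  # ['11', '000', '111', '0', '1']
print(split_runs("aabccc"))      # ['aa', 'b', 'ccc']
```

For strictly binary input the simpler 0+|1+ pattern is probably still the better choice, since it avoids the backreference.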

Timings:

from itertools import groupby
s = '1100011101' * 1000
%timeit l = [''.join(g) for _, g in groupby(s)]
# 1.16 ms ± 9.79 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit re.findall(r"0+|1+", s)
# 723 µs ± 5.32 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Using itertools.groupby:

from itertools import groupby
binary = "1100011101"
result = ["".join(repeat) for _, repeat in groupby(binary)]
print(result)

['11', '000', '111', '0', '1']
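
To see why this works, it may help to look at what groupby actually yields. A minimal sketch (the variable names runs and blocks are just for illustration):

```python
from itertools import groupby

binary = "1100011101"

# groupby yields (key, group) pairs, one per run of consecutive equal items;
# each group is an iterator, so it is consumed here with list().
runs = [(key, len(list(group))) for key, group in groupby(binary)]
print(runs)  # [('1', 2), ('0', 3), ('1', 3), ('0', 1), ('1', 1)]

# Rebuilding each block from its key and length gives the same list as join
blocks = [key * length for key, length in runs]
print(blocks)  # ['11', '000', '111', '0', '1']
```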

Use .replace() to insert a space inside every 01 and every 10, then split the resulting string:

'1100011101'.replace("01","0 1").replace("10","1 0").split()
['11', '000', '111', '0', '1']
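
Wrapped in a function, the same idea might look like the sketch below. The name split_by_replace is hypothetical, and it assumes the input contains only the characters 0 and 1:

```python
def split_by_replace(s):
    # Hypothetical wrapper; assumes s contains only '0' and '1'
    # (no whitespace, which split() would also treat as a separator).
    # Every boundary between two runs is either "01" or "10", so putting
    # a space inside each of those pairs separates all the blocks.
    return s.replace("01", "0 1").replace("10", "1 0").split()

print(split_by_replace("1100011101"))  # ['11', '000', '111', '0', '1']
print(split_by_replace("0101"))        # ['0', '1', '0', '1']
print(split_by_replace(""))            # []
```

Unlike the groupby approach, this does not generalise beyond a two-character alphabet without adding a replace call for every possible pair.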

With groupby:

>>> from itertools import groupby as f
>>> x = '1100011101'  # use a string literal; an int cannot keep leading zeros
>>> sol = [''.join(v) for k, v in f(x)]
>>> print(sol)
['11', '000', '111', '0', '1']

Without groupby, if you want faster execution:

def func(string):
    if not string:
        return []

    def get_data(string):
        # Yield each run of identical characters as its own string
        if not string:
            return
        count = 0
        target = string[0]
        for i in string:
            if i == target:
                count += 1
            else:
                # the character changed: emit the finished run
                yield target * count
                count = 1
                target = i
        if count > 0:
            yield target * count

    return list(get_data(string))

x = '1100011101'
sol = func(x)
print(sol)

Output:
['11', '000', '111', '0', '1']
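
The generator above rebuilds each block with target*count. As a variant sketch, you could instead remember where the current run started and slice it out. The name split_by_index is hypothetical, and I have not timed this version:

```python
def split_by_index(s):
    # Hypothetical variant: track the start of the current run and
    # slice it out whenever the character changes.
    blocks = []
    start = 0
    for i in range(1, len(s)):
        if s[i] != s[i - 1]:
            blocks.append(s[start:i])
            start = i
    if s:
        blocks.append(s[start:])  # the final run
    return blocks

print(split_by_index("1100011101"))  # ['11', '000', '111', '0', '1']
```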

Timings on my machine:

from itertools import groupby
s = '11000111010101' * 100000
%timeit l = [''.join(g) for _, g in groupby(s)]
318 ms ± 2.75 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

import re
s = '11000111010101' * 100000
%timeit l = re.findall(r"0+|1+", s)
216 ms ± 2.01 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

def func(string):
    if not string:
        return []
    def get_data(string):
        if not string:
            return
        count = 0
        target = string[0]
        for i in string:
            if i == target:
                count += 1
            else:
                yield target * count
                count = 1
                target = i
        if count > 0:
            yield target * count
    return list(get_data(string))
s = '11000111010101' * 100000
%timeit func(s)
205 ms ± 11.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
####################################################################
from itertools import groupby
s = '11000111010101' * 1000
%timeit l = [''.join(g) for _, g in groupby(s)]
3.28 ms ± 178 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

import re
s = '11000111010101' * 1000
%timeit l = re.findall(r"0+|1+", s)
2.06 ms ± 57.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

def func(string):
    if not string:
        return []
    def get_data(string):
        if not string:
            return
        count = 0
        target = string[0]
        for i in string:
            if i == target:
                count += 1
            else:
                yield target * count
                count = 1
                target = i
        if count > 0:
            yield target * count
    return list(get_data(string))
s = '11000111010101' * 1000
%timeit func(s)
1.91 ms ± 153 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
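
%timeit is an IPython magic, so the cells above will not run in a plain Python script. As a sketch of how the comparison could be reproduced with the standard timeit module (the function names by_groupby, by_regex and by_loop are made up for this example, and by_loop is a compact rewrite of func rather than the exact code above):

```python
import re
import timeit
from itertools import groupby

def by_groupby(s):
    return ["".join(g) for _, g in groupby(s)]

def by_regex(s):
    return re.findall(r"0+|1+", s)

def by_loop(s):
    # Same idea as func: count a run, emit it when the character changes
    if not s:
        return []
    out = []
    target, count = s[0], 0
    for ch in s:
        if ch == target:
            count += 1
        else:
            out.append(target * count)
            target, count = ch, 1
    out.append(target * count)  # the final run
    return out

s = "11000111010101" * 1000

# Sanity check: all three approaches must produce the same list
assert by_groupby(s) == by_regex(s) == by_loop(s)

for fn in (by_groupby, by_regex, by_loop):
    # Best of 5 repeats, 20 calls each
    best = min(timeit.repeat(lambda: fn(s), number=20, repeat=5)) / 20
    print(f"{fn.__name__}: {best * 1e3:.3f} ms per call")
```

The absolute numbers, and possibly the ranking, will depend on the machine and the Python version, so it is worth running this on your own setup before picking one approach for speed alone.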