给定一个 8 位字节,以字节的二进制表示形式生成"ones"的序列和计数(使用最小化查找表)



我想使用查找表来解决问题。 构建表后,我希望能够执行以下操作:

# lookup "0110 1001"
byte = 0b01101001
offset = offsets[byte]
count = counts[byte]
sequence = table[offset:offset+count]
print "Sequence for {:08b}: {:d} one(s) at positions".format(byte, count), sequence
>> Sequence for 01101001: 4 one(s) at positions [0, 3, 5, 6]

我目前的努力很简单,并产生了正确的结果,但table并没有尽可能节省空间。 下面的代码生成包含 1024 项的正确table

table = []
offsets = []
counts = []
current_offset = 0
for byte in range(256):
b = [j for j in range(8) if byte & (1 << j)]
table += b
offsets.append(current_offset)
current_offset += len(b)
counts.append(len(b))

我知道存在一个更小table,因为...

列表中的子列表[0, 1, 2, 3, 4, 5, 6, 7]涵盖byte的许多可能值。

  • 0b0000 0001:偏移量=0,计数=1,序列=[0]
  • 0b0000 0011: 偏移量=0, 计数=2, 序列 = [0, 1]
  • 0b0000 0111: 偏移量=0, 计数=3, 序列 = [0, 1, 2] ...
  • 0b1111 1111: 偏移量=0, 计数=8, 序列 = [0, 1, 2, 3, 4, 5, 6, 7]
  • 0b0000 0010: 偏移量=1, 计数=1, 序列 = [1]
  • 0b0000 0110: 偏移量=1, 计数=2, 序列 = [1, 2] ...

我正在寻找一种构建尽可能小table的方法(就元素数量而言)。我还应该说,tableoffsetscounts都需要是列表或序列(没有花哨的Python对象),因为我的最终目标是将表硬编码到微控制器中。

编辑:验证来自用户2357112的答案。table减少到320个元素!

# given an 8 bit byte in binary, return a list and count of ones.
table = []
offsets = []
counts = []
for byte in range(64):
table += [0] + [j+1 for j in range(6) if byte & (1 << j)]+ [7]
for byte in range(256):
b = [j for j in range(8) if byte & (1 << j)]
counts.append(len(b))
for i in xrange(len(table)):
# should be len(table)-len(b), but I want to throw an 
#IndexError exception if the pattern isn't found
try:
if table[i:i+len(b)] == b:
offsets.append(i)
break
except IndexError:
print b, " not found in table."
except:
rasie
# verify
good_count = 0
bad_count = 0
for byte in range(256):
offset = offsets[byte]
count = counts[byte]
sequence = table[offset:offset+count]
check = 0
for bit in sequence:
check = check | (1 << bit)
if byte != check:
print "Error: {:08b} != {:08b}".format(byte, check), sequence
bad_count += 1
else:
good_count += 1
print "Good: {:d} Bad: {:d}".format(good_count, bad_count)
>> Good: 256 Bad: 0

如果位顺序很重要,这实际上比最初听起来要容易得多,因为您可以将所有 8 位数字的序列与 MSB 和 LSB 集连接起来(因此所有序列都以 0 开头并以 7 结尾)。

要看到这是最佳的,首先,请注意,无法重叠从 0 开始并以 7 结束的两个递增序列。因此,所有这些序列必须在我们的表中单独表示。

然后,请注意,这些序列涵盖了我们需要表示的所有其他序列,因为任何不以 0 开头或不以 7 结尾的递增序列都是以 0 开头并以 7 结尾的序列的子序列。


如果[3, 4, 1]0b00011010[1, 3, 4]一样好,那么您可以进一步压缩内容,但这样做很棘手。我没有一个简单的方法来找到最佳解决方案,但我确实有光束搜索。这是长度-81表光束搜索给出的:

[0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 4, 2, 5, 7, 3, 0, 1, 6, 2, 7, 4, 0, 5, 3, 1, 6, 4
, 0, 2, 5, 3, 6, 7, 1, 4, 3, 0, 2, 6, 7, 5, 1, 4, 6, 2, 3, 7, 1, 5, 0, 6, 4, 3,
7, 0, 2, 1, 5, 3, 4, 7, 2, 1, 5, 6, 3, 0, 7, 4, 6, 2, 5, 0, 7, 4, 5, 0, 1, 2, 6,
3]

这是我写的快速而肮脏的光束搜索,发现:

# increase this to consume more memory in hopes of a better solution
# particularly low values may never find a solution
beam_width = 10000
# initialize the sequence to prevent wasting beam width on symmetric options
# I have no concrete proof that starting the sequence this way is optimal,
# but it's intuitively reasonable and it improved the solution quality.
initialseq = [0, 1, 2, 3, 4, 5, 6, 7]
initialset = set()
for i in range(9):
for j in range(i, 9):
initialset.add(frozenset(initialseq[i:j]))
seqs = [initialseq]
sets = [initialset]
while max(map(len, sets)) < 256:
newseqs = []
newsets = []
for i, (seq, set_) in enumerate(zip(seqs, sets)):
for newelem in range(8):
newseq = seq + [newelem]
newset = set_.copy()
for slicestart in range(-8, 0):
slice_ = newseq[slicestart:]
if len(slice_) == len(set(slice_)):
newset.add(frozenset(slice_))
newseqs.append(newseq)
newsets.append(newset)
seqs_and_sets = zip(newseqs, newsets)
seqs_and_sets.sort(key=lambda x: len(x[1]), reverse=True)
seqs_and_sets = seqs_and_sets[:beam_width]
seqs, sets = zip(*seqs_and_sets)

最新更新