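What is the most efficient way to tell "properly" decoded strings apart from strings that decode to hex values?

In cybersecurity we often use the strings binary to extract any plaintext string data from a memory dump. I am trying to do something similar in Python.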

from struct import unpack

find_str = "King"
strings = []
for stream in data.streams:
    if type(stream.data) == bytes:

        # Not a particularly readable solution, but orders of magnitude faster than
        # alternatives: https://stackoverflow.com/a/57543519/9400421
        unpacked = list(unpack(f'{len(stream.data)}c', stream.data))
        string = ''
        null = b'\x00'
        for byte in unpacked:
            try:
                # ultimately need to track multiple strings arrays for each
                # popular encoding scheme to catch diverse string encodings.
                decoded = byte.decode('ASCII')
                print(byte, '=>', decoded)
                if byte == null:
                    # A null byte terminates the current string.
                    print(byte, '=>', 'null')
                    if string != '':
                        strings.append(string)
                        string = ''
                else:
                    string += decoded
            except UnicodeDecodeError:
                # Bytes outside the ASCII range also end the current string.
                print("couldn't decode:", byte)
                if string != '':
                    strings.append(string)
                    string = ''
print(strings)

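Output: ..., '*', '\x7f', '\x10', '\x10', '\x04', '\x01', '\x12+', '\x7f', '*', '\x7f', '@', '\x10', '\x02', '\x01', '\x10\x13+', '\x7f', '\x0c', '\x01', ...

My problem is that this outputs a lot of decoded values that are clearly not normal characters: they come out as hex escape strings.

My first question is: why do these hex strings not decode to normal characters, yet also not trigger my except clause? I assumed that anything that does not decode "cleanly" to a character with the chosen decoding method would be filtered out by my code.

My second question is: how can I separate and discard the "junk" characters from the "cleanly" decoded ones?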

The solution boils down to decoding the bytes to a string and keeping only the printable characters.

>>> data = b"A \x04 test \x12 string\x00\x00\x00."
>>> ''.join([x for x in data.decode('ascii') if x.isprintable()])
'A  test  string.'
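
As for why the except clause in the question never fires: every byte value from 0x00 to 0x7F is a valid ASCII code point, control characters included, so decode('ascii') succeeds on them and only their repr shows up as a \x.. escape. Only bytes from 0x80 upward raise UnicodeDecodeError. A minimal sketch of that distinction follows; classify is a hypothetical helper name used here for illustration, not something from the original code.

```python
def classify(byte_value: bytes) -> str:
    """Label a single byte as 'printable', 'control', or 'non-ascii'."""
    try:
        char = byte_value.decode('ascii')
    except UnicodeDecodeError:
        # Only bytes >= 0x80 fall outside ASCII and end up here.
        return 'non-ascii'
    # Control characters decode without error but are not printable.
    return 'printable' if char.isprintable() else 'control'


for sample in (b'K', b'\x7f', b'\x10', b'\x00', b'\xff'):
    print(sample, '=>', classify(sample))
```

This is why filtering on isprintable() after decoding, rather than relying on the exception, separates the "junk" from the readable characters.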

It looks like your code can be simplified to:

stream_strings = []
for stream in data.streams:
    if isinstance(stream.data, bytes):
        # Note: decode('ascii') raises UnicodeDecodeError on any byte >= 0x80.
        result = ''.join([x for x in stream.data.decode('ascii') if x.isprintable()])
        stream_strings.append(result)
print(stream_strings)
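
Note that the simplified loop joins everything printable in a stream into one string, whereas the strings utility reports separate runs. If separate runs are wanted, one possible approach (a sketch, not part of the original answer) is a bytes regular expression that matches runs of printable ASCII. The function name extract_strings and the default minimum length of 4 are assumptions chosen for illustration.

```python
import re
from typing import List


def extract_strings(raw: bytes, min_length: int = 4) -> List[str]:
    """Return runs of printable ASCII bytes at least min_length long."""
    # 0x20-0x7e covers space through tilde, i.e. printable ASCII.
    pattern = re.compile(rb'[\x20-\x7e]{%d,}' % min_length)
    # Every match contains only ASCII bytes, so decoding cannot fail.
    return [match.decode('ascii') for match in pattern.findall(raw)]


sample = b"\x04King\x00\x12ab\x00Arthur\xff"
print(extract_strings(sample))
```

Because the pattern works directly on bytes, control characters and bytes above 0x7F simply act as separators, so no exception handling is needed.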
