Replace characters in a string and build all possible combinations



I am using pytesseract for text recognition. Sometimes the text is not extracted correctly. For example, "DDR4" may be read as "ODR4".

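So I have a dictionary that records all the possible misreadings, plus code that detects how many characters need to be replaced and at which indices, for example: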

my_dictionary = {
    'D': ['O', '0'],
    'O': 'D',
    '0': 'D'
}
user_input = "DDR4"
char_to_replace = 0
char_index = []
for index, val in enumerate(user_input):
    if val in my_dictionary:
        char_to_replace += 1
        char_index.append(index)

In this example, how can I generate a list containing all possible combinations, e.g.

D0R4, DOR4, 00R4, 0OR4, OOR4, O0R4, 0DR4, ODR4

Thanks for any input.

This is what I came up with. It is fairly ugly code, and I am sure it can be done more easily with itertools, but what the heck, I hope it helps:

my_dictionary = {
    'D': ['O', '0'],
    'O': 'D',
    '0': 'D'
}
user_input = "DDR4"

def replace_char(variable=None, replace_index=None, replace_with=None):
    """Supplementary function"""
    return variable[:replace_index] + replace_with + variable[replace_index+1:]

# create maximum required iterations (max len of list in dict)
number_of_iterations_required = max([len(f) for f in my_dictionary.values()])
# list for preliminary word combinations
baseword_combinations = [user_input]
for key, val in my_dictionary.items():
    for idx, char in enumerate(user_input):
        if char == key:
            for v in val:
                baseword_combinations.append(replace_char(variable=user_input, replace_index=idx, replace_with=v))
# list for final returns, again append initial input
possible_combinations = [user_input]
for word in baseword_combinations:
    for idx, char in enumerate(word):
        for key, val in my_dictionary.items():
            for v in val:
                if char == key:
                    possible_combinations.append(replace_char(variable=word, replace_index=idx, replace_with=v))
            if char == val:
                possible_combinations.append(replace_char(variable=word, replace_index=idx, replace_with=key))

# get rid of duplicates, print result
print(list(set(possible_combinations)))

Result (the order may differ between runs, since a set is unordered):

['OOR4', '00R4', 'ODR4', 'D0R4', '0DR4', 'DDR4', '0OR4', 'O0R4', 'DOR4']
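
The itertools route mentioned above could look roughly like the following. This is only a minimal sketch, not the code that produced the result above: the helper name `candidates` is hypothetical, and it assumes each character is either kept as-is or swapped for one of its direct replacements in `my_dictionary`.

```python
from itertools import product

my_dictionary = {
    'D': ['O', '0'],
    'O': 'D',
    '0': 'D'
}

def candidates(text, table):
    """Hypothetical helper: keep each character or swap it for a direct replacement."""
    options = []
    for char in text:
        repl = table.get(char, [])
        if isinstance(repl, str):  # a bare string such as 'D' means one replacement
            repl = [repl]
        options.append([char, *repl])
    # product() picks one option per position; join() turns each pick into a string
    return [''.join(pick) for pick in product(*options)]

print(candidates("DDR4", my_dictionary))
# ['DDR4', 'DOR4', 'D0R4', 'ODR4', 'OOR4', 'O0R4', '0DR4', '0OR4', '00R4']
```

Because only direct replacements are used, an input such as "ODR4" yields six strings here (the leading 'O' can only stay 'O' or become 'D'), so the starting string matters.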

Edit

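The part with number_of_iterations_required is not actually used in my code above. I also reworked it a little to use list comprehensions, which makes it harder to read but shorter, so you can do this: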
my_dictionary = {
    'D': ['O', '0'],
    'O': 'D',
    '0': 'D'
}
user_input = "DDR4"

def replace_char(variable=None, replace_index=None, replace_with=None):
    """Supplementary function"""
    return variable[:replace_index] + replace_with + variable[replace_index+1:]

# list for preliminary word combinations
base = [user_input]
base.extend([replace_char(user_input, idx, v) for key, val in my_dictionary.items()
             for idx, char in enumerate(user_input) for v in val if char == key])
# list for final results
final_results = [user_input]
final_results.extend([replace_char(word, idx, key) for word in base for idx, char in enumerate(word)
                      for key, val in my_dictionary.items() for v in val if char == val])
result = list(set(final_results))
print(result)
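
As far as I can tell, the code above makes two substitution passes, which is enough for an input with two replaceable characters such as "DDR4". For longer runs, one option is to keep substituting until no new string turns up. The following is a rough sketch under that assumption; the helper names `single_swaps` and `all_variants` are hypothetical.

```python
my_dictionary = {
    'D': ['O', '0'],
    'O': 'D',
    '0': 'D'
}

def single_swaps(word, table):
    """Yield every string that differs from `word` by one substitution from `table`."""
    for idx, char in enumerate(word):
        repl = table.get(char, [])
        if isinstance(repl, str):  # a bare string such as 'D' means one replacement
            repl = [repl]
        for r in repl:
            yield word[:idx] + r + word[idx + 1:]

def all_variants(text, table):
    """Apply single substitutions repeatedly until no unseen string is produced."""
    seen = {text}
    pending = [text]
    while pending:
        word = pending.pop()
        for variant in single_swaps(word, table):
            if variant not in seen:
                seen.add(variant)
                pending.append(variant)
    return sorted(seen)

print(all_variants("DDR4", my_dictionary))
# ['00R4', '0DR4', '0OR4', 'D0R4', 'DDR4', 'DOR4', 'O0R4', 'ODR4', 'OOR4']
print(len(all_variants("DDD4", my_dictionary)))  # 27
```

The `seen` set doubles as the duplicate filter, so no separate `set()` call is needed at the end.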

I tried to solve the problem this way; I hope it helps:

import itertools as it

my_list = ['D', 'O', '0']
result = []
text = 'DDR4'  # renamed from `input` to avoid shadowing the built-in
# 'DDR4' -> '**R4' (assumes the text contains no literal '*')
masked = ''.join('*' if ch in my_list else ch for ch in text)

for combo in it.product(my_list, repeat=masked.count('*')):
    candidate = masked
    for ch in combo:
        # fill the placeholders one at a time, left to right
        candidate = candidate.replace('*', ch, 1)
    result.append(candidate)
print(result)

Result:

['DDR4', 'DOR4', 'D0R4', 'ODR4', 'OOR4', 'O0R4', '0DR4', '0OR4', '00R4']
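
The approach above hardcodes the interchangeable characters and relies on '*' as a placeholder. A variation is to derive the groups from the question's `my_dictionary` and substitute by position instead. This is only a sketch: the helper names `confusion_groups` and `group_variants` are hypothetical, and it assumes every replacement is a single character.

```python
from itertools import product

my_dictionary = {
    'D': ['O', '0'],
    'O': 'D',
    '0': 'D'
}

def confusion_groups(table):
    """Hypothetical helper: map each character to the sorted group it belongs to."""
    groups = []
    for key, values in table.items():
        members = {key, *values}  # works for lists and for single-character strings
        # merge with any existing group that shares a character
        for group in [g for g in groups if g & members]:
            members |= group
            groups.remove(group)
        groups.append(members)
    return {char: sorted(group) for group in groups for char in group}

def group_variants(text, table):
    """Every string where each character is replaced by any member of its group."""
    groups = confusion_groups(table)
    options = [groups.get(char, [char]) for char in text]
    return [''.join(pick) for pick in product(*options)]

variants = group_variants("ODR4", my_dictionary)
print(len(variants))  # 9
print(variants)
```

With groups, the outcome no longer depends on which misreading you start from: "ODR4" and "DDR4" produce the same nine strings.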