Conditional string expansion for inverse text normalization before text2num processing


```python
def string_expansion(sent):
    list_str = sent.split(" ")
    new_list = []
    for index,char in enumerate(list_str):
        if char == "double":
            index +=1
            new_chars = (' '.join([list_str[index][:]] * 1))
            new_list = [x.replace(char , new_chars).replace(char , new_chars) for x in list_str]
        elif char == "triple":
            index +=1
            new_chars = (' '.join([list_str[index][:]] * 2))
            new_list = [x.replace(char , new_chars).replace(char , new_chars) for x in list_str]
        else:
            new_list

    return (" ".join(new_list))

sentence = "my phone number is nine double eight five eight two eight triple seven"
result = string_expansion(sentence)
print(result)
```

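Actual output: my phone number is nine double eight five eight two eight seven seven seven

Expected output: my phone number is nine eight eight five eight two eight seven seven seven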

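I'm working on an inverse text normalization problem. As part of it, I need to expand the string based on a condition (`double` or `triple`) so that I can pass the number words to a text2num function and get the result.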
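The actual output still contains `double` because each branch rebuilds `new_list` from the untouched `list_str`, so the `triple` pass throws away the substitution made by the `double` pass. (`str.replace` is also applied to every token, so two `double`s followed by different words would both get the first word.) A minimal sketch that keeps the question's replace-the-keyword idea, assuming the fix is simply to edit one copy of the list at the keyword's own position:

```python
def string_expansion(sent):
    words = sent.split(" ")
    # Work on a single copy so every replacement accumulates instead of
    # being recomputed from the original list on each iteration.
    out = list(words)
    for index, word in enumerate(words):
        # Only expand when there is a following word to repeat.
        if word in ("double", "triple") and index + 1 < len(words):
            # The next word is still emitted at its own position, so the
            # keyword slot supplies only the extra copies: 1 for double, 2 for triple.
            extra = 1 if word == "double" else 2
            out[index] = " ".join([words[index + 1]] * extra)
    return " ".join(out)


sentence = "my phone number is nine double eight five eight two eight triple seven"
print(string_expansion(sentence))
# my phone number is nine eight eight five eight two eight seven seven seven
```

A trailing `double` or `triple` with nothing after it is left unchanged rather than raising an error.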

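`double` means the next item must appear twice, so replace the word `double` with one copy of the next item; for `triple`, replace it with two copies of the next item. The next item itself is then added by the normal branch: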

```python
def string_expansion(string):
    list_str = string.split(" ")
    new_list = []
    for index, char in enumerate(list_str):
        if char == "double":
            # One extra copy of the next word; the word itself is added on the next iteration.
            new_list.append(list_str[index+1])
        elif char == "triple":
            # Two extra copies of the next word.
            new_list.extend([list_str[index+1]] * 2)
        else:
            new_list.append(list_str[index])
    return " ".join(new_list)

string = "my phone number is nine double eight five eight two eight triple seven"
result = string_expansion(string)
print(result)
```

Output:

my phone number is nine eight eight five eight two eight seven seven seven
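The loop above reads `list_str[index+1]` without a bounds check, so a sentence ending in `double` or `triple` raises `IndexError`. A sketch of a table-driven variant, assuming a hypothetical `MULTIPLIERS` dict (not part of the answer) that maps each keyword to the total number of copies; it consumes the following word directly and leaves a trailing keyword as-is:

```python
# Hypothetical table: how many times each keyword repeats the following word.
MULTIPLIERS = {"double": 2, "triple": 3}


def string_expansion(text, multipliers=MULTIPLIERS):
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        count = multipliers.get(words[i])
        if count is not None and i + 1 < len(words):
            # Emit the following word `count` times, then skip past it.
            out.extend([words[i + 1]] * count)
            i += 2
        else:
            # Ordinary word, or a keyword with nothing after it: keep it unchanged.
            out.append(words[i])
            i += 1
    return " ".join(out)


print(string_expansion("my phone number is nine double eight five eight two eight triple seven"))
# my phone number is nine eight eight five eight two eight seven seven seven
```

With this layout, supporting another keyword is a one-entry change to the table.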

This may be overkill, but a regex can get you there faster.

```python
import re

text = "nine double eight five eight two eight triple seven"

def string_expansion(st):
    # \1 refers back to the word captured by (\w+).
    expand_doubles = re.sub(r'double (\w+)', r'\1 \1', st)
    expand_triples = re.sub(r'triple (\w+)', r'\1 \1 \1', expand_doubles)

    return expand_triples

expanded = string_expansion(text)
print(expanded)
```

nine eight eight five eight two eight seven seven seven

If needed, you can easily add replacement expressions for 'quadruple' and 'quintuple' in the same way.
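Rather than writing one `re.sub` call per keyword, a sketch that builds a single pattern from a hypothetical `MULTIPLIERS` table and passes a replacement function to `re.sub` (which accepts a callable in place of a replacement string). The `\b` anchor is an added assumption so that a word such as `redouble` is not treated as a keyword:

```python
import re

# Hypothetical table: keyword -> total number of copies of the next word.
MULTIPLIERS = {"double": 2, "triple": 3, "quadruple": 4, "quintuple": 5}

# One alternation over all keywords; group 1 is the keyword, group 2 the word to repeat.
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, MULTIPLIERS)) + r") (\w+)")


def string_expansion(text):
    # Repeat group 2 as many times as the keyword in group 1 specifies.
    return PATTERN.sub(
        lambda m: " ".join([m.group(2)] * MULTIPLIERS[m.group(1)]), text
    )


print(string_expansion("nine double eight five eight two eight triple seven"))
# nine eight eight five eight two eight seven seven seven
```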
