Split a list after a character and find errors on that line, if any exist

I have this code:

import re
accents = ['é', 'à', 'è', 'ù', 'â', 'ê', 'î', 'ô', 'û', 'ç', 'ë', 'ï', 'ü']

with open("test.txt", 'r', encoding='UTF-8') as file:
    lines = [line.rstrip() for line in file]
    #print(line)
    for line in lines:
        for accent in accents:
            if accent in line:
                my_list = line.split(accent)
                #print(my_list)
                for x in my_list:
                    if x[-1] == "(" or x[0] == "|" or x[-1] == "|" or x[0] == ")":
                        pass
                    else:
                        print(
                            "Error found on line:",
                            lines.index(line)+1,
                            ",", f'"{accent}"',
                            "has no accent variation")
                        my_list.pop(1)

This is the content of the test.txt file:

t(é|e)làphone|télephone|portland(e|é)

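Whenever I find an accent being used, I split the text there so that I can look just before and just after the split and make sure I have a parenthesis or a pipe. My goal is to find every accent that has no variation and to display an error each time I find one. For example, for "t(é|e)làphone" it should only show an error for "à". Right now the code only reports "à" but not the "é" in "télephone", because that case overlaps with the condition I used above: x[-1] == "|"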

How can I fix this? Maybe with a different approach? Thank you!
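To see why the "é" in "télephone" slips through, it may help to print the pieces that `split` produces for that accent. The snippet below is only an illustrative sketch: `passes_check` is a hypothetical helper that wraps the same condition as the code above, and the sample line is copied from test.txt.

```python
line = "t(é|e)làphone|télephone|portland(e|é)"

def passes_check(x):
    # Same condition as in the question's code, applied to one piece
    return x[-1] == "(" or x[0] == "|" or x[-1] == "|" or x[0] == ")"

pieces = line.split("é")
for piece in pieces:
    # Show each piece and whether the condition accepts it
    print(repr(piece), passes_check(piece))
```

Every piece is accepted here. The piece "|e)làphone|t" passes because it starts with "|", even though the "é" that follows it comes right after a "t". The condition looks at each piece on its own rather than at the two characters on either side of each accent, so a delimiter elsewhere in the piece can hide a bad accent.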

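How about using a regular expression to find the accents that do have a variation? It checks for an accent that is preceded by "(" or "|" and followed by ")" or "|". I tested this with regex101.com, which is a great site for learning more about regular expressions.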
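As a quick check of that idea, here is a small sketch that runs such a pattern against the sample line and prints where it matches. It assumes the accented letters are stored as single precomposed characters; text in decomposed form (a base letter plus a combining mark) would not match this character class.

```python
import re

# An accent preceded by "(" or "|" (lookbehind) and followed by ")" or "|" (lookahead)
pattern = re.compile(r"(?<=[(|])[éàèùâêîôûçëïü](?=[)|])")

line = "t(é|e)làphone|télephone|portland(e|é)"
for match in pattern.finditer(line):
    # match.start() is the 0-based position of the accent in the line
    print(match.start(), match.group())
```

Only the two "é" characters inside parentheses match. The "à" and the "é" in "télephone" are surrounded by ordinary letters, so the lookbehind and lookahead reject them.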

You can then iterate over the string and report any accent that the regex did not find:

import re
accents = ['é', 'à', 'è', 'ù', 'â', 'ê', 'î', 'ô', 'û', 'ç', 'ë', 'ï', 'ü']

def alert_bad_accents(test_str):
    # Accent preceded by "(" or "|" and followed by ")" or "|"
    regex = r"(?<=[(|])[éàèùâêîôûçëïü](?=[)|])"
    matches = re.finditer(regex, test_str)
    # Positions of the accents that do have a variation
    good_locations = [match.start() for match in matches]
    for i, c in enumerate(test_str):
        # Any other accent in the string has no variation
        if c in accents and i not in good_locations:
            print(f"Error found at column {i}, accent {c} has no variation")

test_strs = ["t(é|e)làphone|télephone|portland(e|é)",
             "t(é|e)laphonë|telephone|üportland(e|é)"]
for test_str in test_strs:
    print(f"Testing {test_str}")
    alert_bad_accents(test_str)
    print()
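If you also want the line number, as in the original code, the same check could be wrapped in a function that walks over the lines. This is a rough sketch rather than a drop-in replacement: `find_bad_accents`, `ACCENTS` and `GOOD_ACCENT` are names made up for this example, and columns are 0-based.

```python
import re

ACCENTS = "éàèùâêîôûçëïü"
# Same pattern as above, built from the ACCENTS string
GOOD_ACCENT = re.compile(rf"(?<=[(|])[{ACCENTS}](?=[)|])")

def find_bad_accents(lines):
    """Return (line_number, column, accent) for each accent lacking a variation."""
    errors = []
    for line_number, line in enumerate(lines, start=1):
        # Positions of accents that sit between "(" or "|" and ")" or "|"
        good = {m.start() for m in GOOD_ACCENT.finditer(line)}
        for column, char in enumerate(line):
            if char in ACCENTS and column not in good:
                errors.append((line_number, column, char))
    return errors

sample = ["t(é|e)làphone|télephone|portland(e|é)"]
for line_number, column, accent in find_bad_accents(sample):
    print(f'Error found on line {line_number}, column {column}: "{accent}" has no accent variation')
```

To run it on the file, you could pass the stripped lines of test.txt instead of `sample`. Using `enumerate` for the line number also avoids `lines.index(line)`, which returns the first matching line and so reports the wrong number when the file contains duplicate lines.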
