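Find the index of a word in a sentence, even if it only partially matches (fuzzy)

I need to find the start index of a word in a sentence, even if it only partially matches.

I tried the find() method, but it only matches when the word is an exact match.

Code: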

body = "Charles D'Silva  |  Technical Officer  Anglo (UK) Limited"
word = 'ANGLO (UK) LTD'
# find() needs an exact substring, so this returns -1 ('ltd' != 'limited')
start_idx = body.lower().find(word.lower())
print(start_idx)  # prints -1

So the expected output is the start index of Anglo (UK) Limited in the sentence, along with the partially matched text itself (Anglo (UK) Limited).

Any suggestions for this problem?

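This solution counts how many words of one string are found in the other and prints the index if that count reaches a threshold.

The match_threshold variable is set so that if 3 words are given, only 2 need to be found. The max() call ensures the threshold is always at least 1.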

body = "Charles D'Silva  |  Technical Officer  Anglo (UK) Limited"
word = 'ANGLO (UK) LTD'
# Split 'word' into a list of lowercase strings.
words_to_find = word.lower().split()
# Create a list of the words that occur (as substrings) in 'body'.
found_words = [w for w in words_to_find if w in body.lower()]
# Allow one word to be missing, but always require at least one match.
match_threshold = max(len(words_to_find) - 1, 1)
if len(found_words) >= match_threshold:
    # Index of the first found word in 'body'.
    found_index = body.lower().index(found_words[0])
    print(found_index)  # prints 39

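Note: this solution ignores case but is sensitive to typos. It also matches substrings rather than whole words, so if, say, "anglo" is part of someone's name in the body string, it will produce a false positive.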
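The false-positive case in the note above can be reduced by comparing whole whitespace-delimited tokens instead of substrings. The following is only a minimal sketch under that assumption; the function name find_by_whole_words is hypothetical, it keeps the same "one word may be missing" threshold, and it is still sensitive to typos.

```python
import re


def find_by_whole_words(body: str, word: str):
    """Return the index of the earliest whole-word match, or None."""
    # Record the first offset of each whitespace-delimited token in body.
    offsets = {}
    for m in re.finditer(r"\S+", body):
        offsets.setdefault(m.group().lower(), m.start())
    words_to_find = word.lower().split()
    # Keep only query words that equal a complete token of body.
    found = [w for w in words_to_find if w in offsets]
    # Same threshold as above: one word may be missing, at least one must match.
    match_threshold = max(len(words_to_find) - 1, 1)
    if len(found) >= match_threshold:
        return min(offsets[w] for w in found)
    return None


body = "Charles D'Silva  |  Technical Officer  Anglo (UK) Limited"
print(find_by_whole_words(body, 'ANGLO (UK) LTD'))  # prints 39
# 'anglo' is only part of a longer token here, so it no longer counts.
print(find_by_whole_words('Mr Anglosmith Ltd', 'ANGLO (UK) LTD'))  # prints None
```

With the substring check, the second call would report a match at index 3, because both "anglo" and "ltd" occur somewhere in the text.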

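This can be done with a function that iterates over body and checks whether each character equals the current letter of the substring word. On the first match, it saves the index in the variable first_match. At the end of the function, it returns first_match if the whole substring was found, and the default argument otherwise.

This does not return 39, the start index of "Anglo (UK) Limited", because 'a' already occurs in "Charles". I can't think of any logic that would return 39 while still successfully matching "Limited" as "LTD".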

def find_partial_match(string: str, substring: str, default: int | None = None) -> int | None:
    iterator = 0
    first_match = None
    # Walk through string, advancing through substring on each matching character
    for index, char in enumerate(string):
        if iterator == len(substring):
            # Whole substring matched; stop before indexing past its end
            break
        if char == substring[iterator]:
            if iterator == 0:
                first_match = index
            iterator += 1
    # Return the index if everything has matched, else `default`
    return first_match if iterator == len(substring) else default

body = "Charles D'Silva  |  Technical Officer  Anglo (UK) Limited"
word = 'ANGLO (UK) LTD'
print(find_partial_match(body.lower(), word.lower()))  # prints 2 (the 'a' in "Charles")
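One possible way around the limitation described above is to apply the same in-order character check per word rather than across the whole sentence, and to require the words to be consecutive. This is only a sketch under a stated assumption: a query word counts as matching a body word when both start with the same character and the query word's characters appear in the body word in order (so "ltd" matches "limited"). The names is_abbreviation and find_phrase are hypothetical, and the rule does not handle typos.

```python
import re


def is_abbreviation(short: str, full: str) -> bool:
    """True if short equals full or abbreviates it (assumed rule, see above)."""
    if not short or not full or short[0] != full[0]:
        return False
    remaining = iter(full)
    # `ch in remaining` consumes the iterator, so character order is enforced.
    return all(ch in remaining for ch in short)


def find_phrase(body: str, phrase: str):
    """Return (start_index, matched_text) for the first match, or None."""
    # Each token keeps its lowercase text plus start and end offsets in body.
    tokens = [(m.group().lower(), m.start(), m.end())
              for m in re.finditer(r"\S+", body)]
    wanted = phrase.lower().split()
    if not wanted:
        return None
    # Slide a window of len(wanted) consecutive tokens across body.
    for i in range(len(tokens) - len(wanted) + 1):
        window = tokens[i:i + len(wanted)]
        if all(is_abbreviation(w, t[0]) for w, t in zip(wanted, window)):
            start, end = window[0][1], window[-1][2]
            return start, body[start:end]
    return None


body = "Charles D'Silva  |  Technical Officer  Anglo (UK) Limited"
print(find_phrase(body, 'ANGLO (UK) LTD'))  # prints (39, 'Anglo (UK) Limited')
```

Because the check is anchored at word starts, the 'a' in "Charles" can no longer begin a match, and the function returns both the start index and the matched text that the question asks for.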
