Exact, case-insensitive matching of multi-word tokens in a string in Python

I have a list containing both single-word and multi-word tokens.

brand_list = ['ibm','microsoft','abby softwate', 'tata computer services']

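I need to check whether any of these tokens occurs in a title string. I can find single-word tokens, but my code fails for multi-word tokens. Here is my current attempt. Please help.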

import string
def check_firm(test_title):
    translator = str.maketrans('', '', string.punctuation)
    title = test_title.translate(translator)
    if any(one_word.lower() in title.lower().split(' ') for one_word in brand_list):
        status_code_value = 0
        print("OEM word found")
    else:
        status_code_value = 1
        print("OEM word not found")
    print("current value of status code ------------>", status_code_value)

Change this:

if any(one_word.lower() in title.lower().split(' ') for one_word in brand_list):

to this:

if title.lower() in brand_list:

So the full code becomes:

import string
brand_list = ['ibm','Microsoft','abby softwate', 'TATA computer services']
brand_list = [x.lower() for x in brand_list] # ['ibm', 'microsoft', 'abby softwate', 
                                             #  'tata computer services']
def check_firm(test_title):
    translator = str.maketrans('', '', string.punctuation)
    title = test_title.translate(translator)
    if title.lower() in brand_list:
        status_code_value = 0
        print("OEM word found")
    else:
        status_code_value = 1
        print("OEM word not found")
    print("current value of status code ------------>", status_code_value)
check_firm('iBM')
check_firm('Tata Computer SERVICES')
check_firm('Khan trading Co.')

Output:

OEM word found
current value of status code ------------> 0
OEM word found
current value of status code ------------> 0
OEM word not found
current value of status code ------------> 1

Note: I converted every element of the list to lowercase using:

 brand_list = [x.lower() for x in brand_list]

This ensures that the comparison works regardless of how the brands are capitalized in the list.
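As a side note, `lower()` is enough for the ASCII brand names in this list. If the list could ever contain non-ASCII names, `str.casefold()` is the more aggressive normalization. A small sketch, where the German example words are my own illustration and not part of the question:

```python
brand_list = ['ibm', 'Microsoft', 'abby softwate', 'TATA computer services']

# casefold() behaves like lower() for plain ASCII text
folded = [x.casefold() for x in brand_list]
print(folded)

# For some non-ASCII characters the two differ
print('Straße'.lower() == 'STRASSE'.lower())        # False
print('Straße'.casefold() == 'STRASSE'.casefold())  # True
```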

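EDIT:

OP: But my input is a title string, for example "Tata Computer Services made a profit of X dollars". How do we find the brand in that case?

In that case, I would split the string before passing it to the function: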

inp_st1 = 'iBM'
inp_st2 = 'Tata Computer SERVICES made a profit of x dollars'
inp_st3 = 'Khan trading Co.'
check_firm(inp_st1)
check_firm(" ".join(inp_st2.split()[:3])) # Tata Computer SERVICES
check_firm(inp_st3)
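Slicing off the first three words only works when the brand sits at the start of the title and its length is known in advance. One possible generalization, sketched here under my own assumptions (the names `brand_set`, `max_words` and `find_brand` are hypothetical, not from the question), is to slide over the title and compare every run of whole words against the lowercased brand list:

```python
import string

brand_list = ['ibm', 'microsoft', 'abby softwate', 'tata computer services']
brand_set = {b.lower() for b in brand_list}
# Longest brand, measured in words, bounds the run length we need to try
max_words = max(len(b.split()) for b in brand_set)

def find_brand(title):
    """Return the first brand found as a run of whole words in title, else None."""
    translator = str.maketrans('', '', string.punctuation)
    words = title.translate(translator).lower().split()
    for start in range(len(words)):
        for size in range(1, max_words + 1):
            candidate = " ".join(words[start:start + size])
            if candidate in brand_set:
                return candidate
    return None

print(find_brand('Tata Computer SERVICES made a profit of x dollars'))
print(find_brand('Khan trading Co.'))
```

Because the comparison is made on whole words, a title such as 'tata computer servicessssss' does not match.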

You will never find a multi-word token because of this code:

title.lower().split(' ')

Say your title is TATA Computer Services. When you run that code, you get:

["tata", "computer", "services"]

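Your loop then only compares each brand against the individual words. In effect, you have broken the title into pieces that a multi-word brand can never match.

The loop, in plain words: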

any(one_word.lower() in title.lower().split(' ') for one_word in brand_list)

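is true if any entry of brand_list can be found in the list ["tata", "computer", "services"].

As you can see, the entry from brand_list cannot match, because it actually consists of three words separated by spaces: "tata computer services".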
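The difference between membership in the split list and substring membership in the whole string can be checked directly. A minimal demonstration using the title from the example above:

```python
title = "TATA Computer Services"
brand = "tata computer services"

words = title.lower().split(' ')
print(words)                    # ['tata', 'computer', 'services']

# List membership compares the brand against each single word
print(brand in words)           # False

# String membership looks for the brand as a substring of the whole title
print(brand in title.lower())   # True
```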

To do what you are looking for:

Change this:

if any(one_word.lower() in title.lower().split(' ') for one_word in brand_list):

to：

if any(one_word.lower() in title.lower() for one_word in brand_list):

This way you look for each entry of brand_list inside the whole title. Your code would look like this:

brand_list = ['ibm','microsoft','abby softwate', 'tata computer services']
import string
def check_firm(test_title):
    translator = str.maketrans('', '', string.punctuation)
    title = test_title.translate(translator)
    if any(one_word.lower() in title.lower() for one_word in brand_list):
        status_code_value = 0
        print("OEM word found")
    else:
        status_code_value = 1
        print("OEM word not found")
    print("current value of status code ------------>", status_code_value)
check_firm("ibm")
check_firm("abby software")
check_firm("abby softwate apple")

With the following output:

OEM word found
current value of status code ------------> 0
OEM word not found
current value of status code ------------> 1
OEM word found
current value of status code ------------> 0

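EDIT

OP: I tried your solution. The problem is that it also returns true for input like "tata computer servicessssss". Any idea how to overcome this? Thanks.

As pointed out in the comments, this code also matches a title such as tata computer servicesss, because it is a plain substring test. To avoid that, match on word boundaries: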

brand_list = ['ibm','microsoft','abby softwate', 'tata computer services']
import string
import re
def check_firm(test_title):
    translator = str.maketrans('', '', string.punctuation)
    title = test_title.translate(translator)
    if any(re.search(r'\b' + one_word.lower() + r'\b', title.lower()) for one_word in brand_list):
        status_code_value = 0
        print("OEM word found")
    else:
        status_code_value = 1
        print("OEM word not found")
    print("current value of status code ------------>", status_code_value)
check_firm("tata computer services")  
check_firm("tata computer servicessssss")  
check_firm("tata computer services something else")

Output:

OEM word found
current value of status code ------------> 0
OEM word not found
current value of status code ------------> 1
OEM word found
current value of status code ------------> 0

The part of interest is:

any(re.search(r'\b' + one_word.lower() + r'\b', title.lower()) for one_word in brand_list):
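If the check runs on many titles, one possible variation is to build a single compiled pattern once instead of one search per brand. This is only a sketch: the names `brand_pattern` and `contains_brand` are hypothetical, `re.escape` is added in case a brand ever contains regex metacharacters, and `re.IGNORECASE` stands in for the explicit `lower()` calls.

```python
import re

brand_list = ['ibm', 'microsoft', 'abby softwate', 'tata computer services']

# Longest alternatives first, so a longer brand is tried before a shorter one
alternatives = sorted((re.escape(b) for b in brand_list), key=len, reverse=True)
brand_pattern = re.compile(r'\b(?:' + '|'.join(alternatives) + r')\b', re.IGNORECASE)

def contains_brand(title):
    """Return True if any brand occurs in title as whole words, ignoring case."""
    return brand_pattern.search(title) is not None

print(contains_brand("Tata Computer SERVICES made a profit"))
print(contains_brand("tata computer servicessssss"))
```

The word boundaries `\b` on both sides are what reject the trailing "sssss" case.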
