Matching array elements against a string in arbitrary order

I'm new to Python and am trying to find out whether a tweet contains any of the lookup elements.

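For example, if I look for the word cats, it should match cat, and it should also match cute kittens with the words in any arbitrary order. As far as I can tell, though, I can't find a solution. Any guidance would be appreciated.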

import re
lookup_table = ['cats', 'cute kittens', 'dog litter park']
tweets = ['that is a cute cat',
          'kittens are cute',
          'that is a cute kitten',
          'that is a dog litter park',
          'no wonder that dog park is bad']
for tweet in tweets:
    lookup_found = None
    print(re.findall(r"(?=(" + '|'.join(lookup_table) + r"))", tweet.lower()))

Output:

['cat']
[]
[]
['dog litter park']
[]

Expected output:

that is a cute cat > cats
kittens are cute > cute kittens
that is a cute kitten > cute kittens
that is a dog litter park > dog litter park
no wonder that dog park is bad > dog litter park

For lookup terms that consist of a single word, you can use

for word in tweet

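For lookup terms such as "cute kittens", where the words may appear in any order, just split the term into words and look for each one in the tweet string.

Here is what I tried. It is not efficient, but it works. Try running it.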

lookup_table = ['cat', 'cute kitten', 'dog litter park']
tweets = ['that is a cute cat',
          'kittens are cute',
          'that is a cute kitten',
          'that is a dog litter park',
          'no wonder that dog park is bad']
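for word in lookup_table:
    # split(" ") returns a one-element list when the term has no space
    parts = word.split(" ")
    for tweet in tweets:
        # Print the tweet as soon as any part of the term occurs in it (substring test)
        for x in parts:
            if x in tweet:
                print(tweet)
                break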
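If you want output in the "tweet > lookup term" shape from the question, one possible sketch is to score each lookup term by the fraction of its words that appear in the tweet and keep the best one. The helper names `normalize` and `best_lookup` are made up for illustration, and the plural handling (dropping one trailing "s" from words longer than three letters) is a crude assumption, not real stemming.

```python
def normalize(word):
    """Crude singularization (an assumption): drop one trailing 's' from longer words."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def best_lookup(tweet, lookup_table):
    """Return the lookup term with the highest share of its words found in the tweet."""
    tweet_words = {normalize(w) for w in tweet.split()}
    best, best_score = None, 0.0
    for phrase in lookup_table:
        phrase_words = [normalize(w) for w in phrase.split()]
        hits = sum(1 for w in phrase_words if w in tweet_words)
        score = hits / len(phrase_words)
        # Strict '>' keeps the earlier term on ties and skips terms with no hits
        if score > best_score:
            best, best_score = phrase, score
    return best


lookup_table = ['cats', 'cute kittens', 'dog litter park']
tweets = ['that is a cute cat',
          'kittens are cute',
          'that is a cute kitten',
          'that is a dog litter park',
          'no wonder that dog park is bad']

for tweet in tweets:
    print(tweet, '>', best_lookup(tweet, lookup_table))
```

Because the score is a fraction, a partial match such as "dog park" still maps to "dog litter park", while "cute cat" prefers "cats" (1 of 1 words) over "cute kittens" (1 of 2). Whether partial matches should count at all is a design choice you may want to tighten with a minimum score.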

This is how I would do it. I think lookup_table doesn't have to be so strict, and we can avoid the plurals:

import re
lookup_table = ['cat', 'cute kitten', 'dog litter park']
tweets = ['that is a cute cat',
      'kittens are cute',
      'that is a cute kitten',
      'that is a dog litter park',
      'no wonder that dog park is bad']
for data in lookup_table:
    words=data.split(" ")
    for word in words:
        result = re.findall(r'[\w\s]*' + word + r'[\w\s]*', ','.join(tweets))
        if len(result)>0:
            print(result)
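A variation on the same regex idea, in case substring hits are a concern (for example "cat" inside "concatenate"): the sketch below anchors each word with `\b` and allows an optional trailing "s". The function name `find_words` is hypothetical, and the `s?` suffix is only a rough stand-in for plural handling.

```python
import re


def find_words(phrase, tweet):
    """Return the words of `phrase` that occur in `tweet` as whole words (optionally plural)."""
    found = []
    for word in phrase.split():
        # \b anchors the match to word boundaries; s? tolerates a simple plural
        pattern = r"\b" + re.escape(word) + r"s?\b"
        if re.search(pattern, tweet, flags=re.IGNORECASE):
            found.append(word)
    return found


lookup_table = ['cat', 'cute kitten', 'dog litter park']
tweets = ['that is a cute cat',
          'kittens are cute',
          'no wonder that dog park is bad']

for tweet in tweets:
    for phrase in lookup_table:
        words = find_words(phrase, tweet)
        if words:
            print(tweet, '>', phrase, words)
```

This checks each tweet separately instead of joining them all into one string, so every hit can be traced back to the tweet it came from.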

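Problem 1:

Singular/plural: just to get things rolling, I would use inflect, a Python package that handles singular and plural forms, among other things.

Problem 2:

Splitting and joining: I wrote a small script to demonstrate how you could use it. It is not well tested, but it should get you moving.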

import inflect 
p = inflect.engine()
lookup_table = ['cats', 'cute kittens', 'dog litter park']
tweets = ['that is a cute cat',
          'kittens are cute',
          'that is a cute kitten',
          'that is a dog litter park',
          'no wonder that dog park is bad']
for tweet in tweets:
    matched = []
    for lt in lookup_table:
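        # p.compare() returns a truthy value when the two words are equal or singular/plural forms of each other
        match_result = [lt for mt in lt.split() for word in tweet.split() if p.compare(word, mt)]
        if any(match_result):
            matched.append(" ".join(match_result))
    print(tweet, '>>', matched)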
