Regex on a string created by NLTK isn't working

I'm trying to run a regex match on a string I get from NLTK. I have a Stock class with a method that fetches 10-Ks from EDGAR and downloads them into strings using NLTK.

def get_raw_10ks(self):
    for file in self.files_10k:
        data = self.__get_data_from_url(file)
        raw = nltk.clean_html(data)
        self.raw_10ks.append(raw)
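A side note on the method above: in NLTK 3.x, `nltk.clean_html()` no longer strips markup; calling it raises `NotImplementedError` with a message pointing to BeautifulSoup's `get_text()`. If you would rather not add a dependency, the following is a minimal stdlib-only sketch for Python 3. The names `strip_html` and `_TextExtractor` are hypothetical, and this is a rough text extractor, not a drop-in reproduction of the old NLTK behavior.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; -> &
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def strip_html(html):
    """Return the text content of an HTML string (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()
```

Inside `get_raw_10ks` you would then write `raw = strip_html(data)`, assuming `data` is already a `str` (decode it first if your URL helper returns bytes).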

Then, in the program itself, I have:

stock.get_raw_10ks()
matchObj = re.match("Indicates", stock.raw_10ks[0])
print matchObj.group()

I get the error:

print matchObj.group()
AttributeError: 'NoneType' object has no attribute 'group'

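However, when I check the type of stock.raw_10ks[0], it is a string, and when I print it, the last line is "Indicates management compensatory plan", so I'm not sure what's wrong. I've checked that re and nltk are imported correctly.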
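A minimal, self-contained reproduction of the situation described above might look like this. The string `raw_10k` is a made-up stand-in for `stock.raw_10ks[0]`, assuming only what the question states: the text is a string and "Indicates..." is its last line.

```python
import re

# Made-up stand-in for stock.raw_10ks[0]: the phrase is on the LAST line.
raw_10k = "Annual report text\nExhibit index\nIndicates management compensatory plan"

match_result = re.match("Indicates", raw_10k)

print(type(raw_10k).__name__)  # str
print("Indicates" in raw_10k)  # True: the phrase is in the string
print(match_result)            # None, so match_result.group() would fail
```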

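re.match() only matches the pattern at the beginning of the input string. Because "Indicates" appears on the last line of your text rather than at its start, the match fails and returns None. Use re.search() instead, which scans the whole string.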

# match()
>>> re.match('Indicates', 'Indicates management compensatory')
<_sre.SRE_Match object at 0x0000000002CC8100>
>>> re.match('Indicates', 'This Indicates management compensatory')
# (no output: match() returned None)
# search()
>>> re.search('Indicates', 'This Indicates management compensatory')
<_sre.SRE_Match object at 0x0000000002CC8168>

See "search() vs. match()" in the Python re module documentation.
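Since the phrase in the question sits at the start of a line, it may also help to know how line anchors interact with these functions. Per the re documentation, even in `re.MULTILINE` mode `match()` only tries the beginning of the string, whereas `search()` with a pattern starting with `^` can match at the beginning of each line. A short sketch, using a made-up sample string:

```python
import re

text = "Annual report text\nIndicates management compensatory plan"

# re.match only tries position 0, even with re.MULTILINE.
print(re.match("Indicates", text, re.MULTILINE))  # None

# Without MULTILINE, '^' means "start of the string", so this also fails.
print(re.search(r"^Indicates", text))  # None

# With MULTILINE, '^' matches at the start of every line.
m = re.search(r"^Indicates", text, re.MULTILINE)
print(m.group(), m.start())  # Indicates 19
```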

To make the program robust, check the return value of the call:

matchObj = re.search("Indicates", stock.raw_10ks[0])
if matchObj is not None:  # or simply: if matchObj:
    print(matchObj.group())
else:
    print('No match found.')
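The same None check applies when you scan a long document line by line. The helper below is a hypothetical sketch (the name `matching_lines` and the sample text are made up) that reports each line where the pattern is found and skips lines where `search()` returns None:

```python
import re


def matching_lines(pattern, text):
    """Return (line_number, line) pairs for lines where `pattern` is found.

    regex.search() returns None for lines without a match; those are skipped.
    """
    regex = re.compile(pattern)
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if regex.search(line) is not None:
            hits.append((lineno, line))
    return hits


sample = "Item 15. Exhibits\n10.1 Stock plan *\n* Indicates management compensatory plan"
for lineno, line in matching_lines("Indicates", sample):
    print(lineno, line)  # 3 * Indicates management compensatory plan
```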

By the way, if you only want to check whether Indicates occurs in the string, the in operator is preferable:

>>> 'Indicates' in 'This Indicates management compensatory'
True
>>> 'Indicates' in 'This management compensatory'
False
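Two details worth keeping in mind when choosing between `in` and a regex: `in` is a literal, case-sensitive substring test, while a regex pattern interprets metacharacters such as `.` unless you escape them with `re.escape()`. A small sketch with made-up strings:

```python
import re

text = "This Indicates management compensatory plan (see Exhibit 10.1)"

# `in` is literal and case-sensitive.
print("Indicates" in text)          # True
print("indicates" in text)          # False
print("indicates" in text.lower())  # True

# The regex equivalent of a case-insensitive test.
print(bool(re.search("indicates", text, re.IGNORECASE)))  # True

# In a regex, "." matches any character, so literal text should be escaped.
print(bool(re.search("10.1", "Exhibit 1011")))             # True (unintended)
print(bool(re.search(re.escape("10.1"), "Exhibit 1011")))  # False
print("10.1" in "Exhibit 1011")                            # False
```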
