
Python: how to automatically spellcheck and correct joined words such as "reportthatexplains" and "havebeen"

Updated: 2023-09-21

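I have some large text files that are in proper English, since they were extracted from PDFs. However, many words in these text files are joined together: "informationotherwise", "havebeen", "reportthatexplains". Every spell checker I have tried flags these errors, e.g. LanguageTool, Sublime, MS Word. Python, however, struggles.

I tried pyspellchecker and TextBlob to check and correct these words, but unfortunately neither worked.

For example, see this code, which prints None for all three words.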

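# pyspellchecker is imported as the `spellchecker` module
from spellchecker import SpellChecker

spell = SpellChecker()
misspelled = spell.unknown(["informationotherwise", "havebeen", "reportthatexplains"])
for word in misspelled:
    print(spell.correction(word))
    print(spell.candidates(word))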

And this code, which returns the word unchanged:

from textblob import TextBlob

t = "havebeen"
print(TextBlob(t).correct().string)
# prints: havebeen

Any suggestions?

Use the wordninja library to split the long joined words into sub-words:

import wordninja

words = ["informationotherwise", "havebeen", "reportthatexplains"]
for x in words:
    # wordninja.split returns a list of sub-words
    print(' '.join(wordninja.split(x)))

# Output:
# information otherwise
# have been
# report that explains
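
To see why a splitter succeeds where a spell checker fails: a spell checker looks for a single dictionary word close to the input, while a splitter looks for a sequence of dictionary words that exactly covers it. The following is only a minimal sketch of that idea using dynamic programming. It is not wordninja's actual implementation; the tiny `VOCAB` list, the rank-based `COST` values, and the function name `split_joined` are all made up for illustration.

```python
import math

# Hypothetical toy vocabulary, ordered from most to least frequent.
# A real splitter would use a large frequency-ranked word list.
VOCAB = ["that", "have", "been", "information", "report", "otherwise", "explains"]

# Assumption: rarer words (higher rank) cost more; every cost is positive.
COST = {word: math.log(rank + 2) for rank, word in enumerate(VOCAB)}
MAX_LEN = max(len(word) for word in VOCAB)


def split_joined(text):
    """Split `text` into vocabulary words with the lowest total cost.

    Returns [text] unchanged if no full segmentation exists.
    """
    lowered = text.lower()
    n = len(lowered)
    # best[i] is the lowest cost of segmenting the first i characters.
    best = [0.0] + [math.inf] * n
    # back[i] is the start index of the last word in that best segmentation.
    back = [0] * (n + 1)

    for i in range(1, n + 1):
        for j in range(max(0, i - MAX_LEN), i):
            cost = COST.get(lowered[j:i])
            if cost is not None and best[j] + cost < best[i]:
                best[i] = best[j] + cost
                back[i] = j

    if math.isinf(best[n]):
        return [text]  # not coverable by vocabulary words; leave as-is

    # Walk the back-pointers from the end to recover the words.
    words = []
    i = n
    while i > 0:
        j = back[i]
        words.append(text[j:i])
        i = j
    return words[::-1]


for joined in ["informationotherwise", "havebeen", "reportthatexplains"]:
    print(" ".join(split_joined(joined)))
```

With this toy vocabulary the three examples from the question split into "information otherwise", "have been", and "report that explains". Anything that cannot be covered by vocabulary words is returned unchanged, so ordinary tokens pass through untouched.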

