Tokenize "don't" as "dont" instead of "do", "n't" using NLTK in Python



When I use:

nltk.word_tokenize("don't")

I get:

["do", "n't"]

What I want is:

["dont"]

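You can use TweetTokenizer, which keeps a contraction such as "don't" as a single token, and then strip the apostrophe from each token: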

from nltk.tokenize import TweetTokenizer
tweet_tokenizer = TweetTokenizer()
sen = "don't won't can't"
res = [x.replace("'", '') for x in tweet_tokenizer.tokenize(sen)]
print(res)

Output:

['dont', 'wont', 'cant']
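
If installing NLTK is not an option, roughly the same result can be sketched with the standard library alone. This is a swapped-in regex approach, not NLTK's tokenizer: the helper name `tokenize_without_apostrophes` and the pattern are assumptions made for illustration, and unlike TweetTokenizer this sketch simply drops punctuation instead of returning it as separate tokens.

```python
import re

# Hypothetical helper, not part of NLTK. The pattern matches runs of ASCII
# letters, digits, and apostrophes (straight ' or curly ’), so a contraction
# like "don't" is captured as one piece instead of being split.
_TOKEN_RE = re.compile(r"[A-Za-z0-9'’]+")


def tokenize_without_apostrophes(text):
    """Split text into word tokens and remove apostrophes from each token."""
    tokens = _TOKEN_RE.findall(text)
    # Remove both straight and curly apostrophes from every token.
    stripped = [tok.replace("'", "").replace("’", "") for tok in tokens]
    # Discard tokens that consisted only of apostrophes.
    return [tok for tok in stripped if tok]


print(tokenize_without_apostrophes("don't won't can't"))
# prints ['dont', 'wont', 'cant']
```

The pattern only covers ASCII letters and digits, so it would need widening for other scripts.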
