使用Textblob从文本中删除所有名词短语



我需要从文本中删除所有专有名词。结果是数据帧。我正在使用文本blob。下面是代码。

from textblob import TextBlob
strings = []
for col in result:
for i in range(result.shape[0]):
text = result[col][i]
Txtblob = TextBlob(text)
for word, pos in Txtblob.noun_phrases:
print (word, pos)
if tag != 'NNP'
print(' '.join(edited_sentence))

它只识别一个NNP

要从以下文本中(从文档中(删除所有标有"NNP"的单词,可以执行以下操作:

from textblob import TextBlob
# Sample text
text = '''
The titular threat of The Blob has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
describes it--"assimilating flesh on contact.'''
text = TextBlob(text)
# Create a list of words that are tagged with 'NNP'
# In this case it will only be 'Blob'
words_to_remove = [word[0] for word in [tag for tag in text.tags if tag[1] == 'NNP']]
# Remove the Words from the sentence, using words_to_remove
edited_sentence = ' '.join([word for word in text.split(' ') if word not in words_to_remove])
# Show the result
print(edited_sentence)

# Notice the lack of the word 'Blob'
'nThe titular threat of The has always struck me as the ultimate
movienmonster: an insatiably hungry, amoeba-like mass able to 
penetratenvirtually any safeguard, capable of--as a doomed doctor 
chillinglyndescribes it--"assimilating flesh on contact.nSnide 
comparisons to gelatin be damned, it's a concept with the 
mostndevastating of potential consequences, not unlike the grey goo 
scenarionproposed by technological theorists fearful ofnartificial 
intelligence run rampant.n'

对样品的评论

from textblob import TextBlob
strings = [] # This variable is not used anywhere
for col in result:
for i in range(result.shape[0]):
text = result[col][i]
txt_blob = TextBlob(text)
# txt_blob.noun_phrases will return a list of noun_phrases,
# To get the position of each list you need use the function 'enuermate', like this
for word, pos in enumerate(txt_blob.noun_phrases):
# Now you can print the word and position
print (word, pos)
# This will give you something like the following:
# 0 titular threat
# 1 blob
# 2 ultimate movie monster
# This following line does not make any sense, because tag has not yet been assigned
# and you are not iterating over the words from the previous step
if tag != 'NNP'
# You are not assigning anything to edited_sentence, so this would not work either.
print(' '.join(edited_sentence))

带有新代码的样本

from textblob import TextBlob
for col in result:
for i in range(result.shape[0]):
text = result[col][i]
txt_blob = TextBlob(text)
# Create a list of words that are tagged with 'NNP'
# In this case it will only be 'Blob'
words_to_remove = [word[0] for word in [tag for tag in txt_blob.tags if tag[1] == 'NNP']]
# Remove the Words from the sentence, using words_to_remove
edited_sentence = ' '.join([word for word in text.split(' ') if word not in words_to_remove])
# Show the result
print(edited_sentence)

最新更新