I am a beginner Python programmer, and I'm finding it hard to figure out the simple Tweepy streaming API.
Basically I am trying to do the following:
- Stream tweets in Portuguese.
- Display the sentiment of each tweet.
I am unable to stream tweets by language. Could someone help me figure out what I am doing wrong?
import tweepy
from textblob import TextBlob

### I have the keys updated in these variables
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
API = tweepy.API(auth)

class MyStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print("--------------------")
        print(status.text)
        analysis = TextBlob(status.text)
        if analysis.sentiment.polarity > 0:
            print("sentiment is Positive")
        elif analysis.sentiment.polarity == 0:
            print("sentiment is Neutral")
        else:
            print("sentiment is Negative")
        print("--------------------\n")

myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth=API.auth, listener=myStreamListener, tweet_mode='extended', lang='pt')
myStream.filter(track=['trump'])
A sample output is:
RT @SAGEOceanTweets: Innovation Hack Week 2019: @nesta_uk is exploring the possibility of holding a hack week in 2019, focused on state-of-…
But it stops after a few tweets and I get this error:
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f4ca' in position 76: character maps to <undefined>
[Finished in 85.488s]
Also, the tweets are not in Portuguese. How can I stream continuously, get Portuguese tweets, and perform sentiment analysis on them?
Could you also guide me on how to stream tweets by language and then analyze their sentiment with TextBlob?
Thanks
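Regarding the UnicodeEncodeError above: it happens when print() writes a character such as the emoji U+1F4CA to a console whose 'charmap' codec (e.g. cp1252 on Windows) cannot represent it. One stdlib-only workaround is to sanitize the text before printing; a minimal sketch (the helper name safe_text is my own, not part of Tweepy):

```python
def safe_text(text, encoding='cp1252'):
    """Replace any character the target console encoding cannot
    represent, so print() never raises UnicodeEncodeError."""
    return text.encode(encoding, errors='replace').decode(encoding)

# A tweet containing an emoji (U+1F4CA, bar chart) that cp1252 cannot encode:
tweet = "Poll results \U0001f4ca are in"
print(safe_text(tweet))  # the emoji is replaced by '?'
```

Alternatively, on Python 3.7+ you can call sys.stdout.reconfigure(errors='replace') once at startup and keep printing status.text directly.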
This code can help you achieve your goal:
NLP Twitter Streaming Mood
It collects data from Twitter and analyzes the mood. However, if you want to build sentiment analysis for Portuguese, you should use word embeddings trained on the Portuguese Wikipedia (Word2Vec) to feed your model. That is the only way you can do this reliably. NLTK and Gensim work better for English; NLTK is very limited for Portuguese.
import re

import nltk
import numpy as np
import tweepy
from nltk import sent_tokenize, word_tokenize, pos_tag
from nltk.stem import WordNetLemmatizer
from tweepy import OAuthHandler
from tweepy import Stream
from tweepy.streaming import StreamListener

consumer_key = '12345'
consumer_secret = '12345'
access_token = '123-12345'
access_secret = '12345'

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)

number_tweets = 100
data = []
for status in tweepy.Cursor(api.search, q="trump").items(number_tweets):
    try:
        # Strip URLs from the tweet text before storing it
        URLless_string = re.sub(r'\w+:/{2}[\d\w-]+(\.[\d\w-]+)*(?:(?:/[^\s/]*))*', '', status.text)
        data.append(URLless_string)
    except Exception:
        pass

lemmatizer = WordNetLemmatizer()
text = data
sentences = sent_tokenize(str(text))
sentences2 = sentences
sentences2
tokens = word_tokenize(str(text))
tokens = [lemmatizer.lemmatize(tokens[i]) for i in range(0, len(tokens))]
len(tokens)
tagged_tokens = pos_tag(tokens)
tagged_tokens
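As a side note, a URL-stripping regex like the one used above is fragile when copied, because its backslashes are easily lost. Here is a small self-contained check of such a pattern against sample tweet text (stdlib re only; the sample tweet and URL are made up for illustration):

```python
import re

# URL-matching pattern: scheme://host(.part)* followed by optional /path segments.
# Every backslash is significant: \w, \d, \., \s must survive copy-paste.
URL_PATTERN = r'\w+:/{2}[\d\w-]+(\.[\d\w-]+)*(?:(?:/[^\s/]*))*'

tweet = "Check this out https://example.com/some/page and reply"
cleaned = re.sub(URL_PATTERN, '', tweet)
print(cleaned)  # "Check this out  and reply"
```

Without the backslashes the pattern silently matches almost nothing, so the "cleaned" text would still contain the URLs.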