python 2.7 - How to decode ascii from stream for analysis

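I am trying to run text from the Twitter API through sentiment analysis with the TextBlob library. When I run my code, it prints one or two sentiment values and then fails with the following error: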

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 31: ordinal not in range(128)

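I do not understand why this is a problem for the code to handle if it is only analyzing text. I have tried encoding the script as UTF-8. Here is the code: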

from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import json
import sys
import csv
from textblob import TextBlob
# Variables that contain the user credentials to access the Twitter API
access_token = ""
access_token_secret = ""
consumer_key = ""
consumer_secret = ""

# This is a basic listener that just prints received tweets to stdout.
class StdOutListener(StreamListener):
    def on_data(self, data):
        json_load = json.loads(data)
        texts = json_load['text']
        coded = texts.encode('utf-8')
        s = str(coded)
        content = s.decode('utf-8')
        #print(s[2:-1])
        wiki = TextBlob(s[2:-1])
        r = wiki.sentiment.polarity
        print r
        return True
    def on_error(self, status):
        print(status)
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
stream = Stream(auth, StdOutListener())
# This line filters the Twitter stream to capture data by the keywords: 'dollar', 'euro'
stream.filter(track=['dollar', 'euro' ], languages=['en'])

Can someone help me with this?

Thank you in advance.

You are mixing too many things together. As the error says, you are trying to decode a byte type.

json.loads gives you the data as a string, which you then need to encode:

texts = json_load['text']       # text string returned by json.loads
coded = texts.encode('utf-8')   # byte string (UTF-8 encoded)
print(coded[2:-1])

So in your script, when you then try to decode coded, you get the error about decoding byte data.
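
To make the bytes-versus-text distinction concrete, here is a minimal sketch rather than a definitive fix. It assumes the stream hands on_data a JSON payload that is either UTF-8 encoded bytes or already-decoded text, and the helper name extract_text is hypothetical. The idea is to decode once at the boundary, keep the tweet text as Unicode afterwards, and avoid slicing the encoded bytes, since a slice such as s[2:-1] can cut a multi-byte character in half.

```python
# -*- coding: utf-8 -*-
import json


def extract_text(raw):
    """Return the tweet text as a Unicode string, or None if absent.

    `extract_text` is a hypothetical helper; it assumes `raw` is a JSON
    document, either as UTF-8 encoded bytes or as already-decoded text.
    """
    if isinstance(raw, bytes):
        # Decode exactly once, at the point where bytes enter the program.
        raw = raw.decode('utf-8')
    payload = json.loads(raw)
    # The 'text' key may be missing, so use .get() instead of indexing.
    return payload.get('text')


# Sample payload containing a pound sign, whose UTF-8 encoding starts with 0xc2.
sample = u'{"text": "dollar vs \u00a3"}'.encode('utf-8')
text = extract_text(sample)

# Decoding the same bytes as ASCII reproduces the kind of error in the question.
try:
    sample.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

Inside on_data, the returned value could then be passed straight to TextBlob(text) whenever it is not None, with no intermediate str(), encode, or slicing step.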
