How to get only the text part from tweets extracted with Tweepy



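I am working on a research project that involves something like sentiment analysis. I have used Tweepy to extract tweets from Twitter. The data I get looks like this: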

{"created_at":"Sat Apr 22 07:28:47 +0000 2017","id":855684794939842560,"id_str":"855684794939842560","text":"#PL | FIXTURES - 22 April 2017 \nWest Ham v Everton 16:00\nHull v Watford\nSwansea v Stoke \nBournemouth v Middlesbrough #CCFMSport","source":"\u003ca href=\"http://twitter.com\" rel=\"nofollow\"\u003eTwitter Web Client\u003c/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":256051042,"id_str":"256051042","name":"Ayanda Frances Felem","screen_name":"AyandaFelemZA","location":"Cape Town, South Africa","url":"http://ccfm.org.za","description":"Sports Producer/Reporter for @RadioCCFm, Views are my own. ayanda@ccfm.org.za","protected":false,"verified":false,"followers_count":446,"friends_count":1648,"listed_count":23,"favourites_count":1625,"statuses_count":16110,"created_at":"Tue Feb 22 15:15:38 +0000 2011","utc_offset":7200,"time_zone":"Pretoria","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"000000","profile_background_image_url":"http://abs.twimg.com/images/themes/theme11/bg.gif","profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme11/bg.gif","profile_background_tile":false,"profile_link_color":"DD2E44","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"000000","profile_text_color":"000000","profile_use_background_image":false,"profile_image_url":"http://pbs.twimg.com/profile_images/850335374446665728/BvVIo7oB_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/850335374446665728/BvVIo7oB_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/256051042/1491570881","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"PL","indices":[0,3]},{"text":"CCFMSport","indices":[117,127]}],"urls":[],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"en","timestamp_ms":"1492846127625"}

Now I want to extract only the "text" field from this file. This is what I have tried:

import json
tweets_data_path = 'twitter_streaming.txt'
tweets_data = []
tweets_file = open(tweets_data_path, "r")
json_load = json.load(tweets_file)
texts = json_load['text']
coded = texts.encode('utf-8')
s = str(coded)
tweets_data.append(s[1:-2])
print(tweets_data)

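But I get an error saying:

json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I tried to find the cause of this error but could not find anything specific.

What am I doing wrong? Is there a better way to do this?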

# Alias the JSON literals so the tweet can be pasted as a Python dict literal
null, false, true = None, False, True
a = {"created_at":"Sat Apr 22 07:28:47 +0000 2017","id":855684794939842560,"id_str":"855684794939842560","text":"#PL | FIXTURES - 22 April 2017 \nWest Ham v Everton 16:00\nHull v Watford\nSwansea v Stoke \nBournemouth v Middlesbrough #CCFMSport","source":"\u003ca href=\"http://twitter.com\" rel=\"nofollow\"\u003eTwitter Web Client\u003c/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":256051042,"id_str":"256051042","name":"Ayanda Frances Felem","screen_name":"AyandaFelemZA","location":"Cape Town, South Africa","url":"http://ccfm.org.za","description":"Sports Producer/Reporter for @RadioCCFm, Views are my own. ayanda@ccfm.org.za","protected":false,"verified":false,"followers_count":446,"friends_count":1648,"listed_count":23,"favourites_count":1625,"statuses_count":16110,"created_at":"Tue Feb 22 15:15:38 +0000 2011","utc_offset":7200,"time_zone":"Pretoria","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"000000","profile_background_image_url":"http://abs.twimg.com/images/themes/theme11/bg.gif","profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme11/bg.gif","profile_background_tile":false,"profile_link_color":"DD2E44","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"000000","profile_text_color":"000000","profile_use_background_image":false,"profile_image_url":"http://pbs.twimg.com/profile_images/850335374446665728/BvVIo7oB_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/850335374446665728/BvVIo7oB_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/256051042/1491570881","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"PL","indices":[0,3]},{"text":"CCFMSport","indices":[117,127]}],"urls":[],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"en","timestamp_ms":"1492846127625"}
print(a["text"])

I simply ran this code, and it returned the following output.

#PL | FIXTURES - 22 April 2017 
West Ham v Everton 16:00
Hull v Watford
Swansea v Stoke 
Bournemouth v Middlesbrough #CCFMSport

The question is not entirely clear, but is this the text you are looking for?
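
The dictionary literal above only works after the JSON literals have been aliased to Python names. As a minimal sketch of an alternative, assuming the tweet is available as a raw JSON string, json.loads performs that conversion itself. The raw_tweet value here is a shortened, hypothetical stand-in for the full record, not the complete tweet:

```python
import json

# Shortened stand-in for the tweet above, kept as a raw JSON string.
# The r'' prefix keeps \n as a JSON escape instead of a Python newline.
raw_tweet = r'{"id": 855684794939842560, "text": "#PL | FIXTURES - 22 April 2017 \nWest Ham v Everton 16:00", "truncated": false, "geo": null}'

# json.loads maps null -> None, false -> False and decodes \n to a newline.
tweet = json.loads(raw_tweet)
text = tweet["text"]
print(text)
```

This avoids defining null, false and true by hand, and it also handles the escaped quotes and \uXXXX sequences that appear in the "source" field.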

This code works fine:

import json
tweets_data_path = 'twitter_data.txt'
tweets_data = []
# json.load expects the file to hold exactly one JSON object (a single tweet)
with open(tweets_data_path, "r") as tweets_file:
    json_load = json.load(tweets_file)
texts = json_load['text']
print(texts)

If the expected output is the one shown under #output below, the following part of the code is not needed:
coded = texts.encode('utf-8')
s = str(coded)
tweets_data.append(s[1:-2])
print(tweets_data)
#output
'''
#PL | FIXTURES - 22 April 2017 
West Ham v Everton 16:00
Hull v Watford
Swansea v Stoke 
Bournemouth v Middlesbrough #CCFMSport
None
'''
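
The error in the question, "Expecting value: line 1 column 1 (char 0)", is what the json module reports when the input it is given is empty or does not begin with a JSON value. A file written by a streaming listener often holds one JSON object per line, sometimes with blank lines in between, so parsing it line by line may be more robust than a single json.load call. The following is only a sketch under that assumption about the file layout; extract_texts and sample_lines are hypothetical names used for illustration.

```python
import json

def extract_texts(lines):
    """Collect the "text" field from an iterable of JSON lines."""
    texts = []
    for line in lines:
        line = line.strip()
        if not line:
            # Blank line: json.loads("") would raise "Expecting value".
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            # Skip lines that are not valid JSON (for example a cut-off record).
            continue
        # Some stream messages (such as delete notices) have no "text" key.
        if isinstance(tweet, dict) and "text" in tweet:
            texts.append(tweet["text"])
    return texts

# Hypothetical sample standing in for the lines of the streaming file.
sample_lines = [
    '{"id": 1, "text": "first tweet"}\n',
    '\n',
    '{"delete": {"status": {"id": 2}}}\n',
    '{"id": 3, "text": "line one\\nline two"}\n',
]
tweets_data = extract_texts(sample_lines)
print(tweets_data)
```

With a real file, the open file object can be passed to the same function, since iterating over a text file yields its lines one at a time. The encode and slicing steps from the question are not needed, because json.loads already returns the text as a Python string.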
