Splitting a stream message into multiple dicts



I am trying to decode a stream message with json, but it raises the following ValueError:

  File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
  return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 369, in decode
  raise ValueError(errmsg("Extra data", s, end, len(s)))
  ValueError: Extra data: line 2 column 1 - line 23571 column 1 (char 126 - 72358378)

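I searched on SO, and the likely cause is the format of my stream message. If so, how can I split my stream message into multiple dicts in a Pythonic way?

Some lines of my stream message: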

{"delete":{"status":{"id":486174602859528192,"id_str":"486174602859528192","user_id":2455171405,"user_id_str":"2455171405"}}}
{"delete":{"status":{"id":244223991382937601,"id_str":"244223991382937601","user_id":236405781,"user_id_str":"236405781"}}}
{"delete":{"status":{"id":243934303371792384,"id_str":"243934303371792384","user_id":236405781,"user_id_str":"236405781"}}}
{"delete":{"status":{"id":320790822129913856,"id_str":"320790822129913856","user_id":320634758,"user_id_str":"320634758"}}}
{"delete":{"status":{"id":399494495630155776,"id_str":"399494495630155776","user_id":1227287820,"user_id_str":"1227287820"}}}
{"delete":{"status":{"id":399528981206007808,"id_str":"399528981206007808","user_id":1227287820,"user_id_str":"1227287820"}}}
{"created_at":"Wed Jul 09 12:16:27 +0000 2014","id":486846341600251904,"id_str":"486846341600251904","text":"#RT u0430 u0437u043du0430u0435u0442u0435 u043fu043eu0447u0435u043cu0443 u044f u043du0435 u0431u0443u0434u0443 u043fu043eu0434u0434u0435u0440u0436u0438u0432u0430u0442u044c u0442u0440u0435u043du0434 u043e u041du0438u043au043eu043bu044c?","source":"u003ca href="http://www.ckhi.com.ua" rel="nofollow"u003e"Original atok"u003c/au003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":2530930573,"id_str":"2530930573","name":"u041bu0435u043fu0430u0448u0438u043da u041fu0435u043bu0430u0433u0435u044f","screen_name":"miki4390","location":"u0421u0430u043du043au0442-u041fu0435u0442u0435u0440u0431u0443u0440u0433","url":"https://twitter.com/miki4390","description":"u042f-u0442u043e u0442u0435u0440u043fu043bu044e. u041du043e u0442u044b-u0442u043e u043fu043eu0436u0430u043bu0435u0435u0448u044c...","protected":false,"verified":false,"followers_count":0,"friends_count":0,"listed_count":0,"favourites_count":0,"statuses_count":11,"created_at":"Wed May 28 21:41:41 +0000 
2014","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","profile_background_tile":false,"profile_link_color":"0084B4","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http://abs.twimg.com/sticky/default_profile_images/default_profile_3_normal.png","profile_image_url_https":"https://abs.twimg.com/sticky/default_profile_images/default_profile_3_normal.png","profile_banner_url":"https://pbs.twimg.com/profile_banners/2530930573/1404903710","default_profile":true,"default_profile_image":true,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"RT","indices":[0,3]}],"trends":[],"urls":[],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"medium","lang":"ru"}
{"delete":{"status":{"id":295365152621080577,"id_str":"295365152621080577","user_id":710752640,"user_id_str":"710752640"}}}

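Your data is not a single JSON document. It is a collection of JSON lines, with one complete JSON document per line.

Decoding the JSON line by line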

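Reading all the lines at once and passing them to json.loads in a single call fails, because the decoder expects exactly one JSON document and reports everything after the first one as "Extra data".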
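The failure is easy to reproduce on a small scale. The sketch below uses two made-up one-line documents (not the real stream data) and captures the error message instead of letting the exception propagate:

```python
import json

# Two JSON documents separated by a newline (hypothetical sample data).
text = '{"a": 1}\n{"b": 2}'

try:
    json.loads(text)
    error_message = None
except ValueError as exc:  # json.JSONDecodeError subclasses ValueError on Python 3
    error_message = str(exc)

print(error_message)
```

The first document decodes correctly; the error is raised only because something follows it.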

Reading and decoding one line at a time works fine.

Assuming your JSON lines are in the file "jslines.json", the following code:

>>> import json
>>> fname = "jslines.json"
>>> with open(fname) as f:
...     for line in f:
...         print(json.loads(line))

decodes and prints every line.
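If the stream may contain blank lines, json.loads will fail on them, so it can help to skip them explicitly. A minimal sketch, assuming one document per line; the helper name iter_json_lines and the sample data are made up for illustration:

```python
import json

def iter_json_lines(lines):
    """Yield one decoded object per non-blank line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, such as a trailing empty line
        yield json.loads(line)

# In-memory sample lines; a file object opened on "jslines.json"
# can be passed in the same way, since it also iterates over lines.
sample = [
    '{"delete": {"status": {"id": 1, "user_id": 2}}}\n',
    '\n',
    '{"created_at": "Wed Jul 09 12:16:27 +0000 2014", "id": 3}\n',
]
messages = list(iter_json_lines(sample))
```

Because the helper is a generator, a large file is decoded one message at a time rather than being held in memory all at once.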

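Building a valid JSON array from the lines

Another approach is to use the lines to build a single valid JSON structure, in this case an array. We take the list of lines (as text), join them with ",", and enclose the result between "[" and "]".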

>>> with open(fname) as f:
...    lines = list(f)

Now all the lines are in the list lines.

Build the resulting JSON text:

>>> jstext = "[" + ",".join(lines) + "]"

and load it, which gives a list of dictionaries:

>>> json.loads(jstext)
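One caveat with joining raw lines: a blank line in the file (for example, a trailing empty line) would leave a stray comma in the array text and make it invalid. A hedged sketch of the same idea that strips and filters the lines first; the function name load_lines_as_array and the sample data are assumptions for illustration:

```python
import json

def load_lines_as_array(lines):
    """Join non-blank JSON lines into one JSON array text and decode it."""
    stripped = [line.strip() for line in lines]
    jstext = "[" + ",".join(s for s in stripped if s) + "]"
    return json.loads(jstext)

# The trailing blank line is dropped before the array text is built.
sample = ['{"a": 1}\n', '{"b": 2}\n', '\n']
records = load_lines_as_array(sample)
```

Unlike the line-by-line approach, this builds the whole array text in memory, so it suits files of moderate size.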
