ValueError: Invalid control character at: line 115076 column 173



I am having trouble parsing a JSON file in Python.

Here is my code:

    import json
    from pprint import pprint

    # Parse the whole file into Python objects and pretty-print the result.
    with open('review_sample.json') as data_file:
        data = json.load(data_file)
    pprint(data)

The JSON file looks like this:

    {
    "table": "TempTable",
    "rows":
    [
        {
            "comment_id": "R1KLDHE77IOLUM",
            "crawl_time": "2015-07-17 22:55:16",
            "title": "Excellent TV, excellent price... but look out for bugs.",
            "overall_rating": "5",
            "purchase": "Verified Purchase",
            "comment": "This is an excellent TV at an excellent price. For those who say that you can't tell the difference between 4k and ***p, I disagree. I compared this side by side to my *** LG 55' ***p set, and the resolution and sharpness of the image is just no comparison. Can you see an individual pixel from a normal viewing distance on either set? Of course not. But you can see when things start to get fuzzy and pixelated with a large ***p set, and that simply is not an issue with 4K. Picture quality is outstanding but you will want to tweak picture settings - I find that 'Standard' and 'Photo' modes are the best right out of the box, but worth customizing. I also turned off TruMotion, which seemed to be creating some lag when gaming, and is also a bit unsettling for movies and TV (which are usually filmed in 24 and 30 FPS, respectively, rather than 120 FPS TruMotion). 4K playback from Netflix and Amazon Instant Video are superb, as is upscaling from a ***p source. I was surprised how great Battlefield Hardline looked when upscaled to 4K. Overall, WebOS 2.0 is a joy to use, though I'm not a huge fan of the Smart Remote - just clunky to use and not really necessary. I had a bit of a scare when suddenly every 20th vertical row of pixels started bugging out in rainbow colors - see photos. I cycled the power and everything was fine, so I suspect that this was a software bug in the upscaling process (was playing Xbox One at ***p at the time). Will update this review if it happens again. Build quality feels good and the TV looks great - very sleek, slim, and minimal bezels.",
            "site": "amazon",
            "brand": "lg",
            "country_code": "us",
            "product_group_name": "tv",
            "product_name": "smarttv",
            "model_name": "4k",
            "model_code": "*UF7600"
        }
    ]
}

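With only a few comments in the file, there is no problem. But when I load the full JSON file (many comments), a ValueError is raised. Here is the error message: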

Traceback (most recent call last):
  File "D:/kaggle/word2vec/server.py", line 11, in <module>
    data = json.load(data_file)
  File "C:\Anaconda2\lib\json\__init__.py", line 291, in load
    **kw)
  File "C:\Anaconda2\lib\json\__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "C:\Anaconda2\lib\json\decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Anaconda2\lib\json\decoder.py", line 380, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Invalid control character at: line 115076 column 173 (char 8301811)
Process finished with exit code 1

Please help.

Rather than passing the file handle to json.load, try reading the file into a string and passing that string to json.loads.

I saved your content in a.txt:

[root@ebs-49393 tmp]# python test.py

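{u'table': u'TempTable', u'rows': [{u'comment': u"This is an excellent TV at an excellent price. For those who say that you can't tell the difference between 4k and ***p, I disagree. I compared this side by side to my *** LG 55' ***p set, and the resolution and sharpness of the image is just no comparison. Can you see an individual pixel from a normal viewing distance on either set? Of course not. But you can see when things start to get fuzzy and pixelated with a large ***p set, and that simply is not an issue with 4K. Picture quality is outstanding but you will want to tweak picture settings - I find that 'Standard' and 'Photo' modes are the best right out of the box, but worth customizing. I also turned off TruMotion, which seemed to be creating some lag when gaming, and is also a bit unsettling for movies and TV (which are usually filmed in 24 and 30 FPS, respectively, rather than 120 FPS TruMotion). 4K playback from Netflix and Amazon Instant Video are superb, as is upscaling from a ***p source. I was surprised how great Battlefield Hardline looked when upscaled to 4K. Overall, WebOS 2.0 is a joy to use, though I'm not a huge fan of the Smart Remote - just clunky to use and not really necessary. I had a bit of a scare when suddenly every 20th vertical row of pixels started bugging out in rainbow colors - see photos. I cycled the power and everything was fine, so I suspect that this was a software bug in the upscaling process (was playing Xbox One at ***p at the time). Will update this review if it happens again. Build quality feels good and the TV looks great - very sleek, slim, and minimal bezels.", u'model_code': u'*UF7600', u'title': u'Excellent TV, excellent price... but look out for bugs.', u'brand': u'lg', u'comment_id': u'R1KLDHE77IOLUM', u'site': u'amazon', u'model_name': u'4k', u'crawl_time': u'2015-07-17 22:55:16', u'country_code': u'us', u'product_group_name': u'tv', u'product_name': u'smarttv', u'overall_rating': u'5'}]}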

[root@ebs-49393 tmp]# cat test.py

import json

buf = open('./a.txt').read()
j = json.loads(buf)
print j
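
If the full file still fails with the same error, one likely cause is a raw control character (for example an unescaped newline or tab) inside one of the comment strings, which strict JSON does not allow. The sketch below is only an illustration under that assumption: `bad_text` is a made-up sample and `load_lenient` is a hypothetical helper, neither of which comes from the original post. It relies on the `strict` argument of the standard `json` decoder, which permits control characters inside strings when set to False. It is written for Python 3; under Python 2 the exception has no `pos` attribute, so the context line is simply skipped.

```python
import json

# Hypothetical sample: the string value contains a raw newline character,
# which a strict JSON parser rejects with "Invalid control character".
bad_text = '{"comment": "first line\nsecond line"}'


def load_lenient(text):
    """Parse JSON text, falling back to strict=False on a ValueError.

    This helper is an illustration, not part of the json module.
    """
    try:
        return json.loads(text)
    except ValueError as err:
        # Python 3.5+ raises json.JSONDecodeError, which carries the
        # character offset of the problem in its `pos` attribute.
        pos = getattr(err, "pos", None)
        if pos is not None:
            start = max(0, pos - 15)
            print("strict parse failed near: %r" % text[start:pos + 15])
        # strict=False allows control characters inside JSON strings.
        return json.loads(text, strict=False)


data = load_lenient(bad_text)
print(repr(data["comment"]))
```

The same keyword can be passed to json.load when reading directly from a file handle, for example `json.load(data_file, strict=False)`. Printing the text around the reported offset first is worthwhile, since it shows which review contains the stray character and whether the data should be cleaned at the source instead.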
