Removing comments from a JSON file with Python



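I have a JSON file that contains comments, as shown below. I can't read it in Python because it is not valid JSON. I'd like a Python way to remove every line in the file that starts with /*, as in this sample: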

/* 1 */
[{
"_id" : ObjectId("abe"),
"id" : "149",
"objectType" : "act"
}
/* 2 */
{
"_id" : ObjectId("abe415"),
"id" : "449899009",
"objectType" : "ity"
}]

I tried the code below, but json.loads() raised an error:

import json
with open('data.json', 'r+', encoding='utf-8-sig') as handle:
    fixed_json = ''.join(line for line in handle if not line.startswith('/*'))
    final_data = json.loads(fixed_json)
    print(final_data)

JSONDecodeError: Expecting value: line 4 column 13 (char 16)

Thanks in advance.

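For your particular input, which you did not share here in full, this is my regex solution:

Note: you need to check the input for errors and convert false, null, or other bare keywords into strings such as "false".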

import json
import re
import emoji

with open('tweets.json', 'r') as handle:
    fixed_json = handle.read()

# Remove emojis
fixed_json = emoji.replace_emoji(fixed_json, replace='')
# Strip all spaces (this also removes spaces inside string values)
fixed_json = fixed_json.replace(' ', '')
# Quote bare keywords so that eval() below accepts them
fixed_json = fixed_json.replace('false', '"false"')
fixed_json = fixed_json.replace('null', '"null"')
# Split on the /* ... */ markers, capturing the {...} record after each one
rx = re.compile(r"/\*.*\*/([\n\s]*{[\w\W]*?}[\n\s]*)(?=/\*.*\*/)")
parts = rx.split(fixed_json)
print(parts[0])
print(len(parts))

tweets = []
parts = parts[1:]
for tweet in parts:
    # Drop a leftover surrogate sequence, then any remaining emojis
    tweet = tweet.replace('\udbb8\udf35\udbb8\udf38', '')
    tweet = emoji.replace_emoji(tweet, replace='')
    # Round-trip through json; the result is still a string
    tweet = json.dumps(tweet)
    tweets.append(json.loads(tweet))
print(len(tweets))

# Remove empty elements
result = []
for tweet in tweets:
    if len(tweet.strip()) > 0:
        result.append(tweet)
print(len(result))

# Convert each string to a Python object.
# eval() executes arbitrary code, so only use it on input you trust.
results = []
for res in result:
    try:
        results.append(eval(res))
    except Exception:
        # Skip records that still fail to parse
        continue
print(len(results))

Output:

/*10000Tweets*/

19999
19998
10000
5022 #tweets that converted to JSON successfully
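
For the small sample shown in the question, a lighter approach may be enough. That sample fails in json.loads() for more reasons than the comment lines: ObjectId("...") is not a valid JSON value, and there is no comma between the two objects. The sketch below handles those three things with regular expressions and then parses with json.loads(). The function name load_mongo_export is made up for illustration, and the code assumes that string values never contain "/*" or a "}" directly followed by "{", since the regexes do not distinguish text inside strings from structure.

```python
import json
import re

def load_mongo_export(text):
    """Parse text shaped like the question's sample (hypothetical helper)."""
    # Drop /* ... */ markers such as "/* 1 */".
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
    # Turn ObjectId("abe") into the plain string "abe".
    text = re.sub(r'ObjectId\(\s*("[^"]*")\s*\)', r"\1", text)
    # Add the comma that is missing between adjacent objects.
    text = re.sub(r"\}\s*\{", "},{", text)
    return json.loads(text)

# The sample from the question.
sample = '''/* 1 */
[{
"_id" : ObjectId("abe"),
"id" : "149",
"objectType" : "act"
}
/* 2 */
{
"_id" : ObjectId("abe415"),
"id" : "449899009",
"objectType" : "ity"
}]
'''

data = load_mongo_export(sample)
print(len(data))
print(data[0]["_id"], data[1]["objectType"])
```

Because the result comes from json.loads(), false and null map to False and None without quoting them first, and no eval() is needed. Real exports may contain other non-JSON constructs that this sketch does not cover.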
