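I have an external JSON file that contains some inline styles, HTML tags, and \n and \t characters. I want to remove all of these and keep only the plain strings, without breaking the JSON format. I have tried this and looked at many solutions, but nothing has worked so far. Thanks a lot for your time. Here is my code: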
I'm using Python 3.x.x.
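import json, re
from html.parser import HTMLParser

def remove_html_tags(data):
    # Delete every "<...>" tag; the non-greedy ".*?" matches each tag separately
    p = re.compile(r'<.*?>')
    return p.sub('', data)

with open('project-closedtasks-avgdaysopen.json') as f:
    data = json.load(f)

# Serialize back to a JSON string and print it (remove_html_tags is not called yet)
data = json.dumps(data, indent=4)
print(data)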
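Note that this is the file I'm reading in (it is in the same folder). I want the same output, but without HTML tags, without inline styles, without \n or anything else, only the strings: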
[
    {
        "idrfi" : 36809,
        "fkproject" : 33235,
        "subject" : "M2 - Flashing Clarifications",
        "description" : "<ol style=\"margin-left:0.375in\">\n\t<li><span style=\"font-family:calibri; font-size:11pt\">Refer to detail 5/A650 attached. Can the pre-finished metal panel be swapped for pre-finished metal flashing? This will allow the full assembly to be installed by the mechanical HVAC trade vs requiring the cladding trade to return for penthouse work. </span></li>\n</ol>\n",
        "response" : null
    },
    {
        "idrfi" : 36808,
        "fkproject" : 33139,
        "subject" : "M1 - Flashing Clarifications",
        "description" : "<ol style=\"margin-left:0.2in\">\n\t<li><span style=\"font-family:calibri; font-size:11pt\">Refer to detail 6/A612 attached. Clarify location of flashing on detail.</span></li>\n\t<li><span style=\"font-family:calibri; font-size:11pt\">Refer to details 2,4/A614 attached. Clarify location of flashing on detail. </span></li>\n\t<li><span style=\"font-family:calibri; font-size:11pt\">Refer to detail 3/A616 attached. Clarify location of flashing on detail.</span></li>\n\t<li><span style=\"font-family:calibri; font-size:11pt\">Refer to detail 5/A650 attached. Can the pre-finished metal panel be swapped for pre-finished metal flashing? This will allow the full assembly to be installed by the mechanical HVAC trade vs requiring the cladding trade to return for penthouse work. </span></li>\n</ol>\n",
        "response" : null
    }
]
I found this function, but I don't know how to use it:
def remove_html_tags(data):
    p = re.compile(r'<.*?>')
    return p.sub('', data)
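To use it, pass it a single string, for example one "description" value after json.load(). In the sketch below, the description is a shortened, hypothetical copy of the second record. The tags disappear, but the newline and tab characters stay, because the regex only deletes text between < and >.

```python
import re

# Same function as in the question: a non-greedy regex that deletes
# everything from "<" up to the next ">".
def remove_html_tags(data):
    p = re.compile(r'<.*?>')
    return p.sub('', data)

# Hypothetical, shortened "description" value as it looks after json.load():
# here "\n" and "\t" are real newline and tab characters.
description = ('<ol style="margin-left:0.2in">\n\t<li><span style="font-family:calibri; '
               'font-size:11pt">Clarify location of flashing on detail.</span></li>\n</ol>\n')

print(repr(remove_html_tags(description)))
# prints: '\n\tClarify location of flashing on detail.\n\n'
```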
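EDIT: After implementing this, the \n, \t and other characters are still not removed. I only want the strings: no tags, no styles, nothing else.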
import json, re
from html.parser import HTMLParser

def remove_html_tags(data):
    p = re.compile(r'<.*?>')
    return p.sub('', data)

with open('project-closedtasks-avgdaysopen.json') as f:
    data = json.load(f)

# Strip tags from the serialized JSON text, not from the individual values
data = json.dumps(data, indent=4)
removed_tags = remove_html_tags(data)
print(removed_tags)
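The \n and \t survive because the regex runs on the output of json.dumps, which writes the newline and tab characters inside each string as the two-character escapes \n and \t, and those are not between < and >. A minimal sketch of a different approach, assuming the file has the structure shown above (a list of objects whose values are strings, numbers or null): clean every string value in the parsed data, then serialize the result. clean_text and clean_value are hypothetical helper names, the sample is a shortened copy of the first record, and collapsing all whitespace to single spaces and decoding entities with html.unescape are assumptions about what "only the strings" means.

```python
import html
import json
import re

TAG_RE = re.compile(r'<[^>]*>')  # any HTML tag, including attributes such as inline styles

def clean_text(text):
    """Strip tags, decode entities such as &amp;, and collapse whitespace."""
    text = TAG_RE.sub(' ', text)   # replace each tag with a space so words don't merge
    text = html.unescape(text)
    return ' '.join(text.split())  # split() with no argument splits on \n, \t and spaces

def clean_value(value):
    """Recursively clean every string inside parsed JSON data."""
    if isinstance(value, str):
        return clean_text(value)
    if isinstance(value, list):
        return [clean_value(v) for v in value]
    if isinstance(value, dict):
        return {k: clean_value(v) for k, v in value.items()}
    return value  # numbers, booleans and None are left unchanged

# Hypothetical shortened copy of the first record (raw string keeps the JSON escapes)
sample = r'''[
    {
        "idrfi": 36809,
        "subject": "M2 - Flashing Clarifications",
        "description": "<ol style=\"margin-left:0.375in\">\n\t<li><span style=\"font-size:11pt\">Refer to detail 5/A650 attached. </span></li>\n</ol>\n",
        "response": null
    }
]'''

cleaned = clean_value(json.loads(sample))
print(json.dumps(cleaned, indent=4))
```

To run it on the real file, replace json.loads(sample) with json.load(f) inside the question's with open(...) block. If you want to save the cleaned data, use json.dump(cleaned, out, indent=4) with a second file opened for writing. The output stays valid JSON because json.dumps does the escaping after the strings have been cleaned.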
Just call the function you wrote:
import json, re
from html.parser import HTMLParser

def remove_html_tags(data):
    p = re.compile(r'<.*?>')
    return p.sub('', data)

with open('project-closedtasks-avgdaysopen.json') as f:
    data = json.load(f)

data = json.dumps(data, indent=4)
removed_tags = remove_html_tags(data)
print(removed_tags)
I checked it and it works fine.
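Since the code already imports HTMLParser from html.parser without using it, here is a sketch of a tag-stripping alternative built on it instead of a regex. A small subclass collects only the text between tags through handle_data. With the default convert_charrefs=True, entities such as &amp; are decoded before handle_data sees the text. TextExtractor and strip_tags are hypothetical names, and collapsing whitespace at the end is an assumption about the desired output.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, ignoring all tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_tags(fragment):
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any buffered trailing text
    # Join the text pieces and collapse newlines, tabs and repeated spaces.
    return ' '.join(' '.join(parser.parts).split())

print(strip_tags('<ol style="margin-left:0.2in">\n\t<li><span>Fish &amp; chips</span></li>\n</ol>\n'))
# prints: Fish & chips
```

You would apply strip_tags to each string value after json.load, rather than to the json.dumps output, so the \n and \t escapes never appear in the text being cleaned.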