来自 html 源的正则表达式 json 表单



我已经到了我不设法从 html 源中正则表达式 json 值的部分。

html 源代码如下所示:

<script data-csp-hash=""> window.__webpack_public_path__='https://renderer-assets.typeform.com/';
window.__webpack_nonce__='3088edaa602c001b5f6e1f31e3179422';
window.rendererAssets='["https://renderer-assets.typeform.com/vendors~libphonenumber~submission.c94d30638908af997673.js","https://renderer-assets.typeform.com/country-data.526012987a7e72182726.js","https://renderer-assets.typeform.com/form-container.98c74a2ac320736bdb16.js","https://renderer-assets.typeform.com/renderer.8282fd35106b77e43e2f.js","https://renderer-assets.typeform.com/submission.5d9a15e294b33a20ea2e.js","https://renderer-assets.typeform.com/vendors~form-container.b5fb128466f604baadba.js","https://renderer-assets.typeform.com/vendors~video.aa830e76dcc8735c9936.js","https://renderer-assets.typeform.com/video.45eca666f47b245e8fdb.js"]';
window.rendererData= {
rootDomNode: 'root',
form:     {
"id":"Z3PvTW",
"title":Testing",
"welcome_screens":[ {
"ref":"a13820db-af60-40eb-823d-86cf0f20299b",
"title":"Yessir!",
"properties": {
"show_button": true, "button_text": "Start"
}
}
],
"thankyou_screens":[ {
"ref":"default_tys",
"title":"Done! Your information was sent perfectly.",
"properties": {
"show_button": false, "share_icons": false
}
}
],
"fields":[ {
"id":"kxWycKljdtBq",
"title":"FIRST NAME",
"ref":"27f403f7-8c5b-4e18-b19d-1501e8f137ee",
"validations": {
"required": true
}
,
"type":"short_text"
}
,
{
"id":"WEXCnZ7EAFjN",
"title":"LAST NAME",
"ref":"a6bf6d83-ee37-4870-b6c5-779822290cde",
"validations": {
"required": true
}
,
"type":"short_text"
}
,
{
"id":"ButwoV1bTge5",
"title":"EMAIL ADDRESS",
"ref":"8860a4cf-71ec-4bfa-a2c7-934fd405f200",
"properties": {
"description": "Note for stackoverflow!"
}
,
"validations": {
"required": true
}
,
"type":"email"
}
],
"_links": {
"display": "link.com"
}
}
,
messages: {
"a11y.file-upload.remove":"Remove uploaded file",
}
,
trackingInfo: {
"segmentKey": "9at6spGDYXelHDdz4r0cP73b3wV1f0ri", "accountId": 12587347, "accountLimitName": "Essentials", "userId": 12586030
}
,
stripe: null,
showBranding: true,
accessScheduling: {
"closeScreenData": {
"title":"This typeform isn't accepting new responses",
"description":"",
"brandingMottoText":"How you ask is everything",
"brandingButtonText":"Create a *typeform*",
"attachment": {}
,
"textColor": "#3D3D3D", "showBranding": true, "brandingButtonColor": "#000000", "buttonRedirectLink": "https:u002Fu002Fwww.typeform.comu002Fsignup?utm_campaign=undefined&utm_source=typeform.com-12587347-Essentials&utm_medium=typeform&utm_content=typeform-closescreen&utm_term=EN"
}
}
,
featureFlags: {
"always-inject-new-relic": false, "beta-testers": false, "sb-3671-inline-submit-flow": "out-of-experiment", "sb-3671-new-submit-flow": false
}
}
;
window.rendererTheme= {
color: '#3D3D3D',
backgroundColor: {
red: '255', green: '255', blue: '255'
}
}
;

我想刮的是部分的形式值 json:

{
"id":"Z3PvTW",
"title":Testing",
"welcome_screens":[ {
"ref":"a13820db-af60-40eb-823d-86cf0f20299b",
"title":"Yessir!",
"properties": {
"show_button": true, "button_text": "Start"
}
}
],
"thankyou_screens":[ {
"ref":"default_tys",
"title":"Done! Your information was sent perfectly.",
"properties": {
"show_button": false, "share_icons": false
}
}
],
"fields":[ {
"id":"kxWycKljdtBq",
"title":"FIRST NAME",
"ref":"27f403f7-8c5b-4e18-b19d-1501e8f137ee",
"validations": {
"required": true
}
,
"type":"short_text"
}
,
{
"id":"WEXCnZ7EAFjN",
"title":"LAST NAME",
"ref":"a6bf6d83-ee37-4870-b6c5-779822290cde",
"validations": {
"required": true
}
,
"type":"short_text"
}
,
{
"id":"ButwoV1bTge5",
"title":"EMAIL ADDRESS",
"ref":"8860a4cf-71ec-4bfa-a2c7-934fd405f200",
"properties": {
"description": "Note for stackoverflow!"
}
,
"validations": {
"required": true
}
,
"type":"email"
}
],
"_links": {
"display": "link.com"
}
}

我已经能够通过使用这个几乎刮掉它

(?sm)^s*form:s*{(.*?)n}$ #Not quite sure if this would work in Python however.

https://regex101.com/r/zTJQ0A/3

但是,我的问题是它继续抓取表单值之后的内容,例如消息,跟踪信息,条纹等,我只想能够获取表单json,而没有其他内容。

如何只能获取form:json 值的正则表达式?

您可以尝试此方法:

data = '''....'''
data = re.findall("form:[Ss]*messages",data)[0]
data = re.sub("^form:","",data)
data = re.sub(",n|smessages","",data)
print data

最新更新