How to get the value of a var from a script tag in Python

In a given .html page, I have a script tag like the following:

<script>
some data 
</script>
<body>
some data
</body>
<script>
var breadcrumbData = {"level":0,"currentCategoryName":"Kebutuhan Dapur","currentCategoryId":"5b85712ca3834cdebbbc4363","parentCategoryId":"","parentCategoryName":null}; 
var pageList = {"totalData":549,"totalPage":12,"pageSize":48,"currentPage":1}; 
var pageSize = 48;
</script>

I am trying to retrieve the total page count using BeautifulSoup.

My code is as follows:

pattern = re.compile(r'"totalPage":(\d+);', re.MULTILINE | re.DOTALL)
scripts = soup.find_all('script', text=pattern)
if scripts:
    match = pattern.search(scripts.text)
    print(match)

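The code above returns an empty list, whereas I just need the number 12 returned as a number. Please help.

There are several ways to extract the number:

1. Using plain `re`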

import re
from bs4 import BeautifulSoup

html_doc = """
<script>
some data 
</script>
<body>
some data
</body>
<script>
var breadcrumbData = {"level":0,"currentCategoryName":"Kebutuhan Dapur","currentCategoryId":"5b85712ca3834cdebbbc4363","parentCategoryId":"","parentCategoryName":null}; 
var pageList = {"totalData":549,"totalPage":12,"pageSize":48,"currentPage":1}; 
var pageSize = 48;
</script>"""
soup = BeautifulSoup(html_doc, "html.parser")
script = soup.find("script", text=lambda t: t and "totalPage" in t)
print(re.search(r"totalPage\D+(\d+)", script.text).group(1))

Prints:

12
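To see why the pattern in the question finds nothing while the one above works, the two can be compared using only the standard library. This is a minimal sketch: `script_text` is one line copied from the script block in the question, and the variable names `strict`, `loose`, and `total_page` are illustrative.

```python
import re

# One line copied from the <script> block in the question.
script_text = 'var pageList = {"totalData":549,"totalPage":12,"pageSize":48,"currentPage":1};'

# The question's pattern requires a ';' immediately after the digits,
# but in the page the digits are followed by ','.
strict = re.search(r'"totalPage":(\d+);', script_text)

# \D+ skips the run of non-digits after the key, then (\d+) captures the number.
loose = re.search(r"totalPage\D+(\d+)", script_text)

# The captured group is a string, so convert it to get a real number.
total_page = int(loose.group(1))
print(strict, total_page)
```

Separately, `find_all` returns a list-like result set rather than a single tag, so `scripts.text` in the question would fail even with a matching pattern; `soup.find(...)` or `scripts[0]` gives a tag whose `.text` can be searched.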

2. Using `js2py`

import js2py
script = soup.find("script", text=lambda t: t and "totalPage" in t)
s = "function $() {" + script.text + " return pageList;}"
print(js2py.eval_js(s)()["totalPage"])

Prints:

12

3. Using `re`/`json`

import re
import json
script = soup.find("script", text=lambda t: t and "totalPage" in t)
n = json.loads(re.search(r"pageList = (.*);", script.text).group(1))["totalPage"]
print(n)

Prints:

12
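If more than one variable is needed from the same script, the `re`/`json` approach can be wrapped in a small helper. The sketch below is an illustration, not part of the original answer: `extract_js_object` is a hypothetical name, and it assumes the variable is assigned a JSON-compatible object literal on a single line ending in `;`, as in the page from the question.

```python
import json
import re


def extract_js_object(script_text, var_name):
    """Return the object assigned to `var <var_name> = {...};` as a dict, or None."""
    # Hypothetical helper: only handles a JSON-compatible literal on one line.
    pattern = r"var\s+" + re.escape(var_name) + r"\s*=\s*(\{.*\})\s*;"
    match = re.search(pattern, script_text)
    if match is None:
        return None
    return json.loads(match.group(1))


# Script contents copied from the question.
script_text = """
var breadcrumbData = {"level":0,"currentCategoryName":"Kebutuhan Dapur","currentCategoryId":"5b85712ca3834cdebbbc4363","parentCategoryId":"","parentCategoryName":null};
var pageList = {"totalData":549,"totalPage":12,"pageSize":48,"currentPage":1};
var pageSize = 48;
"""

page_list = extract_js_object(script_text, "pageList")
print(page_list["totalPage"])
```

Because `json.loads` parses the whole object, `totalPage` comes back as an `int` with no extra conversion, and the other fields such as `totalData` are available from the same dict.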
