How do I get the id value from the following HTML?
print(type(author_info))
output: <class 'bs4.element.Tag'>
print(author_info)
output: <script data-mru-fragment="models/user/journal" type="text/plain">
{
"name": "on-line журнал РАЗНЫЕ ЛЮДИ",
"id": "-2812448",
"auId": "8911662942803793376",
"email": "rl_journal",
"dir": "/community/rl_journal/",
"isVip": false,
"isCommunity": true,
"isVideoChannel": false
}</script>
The content of the script tag is JSON, so you can use the built-in json module to parse it into a Python dictionary (dict) and then read the "id" key:
import json
from bs4 import BeautifulSoup
script_doc = """
<script data-mru-fragment="models/user/journal" type="text/plain">
{
"name": "on-line журнал РАЗНЫЕ ЛЮДИ",
"id": "-2812448",
"auId": "8911662942803793376",
"email": "rl_journal",
"dir": "/community/rl_journal/",
"isVip": false,
"isCommunity": true,
"isVideoChannel": false
}</script>"""
soup = BeautifulSoup(script_doc, "html.parser")
json_data = json.loads(soup.find("script").string)
# With your example using `author_info`:
# json_data = json.loads(author_info.string)
Output:
>>> print(type(json_data))
<class 'dict'>
>>> print(json_data["id"])
-2812448
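Note that in this JSON the id is a quoted string, so json_data["id"] is a str rather than an int. Also, Tag.string can be None when a tag has more than one child, and the payload may not always be valid JSON. A minimal sketch of a defensive helper under those assumptions (the name parse_user_id is hypothetical, not part of json or bs4):

```python
import json
from typing import Optional


def parse_user_id(raw: Optional[str]) -> Optional[int]:
    """Return the "id" field of a JSON payload as an int, or None if unavailable."""
    if raw is None:  # e.g. author_info.string returned None
        return None
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:  # the text was not valid JSON
        return None
    if not isinstance(data, dict):  # valid JSON, but not an object
        return None
    value = data.get("id")  # .get avoids a KeyError if "id" is missing
    if value is None:
        return None
    try:
        return int(value)  # "-2812448" (str) -> -2812448 (int)
    except (TypeError, ValueError):
        return None


# Shortened version of the payload from the question
raw = '{"id": "-2812448", "isVip": false, "isCommunity": true}'
print(parse_user_id(raw))
```

With the question's variable, this would be called as parse_user_id(author_info.string). If you need the id as text (for example to build a URL), skip the int() conversion and use the string directly.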