Parsing JSON from an HTML web page using Python



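I want to extract data from a website using Python. I have done this kind of thing before, but this is the first time I have run into a structure like this. The page appears to be HTML with a block of JSON at the bottom. I can fetch the HTML with BeautifulSoup, but I need to extract the JSON that contains the data.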

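Below is a sample of my code, which does return the HTML with the JSON in it. I originally tried requests, but the script just ran without anything happening, so I switched to urllib3 with BeautifulSoup. I think this has something to do with how the site is structured.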

Here is a link to the site: https://www.bizbuysell.com/california-businesses-for-sale?q=aTI9ODEsNTcsMzA%3D

import urllib3
from bs4 import BeautifulSoup

http = urllib3.PoolManager()
url = "https://www.bizbuysell.com/connecticut-businesses-for-sale/?q=bHQ9MzAsNDAsODA%3D"
response = http.request('GET', url)
soup = BeautifulSoup(response.data, "html.parser")

Here is an example of how to parse the JSON data contained in this page:

import json
import requests
from bs4 import BeautifulSoup

url = "https://www.bizbuysell.com/connecticut-businesses-for-sale/?q=bHQ9MzAsNDAsODA%3D"
# send a browser-like User-Agent with the request
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0"
}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
# the JSON is the text content of the element marked data-stype="searchResultsPage"
data = soup.select_one('[data-stype="searchResultsPage"]').contents[0]
data = json.loads(data)
# pretty print the data
print(json.dumps(data, indent=4))

Prints:

{
    "@context": "http://schema.org",
    "@type": "SearchResultsPage",
    "speakable": {
        "@type": "SpeakableSpecification",
        "xpath": [
            "/html/head/title",
            "/html/head/meta[@name='description']/@content"
        ]
    },
    "about": [
        {
            "item": {
                "@type": "Product",
                "name": "Moving  Company",
                "alternateName": null,
                "logo": "https://images.bizbuysell.com/shared/listings/179/1791243/ade90fd4-5537-4545-9011-58eb2f257a99-W496.jpg",
                "image": "https://images.bizbuysell.com/shared/listings/179/1791243/ade90fd4-5537-4545-9011-58eb2f257a99-W496.jpg",
                "description": "The company is made up of three department. A licensed Household Goods Relocation and Eviction, an Insurance Agency, and also a Thrift Store. The reason why the company is established this way is the three departments work very well together. Most times someone calls us for services and require special Insurance Coverages. We represent several Insurance Companies and Wholesalers which us a great advantage to obtain the required Insurance Coverage without delays.  Most time we relocate clients who are downsizing, children are grown, and moved out, and therefore do not have need for lots of furniture which we either purchase at minimal cost or given to us for free. It's a win win situation for the company. The items are sold very fast because the selling price is extremely low and the profit margin is very high.",
                "url": "/Business-Opportunity/moving-company/1791243",
                "productId": "1791243",
                "offers": {
                    "@type": "Offer",
                    "price": 450000,
                    "priceCurrency": "USD",
                    "availability": "http://schema.org/InStock",
                    "url": "/Business-Opportunity/moving-company/1791243",
                    "image": "https://images.bizbuysell.com/shared/listings/179/1791243/ade90fd4-5537-4545-9011-58eb2f257a99-W496.jpg",
                    "availableAtOrFrom": {
                        "@type": "Place",
                        "address": {
                            "@type": "PostalAddress",
                            "addressLocality": "Hartford County",
                            "addressRegion": " CT"
                        }
                    }
                }
            },
            "@type": "ListItem",
            "position": 0
        },
...
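
Once the JSON is loaded, the listings in the "about" list can be flattened into simple rows. The following is only a minimal sketch: the helper name extract_listings, the BASE_URL constant, and the choice of fields are assumptions made for illustration, and the sample dict is a trimmed copy of the output above rather than a live response. It also assumes the relative "url" values resolve against the site root.

```python
from urllib.parse import urljoin

# Assumption: relative listing URLs resolve against the site root.
BASE_URL = "https://www.bizbuysell.com"


def extract_listings(data):
    """Flatten the "about" list of the parsed JSON into plain dicts (hypothetical helper)."""
    rows = []
    for entry in data.get("about", []):
        item = entry.get("item") or {}
        offer = item.get("offers") or {}
        address = (offer.get("availableAtOrFrom") or {}).get("address") or {}
        rows.append({
            "position": entry.get("position"),
            "name": item.get("name"),
            "price": offer.get("price"),
            "currency": offer.get("priceCurrency"),
            "locality": address.get("addressLocality"),
            # the output above shows a leading space in addressRegion (" CT")
            "region": (address.get("addressRegion") or "").strip(),
            # turn the relative listing path into an absolute URL
            "url": urljoin(BASE_URL, item.get("url") or ""),
        })
    return rows


# Trimmed sample shaped like the printed output above (not a live response).
sample = {
    "@type": "SearchResultsPage",
    "about": [
        {
            "item": {
                "@type": "Product",
                "name": "Moving  Company",
                "url": "/Business-Opportunity/moving-company/1791243",
                "offers": {
                    "price": 450000,
                    "priceCurrency": "USD",
                    "availableAtOrFrom": {
                        "address": {
                            "addressLocality": "Hartford County",
                            "addressRegion": " CT",
                        }
                    },
                },
            },
            "@type": "ListItem",
            "position": 0,
        }
    ],
}

for row in extract_listings(sample):
    print(row["name"], row["price"], row["currency"], row["region"], row["url"])
```

With real data, the dict returned by json.loads in the answer would be passed in place of sample. The .get() calls with fallbacks are there because it is not certain that every listing carries a price or an address.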
