How do I scrape a dictionary out of a link?



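I am teaching myself to scrape with BS4 and want to extract the contents of a dictionary from a link anchor. How do I extract the contents of the ctdata dictionary?

Here are the details:

Link: a ct="result_offer_content"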

ctdata = {"ad_id_solr": "1a7d243c3610c62012159b7c9d4e900382bbe446", 
  "ad_id_mongo": "", "ad_segment_id": 1723, "ad_partner": "wizbii.com_premium",  
  "ad_sector": "Ing\u00e9nierie", "ad_subsector": "", 
  "ad_jobtitle": "Ing\u00e9nieur d\u00e9veloppeur", "ad_company": "SII",
  "ad_type": "exact", "ad_position": 1, "ad_locality": "Bordeaux"}

I tried:

for offers in soup.find_all("a", {'ct':'result_offer_content'}):
   offre = offers.find('ctdata')
   print(jobtitle)

But the output is 'None'.

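Since the value is in a JSON structure, it can be read in as JSON. You haven't provided the full code, so your reference to jobtitle is a bit confusing, and I can only offer a general solution that you will need to adapt. This is how you would read it in: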

import json
json_str = '{"ad_id_solr":"1a7d243c3610c62012159b7c9d4e900382bbe446","ad_id_mongo":"","ad_segment_id":1723,"ad_partner":"wizbii.com_premium","ad_sector":"Ing\u00e9nierie","ad_subsector":"","ad_jobtitle":"Ing\u00e9nieur d\u00e9veloppeur","ad_company":"SII","ad_type":"exact","ad_position":1,"ad_locality":"Bordeaux"}'
json_dict = json.loads(json_str)
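Once parsed, the result is an ordinary Python dict, so individual fields can be read by key. A minimal sketch, using a shortened copy of the ctdata string from the question (the `ad_salary` key is a made-up name, used only to illustrate a missing field):

```python
import json

# Shortened copy of the ctdata value from the question. A raw string is used
# so the JSON escape \u00e9 reaches json.loads unchanged.
json_str = r'{"ad_jobtitle": "Ing\u00e9nieur d\u00e9veloppeur", "ad_company": "SII", "ad_position": 1}'

json_dict = json.loads(json_str)

job_title = json_dict["ad_jobtitle"]      # JSON escapes are decoded to "é"
position = json_dict["ad_position"]       # JSON numbers become Python ints
salary = json_dict.get("ad_salary", "")   # hypothetical key; .get avoids a KeyError

print(job_title, position, repr(salary))
```

Using `dict.get` with a default is one way to cope with keys that may be absent; indexing with `[]` raises `KeyError` instead.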

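Addendum

Now that you have provided the URL, I can see the problem. ctdata is an attribute of the a tag, not a child tag, so you want to use .get() rather than .find() to read it: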

import json
import requests
import bs4

req = requests.get("https://www.jobijoba.com/fr/query/?what=data&where=Bordeaux&where_type=city%22")
soup = bs4.BeautifulSoup(req.text, 'html.parser')
# ctdata is an attribute of each <a> tag, so read it with .get()
for offer in soup.find_all("a", {'ct':'result_offer_content'}):
    offre = offer.get('ctdata')
    json_dict = json.loads(offre)  # parse the attribute string as JSON
    jobtitle = json_dict['ad_jobtitle']
    print(jobtitle)

Output:

Ingénieur développeur
Ingénieur développeur
Data Scientist
Data Scientist
Développeur big data

Data Scientist
Data Scientist
Ingénieur développeur
Data Scientist
Data Scientist
Data Scientist

Ingénieur décisionnel
Architecte
Data Scientist
Data Scientist
Data Scientist
Développeur informatique
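
The difference between the two calls can be checked offline. The sketch below uses a made-up, single-link HTML snippet (an assumption for illustration, not the real page markup): `find('ctdata')` searches for a child tag named ctdata and finds none, while `get('ctdata')` reads the attribute on the a tag itself.

```python
import html
import json

import bs4

# Made-up sample data, not the real page markup.
ctdata = json.dumps({"ad_jobtitle": "Data Scientist", "ad_company": "SII"})
# html.escape turns the double quotes into &quot; so the attribute stays valid.
page = '<a ct="result_offer_content" ctdata="{}">offer</a>'.format(html.escape(ctdata))

soup = bs4.BeautifulSoup(page, "html.parser")
link = soup.find("a", {"ct": "result_offer_content"})

as_child_tag = link.find("ctdata")  # looks for a <ctdata> tag inside the <a>
as_attribute = link.get("ctdata")   # reads the ctdata="..." attribute

print(as_child_tag)                 # None
print(json.loads(as_attribute)["ad_jobtitle"])
```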

Some tags have no job title, so you can skip those by checking whether the job title is blank before printing it:

import json
import requests
import bs4

req = requests.get("https://www.jobijoba.com/fr/query/?what=data&where=Bordeaux&where_type=city%22")
soup = bs4.BeautifulSoup(req.text, 'html.parser')
for offer in soup.find_all("a", {'ct':'result_offer_content'}):
    offre = offer.get('ctdata')
    json_dict = json.loads(offre)
    jobtitle = json_dict['ad_jobtitle']
    # skip offers whose job title is blank
    if jobtitle != '':
        print(jobtitle)

Output:

Ingénieur développeur
Ingénieur développeur
Data Scientist
Data Scientist
Développeur big data
Data Scientist
Data Scientist
Ingénieur développeur
Data Scientist
Data Scientist
Data Scientist
Ingénieur décisionnel
Architecte
Data Scientist
Data Scientist
Data Scientist
Développeur informatique
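
If some links lacked the ctdata attribute entirely, or carried a value that is not valid JSON, the loop above would raise an exception, since `json.loads` accepts neither `None` nor malformed text. A more defensive variant might look like the sketch below. The helper names `extract_job_titles` and `make_link` and the sample HTML are illustrative assumptions, not part of the original page or answer.

```python
import html
import json

import bs4


def extract_job_titles(page_html):
    """Return the non-empty ad_jobtitle values found in ctdata attributes."""
    soup = bs4.BeautifulSoup(page_html, "html.parser")
    titles = []
    for offer in soup.find_all("a", {"ct": "result_offer_content"}):
        raw = offer.get("ctdata")
        if raw is None:                  # link has no ctdata attribute
            continue
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:     # attribute is not valid JSON
            continue
        if not isinstance(data, dict):   # valid JSON, but not an object
            continue
        title = data.get("ad_jobtitle", "")
        if title:                        # skip blank titles
            titles.append(title)
    return titles


def make_link(ctdata=None):
    """Build a made-up sample link, optionally with a ctdata attribute."""
    attr = "" if ctdata is None else ' ctdata="{}"'.format(html.escape(ctdata))
    return '<a ct="result_offer_content"{}>offer</a>'.format(attr)


sample = "".join([
    make_link(json.dumps({"ad_jobtitle": "Data Scientist"})),
    make_link(json.dumps({"ad_jobtitle": ""})),   # blank title
    make_link("not json"),                        # malformed attribute
    make_link(),                                  # attribute missing
])
print(extract_job_titles(sample))
```

Silently skipping bad links is a design choice that suits a quick scrape; logging the skipped entries would be a reasonable alternative when the data matters.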