I'm trying to scrape data from JobStreet. How do I skip an entry when I get "'NoneType' object has no attribute 'text'"?



I got stuck while scraping data from JobStreet: I keep getting "'NoneType' object has no attribute 'text'", which raises an error. How can I fix this? Here is the code:

from bs4 import BeautifulSoup as sp
import requests
import openpyxl

excel = openpyxl.Workbook()
#print(excel.sheetnames)
sheet = excel.active
sheet.title = 'jobstreet scrap'
sheet.append(['Job ', 'link job', 'company', 'location', 'published', 'post'])

keyword = 'sales'
for pages in range(1, 10):
    url = 'https://www.jobstreet.co.id/en/job-search/{}-jobs/{}'.format(keyword, pages)
    page = requests.get(url)
    soup = sp(page.content, 'html.parser')
    job = soup.find('div', class_="sx2jih0 zcydq8bm").find('div', class_='sx2jih0')
    for jobs in job:
        link_job = jobs.find('h1', class_='sx2jih0 zcydq84u _18qlyvc0 _18qlyvc1x _18qlyvc3 _18qlyvca').a.get('href')
        pekerjaan = jobs.find('h1', class_='sx2jih0 zcydq84u _18qlyvc0 _18qlyvc1x _18qlyvc3 _18qlyvca').a.text
        company = jobs.find('span', class_='sx2jih0 zcydq84u _18qlyvc0 _18qlyvc1x _18qlyvc1 _18qlyvca').text
        location = jobs.find('span', class_='sx2jih0 zcydq84u _18qlyvc0 _18qlyvc1x _18qlyvc3 _18qlyvc7').span.text
        published = jobs.find('time', class_='sx2jih0 zcydq84u').get('datetime')
        pub = jobs.find('time', class_='sx2jih0 zcydq84u').text
        sheet.append([pekerjaan, link_job, company, location, published, pub])
        print('=====================', pages)
        print("Job : ", pekerjaan, '\n', "link Job : ", link_job, '\n', "company : ", company, '\n', "location : ", location, '\n', "published : ", pub, '\n', "published at : ", published)
excel.save('jobstreetq.xlsx')

This is the error:

That happens because there is no span tag with that class on the element:

Just add some logic, or a try/except, to skip the entry when an element can't be found. It also looks like there is a pattern you can exploit: the company class always starts with 'sx2jih0 zcydq84u'. So search for that prefix with a regex instead of hard-coding 'sx2jih0 zcydq84u _18qlyvc0 _18qlyvc1x _18qlyvc1 _18qlyvca':

import re

keyword = 'sales'
for pages in range(1, 10):
    url = 'https://www.jobstreet.co.id/en/job-search/{}-jobs/{}'.format(keyword, pages)
    page = requests.get(url)
    soup = sp(page.content, 'html.parser')
    job = soup.find('div', class_="sx2jih0 zcydq8bm").find('div', class_='sx2jih0')
    for jobs in job:
        link_job = jobs.find('h1', class_='sx2jih0 zcydq84u _18qlyvc0 _18qlyvc1x _18qlyvc3 _18qlyvca').a.get('href')
        pekerjaan = jobs.find('h1', class_='sx2jih0 zcydq84u _18qlyvc0 _18qlyvc1x _18qlyvc3 _18qlyvca').a.text
        company = jobs.find('span', class_=re.compile("^sx2jih0 zcydq84u")).text
        location = jobs.find('span', class_='sx2jih0 zcydq84u _18qlyvc0 _18qlyvc1x _18qlyvc3 _18qlyvc7').span.text
        published = jobs.find('time', class_='sx2jih0 zcydq84u').get('datetime')
        pub = jobs.find('time', class_='sx2jih0 zcydq84u').text
        sheet.append([pekerjaan, link_job, company, location, published, pub])
        print('=====================', pages)
        print("Job : ", pekerjaan, '\n', "link Job : ", link_job, '\n', "company : ", company, '\n', "location : ", location, '\n', "published : ", pub, '\n', "published at : ", published)
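The try/except skip mentioned above can be sketched in isolation like this. This is a minimal illustration of the pattern, not the actual scraper: FakeTag and get_text are hypothetical stand-ins so the example runs without a live page, since find() simply returns None when nothing matches and accessing .text on that None is what raises the AttributeError.

```python
def get_text(tag, default=''):
    """Return tag.text, or a default when find() returned None."""
    return default if tag is None else tag.text

class FakeTag:
    """Stand-in for a bs4 Tag so the example runs without a live page."""
    def __init__(self, text):
        self.text = text

# Simulated results of jobs.find(...): None represents a missing element.
cards = [FakeTag('PT Example'), None, FakeTag('PT Other')]

rows = []
for tag in cards:
    try:
        rows.append(tag.text)   # raises AttributeError when tag is None
    except AttributeError:
        continue                # skip this job card and move on

print(rows)  # ['PT Example', 'PT Other']
```

The get_text helper is an alternative to try/except when you would rather keep the row and record a placeholder (e.g. get_text(tag, 'N/A')) instead of skipping the whole entry.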
