I'm trying to scrape this site but can't fix this error:
AttributeError: 'unicode' object has no attribute 'find_all'
I'm using unicodedata.normalize to strip \xa0 from the parsed string (it shows up when there are empty p tags).
pages = ["http://sg.startupjobs.asia/sg/job/search?w=jobs&q=data+scientist+OR+data+analyst+OR+business+analyst+OR+business+intelligence&l=Anywhere&t=any&job_page=" + str(i) for i in range(1, 12)]
job_links = []

for p in pages:
    r = requests.get(p)
    data = r.text
    soup = BeautifulSoup(data, "lxml").text
    clean_soup = unicodedata.normalize("NFKD", soup)
    container = clean_soup.find_all('div', attrs={'id': 'yw0'})
    for text in container:
        job_names = text.find_all('span', attrs={'class': 'JobRole'})
        for name in job_names:
            for link in name.find_all('a'):
                job_link = link.get('href')
                job_links.append("http://sg.startupjobs.asia" + str(job_link))
clean_soup is a unicode string, not a BeautifulSoup object, so it has no find_all method. Parse it again:
clean_soup_2 = BeautifulSoup(clean_soup, 'lxml')
clean_soup_2.find_all('div', attrs={'id': 'yw0'})
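As a side note on the \xa0 itself: NFKD normalization maps the non-breaking space (U+00A0) to a regular space, which is why unicodedata.normalize works for this cleanup. A minimal stdlib-only sketch (the sample string is made up for illustration):

```python
import unicodedata

# U+00A0 (non-breaking space, shown as \xa0) sits between the words
raw = "Data\xa0Scientist"

# NFKD decomposes compatibility characters; U+00A0 becomes a plain space
clean = unicodedata.normalize("NFKD", raw)

print(clean)            # Data Scientist
print("\xa0" in clean)  # False
```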
The following should work. Change
r = requests.get(p)
data = r.text
soup = BeautifulSoup(data, "lxml").text
clean_soup = unicodedata.normalize("NFKD", soup)
container = clean_soup.find_all('div', attrs={'id': 'yw0'})
to
r = requests.get(p)
clean_text = unicodedata.normalize('NFKD', r.text)
soup = BeautifulSoup(clean_text, 'lxml')
container = soup.find_all('div', attrs={'id': 'yw0'})
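Putting the corrected order together (normalize first, then parse, then search), here is a self-contained sketch. The inline HTML and its links are made up to mimic the page's yw0/JobRole structure, and html.parser is used so it runs without lxml installed:

```python
import unicodedata
from bs4 import BeautifulSoup

# Made-up HTML mimicking the structure of the job-search page
html = """
<div id="yw0">
  <span class="JobRole"><a href="/sg/job/1">Data\xa0Scientist</a></span>
  <span class="JobRole"><a href="/sg/job/2">Business Analyst</a></span>
</div>
"""

# 1. Normalize while the document is still a plain string
clean_text = unicodedata.normalize("NFKD", html)

# 2. Then parse it, keeping the BeautifulSoup object (no .text here)
soup = BeautifulSoup(clean_text, "html.parser")

# 3. find_all works because soup is a BeautifulSoup object, not a string
job_links = []
for container in soup.find_all("div", attrs={"id": "yw0"}):
    for name in container.find_all("span", attrs={"class": "JobRole"}):
        for link in name.find_all("a"):
            job_links.append("http://sg.startupjobs.asia" + link.get("href"))

print(job_links)
```

This prints the two absolute URLs built from the sample hrefs, and the \xa0 in the first job title has been replaced by a regular space.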