How to display concatenated values produced by 2 different loops over scraped URLs



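I'm sure the solution is simple, but I can't find it: I wrote loops within loops to scrape all the URLs of a page.

For #1: product attributes

  • I can't display, on the same line, all the values produced by the following loops:

attribzF + valuezZF. If I print(attribzF, valuezZF), I only get the first value of the loop (whereas I should have 5).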

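For #2: product description:

How can I extract one specific <p> from the <article> that contains 5 of them? I can get all the text from all the <p> elements, but not the text of a single one. How do you tell them apart?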

Thanks a lot for your help!

import requests
from bs4 import BeautifulSoup

url = 'http://books.toscrape.com/catalogue/category/books/mystery_3/index.html'
u = requests.get(url)
soup = BeautifulSoup(u.content, 'html.parser')
for link in soup.findAll('article', {"class": 'product_pod'}):
    links = link.findAll('a')

    for lien in links:
        lienFinale = lien.get('href')
        lienComp = "http://books.toscrape.com/catalogue/" + lienFinale.strip('../../../')
        lienComp1 = lienComp.split(',')
        for l in lienComp1:
            r = requests.get(l)
            soup2 = BeautifulSoup(r.content, 'html.parser')

            #1. PRODUCT ATTRIBUTES :

            soupAp = soup2.findAll('table', class_='table table-striped')
            for attrib in soupAp:
                attribF = attrib.findAll('th')

                for attribz in attribF:
                    attribzF = attribz.string

            for valuez in soupAp:
                valuezF = valuez.findAll('td')

                for valuezZ in valuezF:
                    valuezZF = valuezZ.string

                    print(attribzF, valuezZF)

            #2. DESCRIPTION :
            descrip = soup2.find('article', class_="product_page")
            descripFinal = descrip.findAll('p')
            for data in descripFinal:
                print(data.get_text())

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
from pprint import pp


def get_soup(content):
    return BeautifulSoup(content, 'lxml')


def main(url):
    with requests.Session() as req:
        r = req.get(url)
        soup = get_soup(r.content).select('ol.row h3 > a')
        # resolve each relative href against the category URL
        links = (urljoin(url, i['href']) for i in soup)
        for link in links:
            r = req.get(link)
            soup = get_soup(r.content)
            # stripped_strings is a generator; zipping it with itself
            # pairs consecutive strings as (label, value)
            goal = soup.select_one('.table-striped').stripped_strings
            data = dict(zip(goal, goal))
            data['Description'] = soup.select_one(
                '#product_description + p').get_text(strip=True)
            pp(data)
            break  # stop after the first book


main('http://books.toscrape.com/catalogue/category/books/mystery_3/index.html')

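There is no need for all these loops. Try changing your strategy for selecting elements: take a look at CSS selectors and chain them to target exactly what you need.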
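
To make the point about selectors concrete, here is a minimal sketch that compares nested find_all() loops with a single CSS selector. The HTML string is a hypothetical, trimmed-down sample modelled on the category page (it is not fetched from the site), and the names `nested` and `selected` are only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: each product has two links to the same
# page, one around the image and one inside the <h3> title.
html = """
<ol class="row">
  <li><article class="product_pod">
    <div class="image_container"><a href="../../../sharp-objects_997/index.html"><img alt="Sharp Objects"></a></div>
    <h3><a href="../../../sharp-objects_997/index.html">Sharp Objects</a></h3>
  </article></li>
  <li><article class="product_pod">
    <div class="image_container"><a href="../../../in-a-dark-dark-wood_963/index.html"><img alt="In a Dark, Dark Wood"></a></div>
    <h3><a href="../../../in-a-dark-dark-wood_963/index.html">In a Dark, Dark Wood</a></h3>
  </article></li>
</ol>
"""
soup = BeautifulSoup(html, "html.parser")

# Nested loops collect every <a> in every article, so each book appears twice.
nested = []
for article in soup.find_all("article", class_="product_pod"):
    for a in article.find_all("a"):
        nested.append(a.get("href"))

# One selector targets only the title link of each article.
selected = [a.get("href") for a in soup.select("article.product_pod h3 a")]

print(len(nested), len(selected))
```

With markup like this, the nested version would request each book page twice, while the selector version yields one link per book.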

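To get the product information you can pass a generator expression to dict(): it iterates over all rows of the table and builds one key/value pair per row from the row's stripped_strings, which yield the text of the <th> and the <td>: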

dict((row.stripped_strings) for row in soup2.select('table tr'))
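
As a small offline illustration of that line, the sketch below runs it against a hypothetical three-row table (hard-coded sample markup, not the live page). It also shows one way to print each attribute next to its value, which is what #1 in the question asks for.

```python
from bs4 import BeautifulSoup

# Hypothetical sample of the product information table.
html = """
<table class="table table-striped">
  <tr><th>UPC</th><td>e00eb4fd7b871a48</td></tr>
  <tr><th>Product Type</th><td>Books</td></tr>
  <tr><th>Availability</th><td>In stock (20 available)</td></tr>
</table>
"""
soup2 = BeautifulSoup(html, "html.parser")

# Each <tr> yields exactly two stripped strings: the label and the value.
# dict() accepts any iterable of two-item iterables, so this builds the mapping.
info = dict(row.stripped_strings for row in soup2.select("table tr"))

# Label and value now belong together and can be printed on one line.
for label, value in info.items():
    print(label, value)
```

This only works when every row holds exactly two text fragments; a row with an empty cell or extra nested text would make dict() raise an error.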

Select the description based on the id of its previous sibling:

soup2.select_one('#product_description + p').get_text()
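
The following sketch shows why the sibling selector singles out one paragraph among several. The markup is a hypothetical, simplified product page with placeholder text, and the None check is an added assumption for pages that might lack a description.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: several <p> elements, only one of which
# directly follows the element with id="product_description".
html = """
<article class="product_page">
  <p class="price_color">47.82</p>
  <p class="instock availability">In stock (20 available)</p>
  <div id="product_description" class="sub-header"><h2>Product Description</h2></div>
  <p>Placeholder description text.</p>
</article>
"""
soup2 = BeautifulSoup(html, "html.parser")

# Every paragraph in the article: no way to tell them apart by tag alone.
all_paragraphs = soup2.select("article.product_page p")

# "A + B" matches a B element that immediately follows an A sibling.
description = soup2.select_one("#product_description + p")
text = description.get_text(strip=True) if description is not None else None

print(len(all_paragraphs), text)
```

The other paragraphs can be told apart the same way, by their class (for example `p.price_color`) rather than by their position.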

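Note: in newer code, avoid the old findAll() syntax and use find_all() instead. To learn more, take a minute to check the documentation.

Example

The scraped results are stored in books as a list of dictionaries, a structure that is easy to iterate over or to convert to a DataFrame, CSV, ...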

import requests
from bs4 import BeautifulSoup

url = 'http://books.toscrape.com/catalogue/category/books/mystery_3/index.html'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
books = []
for a in soup.select('article h3 a'):
    # strip() removes a set of characters ('.' and '/'), not a prefix;
    # here that drops the leading '../../../' of the relative href
    r = requests.get("http://books.toscrape.com/catalogue/" + a.get('href').strip('../../../'))
    soup2 = BeautifulSoup(r.content, 'html.parser')
    # each table row yields two strings: the <th> label and the <td> value
    d = dict((row.stripped_strings) for row in soup2.select('table tr'))
    d['description'] = soup2.select_one('#product_description + p').get_text()
    ### d['title'] = soup2.h1.get_text()
    ### ... whatever information you want to add
    books.append(d)
books

Output

[{'UPC': 'e00eb4fd7b871a48',
'Product Type': 'Books',
'Price (excl. tax)': '£47.82',
'Price (incl. tax)': '£47.82',
'Tax': '£0.00',
'Availability': 'In stock (20 available)',
'Number of reviews': '0',
'description': 'WICKED above her hipbone, GIRL across her heart Words are like a road map to reporter Camille Preaker’s troubled past. Fresh from a brief stay at a psych hospital, Camille’s first assignment from the second-rate daily paper where she works brings her reluctantly back to her hometown to cover the murders of two preteen girls. NASTY on her kneecap, BABYDOLL on her leg Since WICKED above her hipbone, GIRL across her heart Words are like a road map to reporter Camille Preaker’s troubled past. Fresh from a brief stay at a psych hospital, Camille’s first assignment from the second-rate daily paper where she works brings her reluctantly back to her hometown to cover the murders of two preteen girls. NASTY on her kneecap, BABYDOLL on her leg Since she left town eight years ago, Camille has hardly spoken to her neurotic, hypochondriac mother or to the half-sister she barely knows: a beautiful thirteen-year-old with an eerie grip on the town. Now, installed again in her family’s Victorian mansion, Camille is haunted by the childhood tragedy she has spent her whole life trying to cut from her memory. HARMFUL on her wrist, WHORE on her ankle As Camille works to uncover the truth about these violent crimes, she finds herself identifying with the young victims—a bit too strongly. Clues keep leading to dead ends, forcing Camille to unravel the psychological puzzle of her own past to get at the story. Dogged by her own demons, Camille will have to confront what happened to her years before if she wants to survive this homecoming.With its taut, crafted writing, Sharp Objects is addictive, haunting, and unforgettable. ...more'},
{'UPC': '19ed25f4641d5efd',
'Product Type': 'Books',
'Price (excl. tax)': '£19.63',
'Price (incl. tax)': '£19.63',
'Tax': '£0.00',
'Availability': 'In stock (18 available)',
'Number of reviews': '0',
'description': "In a dark, dark wood Nora hasn't seen Clare for ten years. Not since Nora walked out of school one day and never went back. There was a dark, dark houseUntil, out of the blue, an invitation to Clare’s hen do arrives. Is this a chance for Nora to finally put her past behind her?And in the dark, dark house there was a dark, dark roomBut something goes wrong. Very wrong.And i In a dark, dark wood Nora hasn't seen Clare for ten years. Not since Nora walked out of school one day and never went back. There was a dark, dark houseUntil, out of the blue, an invitation to Clare’s hen do arrives. Is this a chance for Nora to finally put her past behind her?And in the dark, dark house there was a dark, dark roomBut something goes wrong. Very wrong.And in the dark, dark room.... Some things can’t stay secret for ever. ...more"},...]
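
As for converting such a list of dictionaries to CSV, one possible sketch uses only the standard library. The `books` list here is a shortened, hard-coded stand-in for the scraped result, and writing to an in-memory buffer instead of a file is an assumption made to keep the example self-contained.

```python
import csv
import io

# Shortened stand-in for the scraped list of dictionaries.
books = [
    {"UPC": "e00eb4fd7b871a48", "Availability": "In stock (20 available)", "description": "First description"},
    {"UPC": "19ed25f4641d5efd", "Availability": "In stock (18 available)", "description": "Second description"},
]

# Use the keys of the first record as column names; this assumes
# every record has the same keys.
fieldnames = list(books[0])

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(books)

csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file, the buffer could be replaced with `open('books.csv', 'w', newline='', encoding='utf-8')`, where the file name is just an example.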
