How to get the text of child tags with Beautiful Soup



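I am using Beautiful Soup to scrape some data from foodily.com.

On that page there is a div with the class 'ings', and I want to get the data inside its p tags. I have written the following code: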

ingredients = soup.find('div', {"class": "ings"}).findChildren('p')

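It gives me the list of ingredients, but with the p tags still included.

Call get_text() on each p element inside the div with class="ings".

Complete working code: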

from bs4 import BeautifulSoup
import requests
with requests.Session() as session:
    session.headers.update({"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36"})
    response = session.get("http://www.foodily.com/r/0y1ygzt3zf-perfect-vanilla-cupcakes-by-annie-s")
    soup = BeautifulSoup(response.content, "html.parser")
    ingredients = [ingredient.get_text() for ingredient in soup.select('div.ings p')]
    print(ingredients)

Prints:

[
    u'For the cupcakes:', 
    u'1 stick (113g) butter/marg*', 
    u'1 cup caster sugar', u'2 eggs', 
    ...
    u'1 tbsp vanilla extract', 
    u'2-3tbsp milk', 
    u'Sprinkles to decorate, optional'
]

Note that I have also improved your locator a bit and switched to the div.ings p CSS selector.

Another approach:
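To see the difference between the two locators without hitting the network, here is a minimal sketch. The markup in SAMPLE_HTML is a made-up stand-in for the real page (an assumption, not the actual foodily.com source), and find_all is used as the current name for findChildren.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (assumption).
SAMPLE_HTML = """
<div class="ings">
  <p>For the cupcakes:</p>
  <p>1 cup caster sugar</p>
  <p>2 eggs</p>
</div>
<div class="other"><p>Not an ingredient</p></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Locator from the question: this yields Tag objects, so printing them
# shows the surrounding <p>...</p> markup.
tags = soup.find("div", {"class": "ings"}).find_all("p")

# CSS selector plus get_text(): this yields plain strings.
texts = [p.get_text() for p in soup.select("div.ings p")]

print(tags[0])  # <p>For the cupcakes:</p>
print(texts)    # ['For the cupcakes:', '1 cup caster sugar', '2 eggs']
```

The selector only matches p elements that sit inside div.ings, so the paragraph in the other div is left out.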

import requests
from bs4 import BeautifulSoup as bs

url = "http://www.foodily.com/r/0y1ygzt3zf-perfect-vanilla-cupcakes-by-annie-s"
source = requests.get(url)
text_new = source.text
soup = bs(text_new, "html.parser")
ingredients = soup.find_all('div', {"class": "ings"})
for a in ingredients:
    print(a.text)

It will print:

For the cupcakes:
1 stick (113g) butter/marg*
1 cup caster sugar
2 eggs
1 tbsp vanilla extract
1 and 1/2 cups plain flour
2 tsp baking powder
1/2 cup milk (I use Skim)
For the frosting:
2 sticks (226g) unsalted butter, at room temp
2 and 1/2 cups icing sugar, sifted
1 tbsp vanilla extract
2-3tbsp milk
Sprinkles to decorate, optional
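One thing to keep in mind with this approach is that .text on the div returns a single string, including whatever whitespace sits between the p tags. If a list with one entry per line is wanted, something along these lines may help. It is only a sketch, and SAMPLE_HTML is hypothetical markup rather than the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (assumption).
SAMPLE_HTML = """
<div class="ings">
  <p>For the cupcakes:</p>
  <p>1 cup caster sugar</p>
  <p>2 eggs</p>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
div = soup.find("div", {"class": "ings"})

# .text keeps the newlines and indentation found between the <p> tags.
raw_text = div.text

# separator/strip are get_text() options: each text fragment is stripped,
# empty fragments are dropped, and the rest are joined with the separator.
clean_text = div.get_text(separator="\n", strip=True)
lines = clean_text.split("\n")

print(lines)  # ['For the cupcakes:', '1 cup caster sugar', '2 eggs']
```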

If you already have a list of p tags, call get_text() on each one. This returns only their text:

ingredient_list = [p.get_text() for p in ingredients]

The resulting list looks like this:

ingredient_list = [
   'For the cupcakes:', '1 stick (113g) butter/marg*', 
   '1 cup caster sugar','2 eggs', ...
]
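If the p tags contain stray leading or trailing whitespace, get_text(strip=True) can trim it in the same step. A small sketch, assuming made-up markup in SAMPLE_HTML (not taken from the real page):

```python
from bs4 import BeautifulSoup

# Hypothetical markup with untidy whitespace inside the <p> tags (assumption).
SAMPLE_HTML = '<div class="ings"><p>\n  2 eggs \n</p><p> 1 cup caster sugar</p></div>'

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
ingredients = soup.select("div.ings p")

# Plain get_text() returns the text exactly as it appears in the markup.
raw = [p.get_text() for p in ingredients]

# strip=True trims whitespace around each piece of text.
stripped = [p.get_text(strip=True) for p in ingredients]

print(stripped)  # ['2 eggs', '1 cup caster sugar']
```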
