Scraping the 'p' tag value associated with a 'tr class' tag

I am using Anaconda and BeautifulSoup to scrape data from a website.

import requests
resp = requests.get('https://www.url.com')
Weathertest = resp.text
from bs4 import BeautifulSoup
soup = BeautifulSoup(Weathertest,'lxml') 
mintemp = BeautifulSoup(Weathertest, 'lxml')
mintemp.find_all('p',class_='weatherhistory_results_datavalue temp_mn')

What I want to do is pull out the minimum temperature for a specific date. This is the page's HTML:

<tr class="weatherhistory_results_datavalue temp_mn"><th><h3>Minimum Temperature</h3></th><td><p><span class="value">47.3</span> <span class="units">&#176;F</span></p></td></tr>

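After I tried the above and got [] as the result, I realized that the weatherhistory class belongs to the tr tag, not to the p tag, so the above cannot match anything. Instead, I tried: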

mintemp = BeautifulSoup(Weathertest, 'lxml')
mintemp.find_all('tr',class_='weatherhistory_results_datavalue temp_mn')

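The result I get is the entire HTML string above (from the opening tr tag through /tr). I tried to find out how to extract the p value from inside that tr, but I came up with nothing. I am fairly new to all of this, so I am sure it is something simple that I just don't know yet.

Or maybe I need a compound statement, something like "find all of the tr elements with the class above, then give me the p value", but I am not sure how to code that.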

Try this:

>>> data = """<tr class="weatherhistory_results_datavalue temp_mn"><th><h3>Minimum Temperature</h3></th><td><p><span class="value">47.3</span> <span class="units">&#176;F</span></p></td></tr>"""
>>> from bs4 import BeautifulSoup
>>> soap = BeautifulSoup(data, "lxml")
>>> # First find every row that carries the minimum-temperature class
>>> temp = soap.find_all("tr", {"class": "weatherhistory_results_datavalue temp_mn"})
>>> for i in temp:
...     # Then search inside that row for the span holding the number
...     a = i.find("span", {"class": "value"})
...     print(a.text)
...
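
The same "find the row, then look inside it" idea can also be written as a single CSS selector. The following is only a sketch under a few assumptions: the helper name `extract_min_temp` is made up for illustration, the row is wrapped in a `<table>` so the fragment is valid on its own, and the built-in `html.parser` is used instead of `lxml` so no extra parser is needed. On the real page you would pass `resp.text` instead of the sample string.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question, wrapped in <table> (an assumption
# made here so that the fragment is well-formed by itself).
html = """<table><tr class="weatherhistory_results_datavalue temp_mn"><th><h3>Minimum Temperature</h3></th><td><p><span class="value">47.3</span> <span class="units">&#176;F</span></p></td></tr></table>"""


def extract_min_temp(markup):
    """Hypothetical helper: return (value, p_text), or None if the row is missing."""
    soup = BeautifulSoup(markup, "html.parser")
    # "tr.temp_mn p" means: a <p> somewhere inside a <tr> that has class temp_mn
    p = soup.select_one("tr.temp_mn p")
    if p is None:
        return None
    # The numeric part lives in <span class="value"> inside that <p>
    value = float(p.select_one("span.value").get_text(strip=True))
    # Full text of the <p>, with the pieces joined by a single space
    p_text = p.get_text(" ", strip=True)
    return value, p_text


result = extract_min_temp(html)
print(result)
```

Returning `None` when the selector matches nothing avoids the `AttributeError` you would otherwise get by calling `.text` on a missing element, which is easy to hit if the page layout changes.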
