Extracting a specific value from a table with Beautiful Soup (Python)

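I've looked around Stack Overflow, and most guides focus on extracting all of the data from a table. I only need one value, though, and I can't manage to pull that specific value out of the table.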

Page to scrape:

https://gis.vgsi.com/portsmouthnh/Parcel.aspx?pid=38919

I want to extract the "Style:" value.

Code:

import bs4
import requests
styleData=[]
pagedata = requests.get("https://gis.vgsi.com/portsmouthnh/Parcel.aspx?pid=38919")
cleanpagedata = bs4.BeautifulSoup(pagedata.text, 'html.parser')
table=cleanpagedata.find('div',{'id':'MainContent_ctl01_panView'})
style=table.findall('tr')[3]
style=style.findall('td')[1].text
print(style)
styleData.append(style)

The method is named find_all (with an underscore), not findall. Try the following:

style=table.find_all('tr')[3]
style=style.find_all('td')[1].text
print(style)

That should give you the expected output.
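To see why the indices are [3] and [1], here is a minimal offline sketch of the same find/find_all chain. It parses a hypothetical, simplified HTML snippet instead of fetching the live page: only the div id comes from the question, while the rows and their order are assumed, so the exact indices on the real page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the page's table (not the real markup).
HTML = """
<div id="MainContent_ctl01_panView">
  <table>
    <tr><th>Element</th><th>Description</th></tr>
    <tr><td>Field A:</td><td>value A</td></tr>
    <tr><td>Field B:</td><td>value B</td></tr>
    <tr><td>Style:</td><td>Townhouse End</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
panel = soup.find("div", {"id": "MainContent_ctl01_panView"})

rows = panel.find_all("tr")            # every <tr> inside the div, in document order
cells = rows[3].find_all("td")         # index 3 is the fourth row
style = cells[1].get_text(strip=True)  # index 1 is the second cell
print(style)
```

Because the lookup is purely positional, it will quietly return the wrong cell if the site adds or reorders rows.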

You can use a CSS selector:

#MainContent_ctl01_grdCns tr:nth-of-type(4) td:nth-of-type(2)

This selects the second <td> in the fourth <tr> inside the element whose id is MainContent_ctl01_grdCns.

To use a CSS selector, call .select() instead of find_all(), or select_one() instead of find().
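A small offline sketch contrasting the two methods, assuming a hypothetical table that reuses the id from the selector above (the rows are invented for illustration): select() returns a list of every match, while select_one() returns only the first match, or None when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified table; the id matches the selector above,
# but the rows are made up for illustration.
HTML = """
<table id="MainContent_ctl01_grdCns">
  <tr><th>Element</th><th>Description</th></tr>
  <tr><td>Field A:</td><td>value A</td></tr>
  <tr><td>Field B:</td><td>value B</td></tr>
  <tr><td>Style:</td><td>Townhouse End</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
selector = "#MainContent_ctl01_grdCns tr:nth-of-type(4) td:nth-of-type(2)"

# select() always returns a list (possibly empty) of all matching tags.
all_matches = soup.select(selector)

# select_one() returns just the first matching tag, or None.
first_match = soup.select_one(selector)

print(len(all_matches))
print(first_match.get_text(strip=True))

# Check for None before reading .text when the element may be missing.
missing = soup.select_one("#no-such-id td")
print(missing is None)
```

Guarding against None matters on live pages: if the id or row layout changes, select_one(...).text raises AttributeError.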

import requests
from bs4 import BeautifulSoup

URL = "https://gis.vgsi.com/portsmouthnh/Parcel.aspx?pid=38919"
soup = BeautifulSoup(requests.get(URL).content, "html.parser")
print(
    soup.select_one(
        "#MainContent_ctl01_grdCns tr:nth-of-type(4) td:nth-of-type(2)"
    ).text
)

Output:

Townhouse End

You can also do the following:

import bs4
import requests

style_data = []
url = "https://gis.vgsi.com/portsmouthnh/Parcel.aspx?pid=38919"
soup = bs4.BeautifulSoup(requests.get(url).content, 'html.parser')
# select the first `td` tag whose text contains the substring `Style:`.
row = soup.select_one('td:-soup-contains("Style:")')
if row:
    # if that cell was found, get the next `td` sibling, which should be the value you want
    home_style_tag = row.find_next_sibling('td')
    style_data.append(home_style_tag.text)
print(style_data)
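One caveat with .next_sibling: it returns the very next node, and when the HTML has whitespace between cells that node is a text string rather than the next <td>. find_next_sibling("td") skips text nodes. A minimal sketch on a hypothetical, pretty-printed row (not the real page markup):

```python
from bs4 import BeautifulSoup

# Hypothetical row with whitespace between the cells, as pretty-printed HTML often has.
HTML = """
<table>
  <tr>
    <td>Style:</td>
    <td>Townhouse End</td>
  </tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
label = soup.select_one('td:-soup-contains("Style:")')

# .next_sibling is the whitespace text between the two <td> tags here.
raw_sibling = label.next_sibling
print(repr(raw_sibling))

# find_next_sibling("td") skips text nodes and returns the next <td> tag.
value_cell = label.find_next_sibling("td")
print(value_cell.get_text(strip=True))
```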

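A couple of notes:

This uses CSS selectors rather than the find methods. See the Soup Sieve documentation for more details.
select_one relies on the table always being ordered a certain way. If that isn't guaranteed, use select, iterate over the results to find the bs4.Tag whose text is exactly 'Style:', and then get its next sibling.

Using select: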

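rows = soup.select('td:-soup-contains("Style:")')
# keep only cells whose text is exactly "Style:" (ignoring surrounding whitespace)
matches = [r for r in rows if r.get_text(strip=True) == 'Style:']
if matches:
    home_style_text = matches[0].find_next_sibling('td').get_text(strip=True)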
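The note about select_one matters when the label text also appears inside another cell, because :-soup-contains() is a substring test. A sketch on a hypothetical table (the "Roof Style:" row and its value are invented for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical table where "Style:" also appears inside another label.
HTML = """
<table>
  <tr><td>Roof Style:</td><td>Gable</td></tr>
  <tr><td>Style:</td><td>Townhouse End</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Substring match: select_one() returns the first cell containing "Style:",
# which here is the "Roof Style:" cell.
first = soup.select_one('td:-soup-contains("Style:")')
print(first.get_text(strip=True))

# Filtering select() results on the exact text finds the intended label.
candidates = soup.select('td:-soup-contains("Style:")')
exact = [td for td in candidates if td.get_text(strip=True) == "Style:"]
style = exact[0].find_next_sibling("td").get_text(strip=True) if exact else None
print(style)
```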

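You can use :contains on td to get the cell whose inner text contains "Style", then use the adjacent sibling combinator (+) with a td type selector to get the adjacent td. (Recent Soup Sieve versions deprecate :contains in favor of :-soup-contains, which behaves the same way.)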

import bs4, requests
pagedata = requests.get("https://gis.vgsi.com/portsmouthnh/Parcel.aspx?pid=38919") 
cleanpagedata = bs4.BeautifulSoup(pagedata.text, 'html.parser') 
print(cleanpagedata.select_one('td:contains("Style") + td').text)
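If you need several fields, the same "label cell + td" pattern can be wrapped in a small function. get_field below is a hypothetical helper, not part of Beautiful Soup, and the HTML is an invented stand-in for the page; it uses the :-soup-contains spelling of the pseudo-class.

```python
from bs4 import BeautifulSoup

# Hypothetical two-column label/value table (not the real page markup).
HTML = """
<table>
  <tr><td>Field A:</td><td>value A</td></tr>
  <tr><td>Style:</td><td>Townhouse End</td></tr>
</table>
"""


def get_field(soup, label):
    """Hypothetical helper: return the text of the <td> immediately after the
    first <td> whose text contains `label`, or None if there is no such cell.

    `X + td` is the adjacent sibling combinator: it matches a <td> element
    that directly follows an element matching X.
    """
    # Caveat: `label` is inserted into the selector unescaped, so it must not
    # contain a double quote.
    cell = soup.select_one(f'td:-soup-contains("{label}") + td')
    return cell.get_text(strip=True) if cell else None


soup = BeautifulSoup(HTML, "html.parser")
print(get_field(soup, "Style"))
print(get_field(soup, "Year Built"))
```

Returning None instead of raising keeps a batch of lookups going when one label is absent from a particular page.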
