Python BS4 find_all replaces the text inside a tag with <!--empty-->



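I am trying to scrape data from https://www.td.com/ca/en/personal-banking/products/mortgages/mortgage-rates/

When I use find_all to get the values from the cells of a particular table, the value returned is "<!--empty-->" rather than the text in the cell.

The actual HTML of the cell is: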

<span class="h2 ng-binding ng-isolate-scope" code="a.reslrates.MTGF036C" high-ratio="false" resl-rate="" type="S">2.54%</span>

The result returned is:

<span class="h2" code="a.reslrates.MTGF036C" high-ratio="false" resl-rate="" type="S"><!--empty--></span>

Instead of the rate text 2.54%, I get the <!--empty--> result. Am I missing something? Full code below:

import requests
from bs4 import BeautifulSoup

html_text = requests.get("https://www.td.com/ca/en/personal-banking/products/mortgages/mortgage-rates/").text
soup = BeautifulSoup(html_text, "html.parser")
# Get the table
table = soup.find("div", class_="td-rates-table rates-bg-row1").table
rows = table.tbody.find_all("tr")
for row in rows:
    for rate in row.find_all("td"):
        print(rate)

I appreciate all replies! Thanks a lot!

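Use Selenium. The rates are filled in by JavaScript after the page loads (the ng-binding class in the browser's HTML suggests AngularJS), so requests only receives the <!--empty--> placeholder. Install the necessary dependencies (selenium, webdriver-manager, beautifulsoup4) and run the script.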

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
url = 'https://www.td.com/ca/en/personal-banking/products/mortgages/mortgage-rates/'
driver.get(url)  # a real browser executes the page's JavaScript
soup = BeautifulSoup(driver.page_source, "html.parser")
driver.quit()  # page_source is already captured, so the browser can close
# Get the table
table = soup.find("div", class_="td-rates-table rates-bg-row1").table
rows = table.tbody.find_all("tr")
for row in rows:
    for rate in row.find_all("td"):
        print(rate.text)
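
If page_source is read before the page's JavaScript has finished, the cells may still contain the <!--empty--> placeholder. Below is a minimal sketch of one way to guard against that, not a definitive fix. The helper names rates_rendered and wait_for_rates are hypothetical, the 15-second timeout is arbitrary, and the check assumes every rate cell is a span carrying the resl-rate attribute, as in the snippet quoted in the question.

```python
from bs4 import BeautifulSoup

# The two snippets quoted in the question: browser view vs. what requests returned.
RENDERED = (
    '<span class="h2 ng-binding ng-isolate-scope" code="a.reslrates.MTGF036C" '
    'high-ratio="false" resl-rate="" type="S">2.54%</span>'
)
UNRENDERED = (
    '<span class="h2" code="a.reslrates.MTGF036C" '
    'high-ratio="false" resl-rate="" type="S"><!--empty--></span>'
)


def rates_rendered(html):
    """Return True if the HTML has rate spans and each one contains visible text."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: rate cells are <span> tags that carry a resl-rate attribute.
    spans = soup.find_all(
        lambda tag: tag.name == "span" and tag.has_attr("resl-rate")
    )
    # get_text() ignores comment nodes, so a placeholder-only span yields "".
    return bool(spans) and all(span.get_text(strip=True) for span in spans)


def wait_for_rates(driver, timeout=15):
    """Poll the driver until rates_rendered passes, then return the page source."""
    # Imported here so rates_rendered stays usable without Selenium installed.
    from selenium.webdriver.support.ui import WebDriverWait

    # until() raises selenium's TimeoutException if the rates never appear.
    WebDriverWait(driver, timeout).until(lambda d: rates_rendered(d.page_source))
    return driver.page_source


if __name__ == "__main__":
    print(rates_rendered(RENDERED))    # True
    print(rates_rendered(UNRENDERED))  # False
```

In the script above, calling wait_for_rates(driver) after driver.get(url) and passing its return value to BeautifulSoup would replace the direct use of driver.page_source.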
