I just started a Python web course. I'm trying to parse HTML data with BeautifulSoup and ran into this error. I researched it but couldn't find an exact solution. Here is the code:
import requests
from bs4 import BeautifulSoup
request = requests.get("http://www.johnlewis.com/toms-berkley-slipper-grey/p3061099")
content = request.content
soup = BeautifulSoup(content, 'html.parser')
element = soup.find(" span", {"itemprop ": "price ", "class": "now-price"})
string_price = (element.text.strip())
print(int(string_price))
# <span itemprop="price" class="now-price"> £40.00 </span>
This is the error I'm getting:
C:\Users\IngeniousAmbivert\venv\Scripts\python.exe C:/Users/IngeniousAmbivert/PycharmProjects/FullStack/price-eg/src/app.py
Traceback (most recent call last):
File "C:/Users/IngeniousAmbivert/PycharmProjects/FullStack/price-eg/src/app.py", line 8, in <module>
string_price = (element.text.strip())
AttributeError: 'NoneType' object has no attribute 'text'
Process finished with exit code 1
Any help would be greatly appreciated.
The problem is the extra space characters in the tag name, the attribute name and the attribute value. Replace:
element = soup.find(" span", {"itemprop ": "price ", "class": "now-price"})
with:
element = soup.find("span", {"itemprop": "price", "class": "now-price"})
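To see the effect without hitting the network, here is a minimal sketch that parses the sample span from the comment in the question as an inline string (assumption: the live page's markup matches that comment). find() returns None when nothing matches, and calling .text on that None is what raises the AttributeError.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the comment in the question (assumed to match the live page)
html = '<span itemprop="price" class="now-price"> £40.00 </span>'
soup = BeautifulSoup(html, "html.parser")

# The stray spaces mean neither the tag name nor the attribute filter matches anything
bad = soup.find(" span", {"itemprop ": "price ", "class": "now-price"})
print(bad)  # None

# Exact names and values match the element
good = soup.find("span", {"itemprop": "price", "class": "now-price"})
print(good.text.strip())  # £40.00
```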
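After that, there are two more things to fix when converting the string:
- strip the £ character from the left
- use float() instead of int(), because "40.00" is not a valid integer literal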
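A minimal sketch of those two conversion steps on the span text alone (the string literal is assumed from the comment in the question), showing why int() rejects "40.00" while float() accepts it:

```python
raw = " £40.00 "  # text of the <span>, as in the question's comment

cleaned = raw.strip().lstrip("£")  # remove surrounding whitespace, then the currency sign

# int() only parses integer literals, so the decimal point makes it fail
try:
    int(cleaned)
except ValueError as exc:
    print("int() failed:", exc)

# float() parses the decimal string
price = float(cleaned)
print(price)  # 40.0
```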
Fixed version:
element = soup.find("span", {"itemprop": "price", "class": "now-price"})
string_price = (element.get_text(strip=True).lstrip("£"))
print(float(string_price))
You will see 40.0 printed.
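Since find() returns None whenever nothing matches (for example, if the page layout changes or the request returns an error page), checking for None gives a clearer failure than the AttributeError. A minimal sketch; parse_price is a hypothetical helper name, and the markup is the sample from the question:

```python
from bs4 import BeautifulSoup


def parse_price(html):
    """Hypothetical helper: return the price as a float, or None if the span is missing."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find("span", {"itemprop": "price", "class": "now-price"})
    if element is None:
        # No matching <span>: avoid calling .get_text() on None
        return None
    return float(element.get_text(strip=True).lstrip("£"))


print(parse_price('<span itemprop="price" class="now-price"> £40.00 </span>'))  # 40.0
print(parse_price("<p>no price here</p>"))  # None
```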
You can also try it with a CSS selector, like this:
import requests
from bs4 import BeautifulSoup
request = requests.get("http://www.johnlewis.com/toms-berkley-slipper-grey/p3061099")
content = request.content
soup = BeautifulSoup(content, 'html.parser')
# print(soup)
element = soup.select("div p.price span.now-price")[0]
print(element)
string_price = element.text.strip()
# drop the leading £, parse as float, then truncate to int
print(int(float(string_price[1:])))
Output:
<span class="now-price" itemprop="price">
£40.00
</span>
40
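Note that select(...)[0] raises IndexError when the selector matches nothing; select_one() returns the first match or None instead. A minimal offline sketch, assuming the span sits inside a p.price element within a div, as the selector above implies (the wrapper markup here is invented for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical wrapper markup that matches the selector "div p.price span.now-price"
html = """
<div>
  <p class="price">
    <span class="now-price" itemprop="price"> £40.00 </span>
  </p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

element = soup.select_one("div p.price span.now-price")
if element is not None:
    string_price = element.get_text(strip=True)  # "£40.00"
    print(int(float(string_price[1:])))  # 40
```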