Printing a tag from BeautifulSoup gives the HTML and then None



I'm having trouble getting a tag from BeautifulSoup's find() method.

Here is my code:
url = evaluations['href']
page = requests.get(url, headers = HEADERS)
soup = BeautifulSoup(page.content, 'lxml')
evaluators = soup.find("section", class_="main-content list-content")
evaluators_list = evaluators.find("ul", class_='evaluation-list').find_all("li")
evaluators_dict = defaultdict(dict)
for evaluator in evaluators_list:
    eval_list = evaluator.find('ul', class_='highlights-list')
    print(eval_list.prettify())

Which outputs:

<ul class="highlights-list">
<li class="eval-meta evaluator">
<b class="uppercase heading">
Evaluated By
</b>
<img alt="Andrew Ivins" height="50" src="https://s3media.247sports.com/Uploads/Assets/680/358/9358680.jpeg?fit=bounds&amp;crop=50:50,offset-y0.50&amp;width=50&amp;height=50&amp;fit=crop" title="Andrew Ivins" width="50"/>
<div class="evaluator">
<b class="text">
Andrew Ivins
</b>
<span class="uppercase">
Southeast Recruiting Analyst
</span>
</div>
</li>
<li class="eval-meta projection">
<b class="uppercase heading">
Projection
</b>
<b class="text">
First Round
</b>
</li>
<li class="eval-meta">
<b class="uppercase heading">
Comparison
</b>
<a href="https://247sports.com/Player/Charles-Woodson-76747/" target="_blank">
Charles Woodson
</a>
<span class="uppercase">
Oakland Raiders
</span>
</li>
</ul>

and then the error:

Traceback (most recent call last):
  File "XXX", line 2, in <module>
    player = Player("Travis-Hunter-46084728").player
  File "XXX", line 218, in __init__
    self.player = self._parse_player()
  File "XXX", line 253, in _parse_player
    evaluators, background, skills = self._find_scouting_report(soup)
  File "XXX", line 468, in _find_scouting_report
    print(eval_list.prettify())
AttributeError: 'NoneType' object has no attribute 'prettify'
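The printed HTML and the traceback are most likely not one `find()` call returning both a tag and None. `find_all("li")` returns every descendant `<li>`, including the `eval-meta` items nested inside each highlights-list. The first, top-level `<li>` prints fine; the next item is the nested `li.eval-meta.evaluator`, which has no highlights-list of its own, so `find()` returns None and `.prettify()` raises. The sketch below reproduces this on a trimmed, hypothetical copy of the markup shown above, parsed with the built-in `html.parser` so lxml is not required.

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical copy of the page structure: each top-level <li>
# holds a highlights-list <ul>, which itself contains <li> elements.
HTML = """
<section class="main-content list-content">
  <ul class="evaluation-list">
    <li>
      <ul class="highlights-list">
        <li class="eval-meta evaluator"><b class="text">Andrew Ivins</b></li>
        <li class="eval-meta projection"><b class="text">First Round</b></li>
      </ul>
    </li>
  </ul>
</section>
"""

soup = BeautifulSoup(HTML, "html.parser")
eval_ul = soup.find("ul", class_="evaluation-list")

# find_all() searches ALL descendants, so the nested eval-meta <li> items
# come back alongside the single top-level <li>.
all_items = eval_ul.find_all("li")
print(len(all_items))  # 3

for item in all_items:
    # Only the top-level <li> contains a highlights-list; for the nested
    # ones find() returns None, which is what makes .prettify() fail.
    print(item.get("class"), item.find("ul", class_="highlights-list") is None)
```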

As you can see, it does find the tag and prints it prettified, but it also outputs a None. Is there any way to fix this? Thanks in advance. The link I'm using is: https://247sports.com/PlayerInstitution/Travis-Hunter-at-Collins-Hill-236028/PlayerInstitutionEvaluations/
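A minimal sketch of one possible fix, assuming the live page has the same structure as the printed HTML: pass `recursive=False` so `find_all()` only looks at direct children of the evaluation-list `<ul>`, and still check for None before using each result. The function `highlight_lists` and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup


def highlight_lists(html):
    """Return the highlights-list <ul> of each top-level evaluation <li>.

    recursive=False limits find_all() to direct children of the
    evaluation-list <ul>, so nested eval-meta <li> items are skipped.
    """
    soup = BeautifulSoup(html, "html.parser")
    section = soup.find("section", class_="main-content list-content")
    if section is None:
        return []  # layout differs from what the selectors expect
    eval_ul = section.find("ul", class_="evaluation-list")
    if eval_ul is None:
        return []
    results = []
    for item in eval_ul.find_all("li", recursive=False):
        highlights = item.find("ul", class_="highlights-list")
        if highlights is not None:  # guard anyway; find() returns None on a miss
            results.append(highlights)
    return results


# Hypothetical sample with two evaluations, each holding nested <li> items.
SAMPLE = """
<section class="main-content list-content">
  <ul class="evaluation-list">
    <li>
      <ul class="highlights-list">
        <li class="eval-meta evaluator"><b class="text">Andrew Ivins</b></li>
      </ul>
    </li>
    <li>
      <ul class="highlights-list">
        <li class="eval-meta evaluator"><b class="text">Charles Power</b></li>
      </ul>
    </li>
  </ul>
</section>
"""

for ul in highlight_lists(SAMPLE):
    print(ul.get_text(strip=True))
```

Checking for None after each `find()` is the general defence against this kind of AttributeError; `recursive=False` addresses the specific cause, which is iterating over nested items that were never evaluations in the first place.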

Edit: I tried Selenium, thinking this might be a JavaScript issue, but that didn't solve it either.

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:100.0) Gecko/20100101 Firefox/100.0'
}


def get_soup(content):
    return BeautifulSoup(content, 'lxml')


def main(url):
    with requests.Session() as req:
        req.headers.update(headers)
        r = req.get(url)
        soup = get_soup(r.content)
        goal = [list(x.stripped_strings) for x in soup.select(
            '.main-content.list-content > .evaluation-list > li > .highlights-list')]
        for i in goal:
            print(i[1:3] + i[-2:])


if __name__ == "__main__":
    main('https://247sports.com/PlayerInstitution/Travis-Hunter-at-Collins-Hill-236028/PlayerInstitutionEvaluations/')

Output:

['Andrew Ivins', 'Southeast Recruiting Analyst', 'Charles Woodson', 'Oakland Raiders']
['Andrew Ivins', 'Southeast Recruiting Analyst', 'Xavier Rhodes', 'Minnesota Vikings']
['Charles Power', 'National writer', 'Marcus Peters', 'Baltimore Ravens']
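The selector in the edit avoids the None because the `>` child combinators only match a highlights-list that is a direct child of a top-level `<li>`, and `select()` simply leaves out anything that does not match. The slice `i[1:3] + i[-2:]` still depends on the exact order of the strings, though. The sketch below pulls each field by its class instead; it assumes the markup printed in the question stays stable, and `parse_highlights`, its field names, and `PAGE_SNIPPET` are hypothetical.

```python
from bs4 import BeautifulSoup


def parse_highlights(highlights):
    """Turn one highlights-list <ul> into a dict keyed by field name.

    Selectors are assumptions based on the markup printed in the question;
    missing pieces come back as None instead of raising AttributeError.
    """
    def text_of(selector):
        tag = highlights.select_one(selector)
        return tag.get_text(strip=True) if tag is not None else None

    return {
        "evaluator": text_of("li.evaluator div.evaluator b.text"),
        "evaluator_title": text_of("li.evaluator div.evaluator span.uppercase"),
        "projection": text_of("li.projection b.text"),
        # Only the comparison <li> has an <a> / <span> as a direct child.
        "comparison": text_of("li.eval-meta > a"),
        "comparison_team": text_of("li.eval-meta > span.uppercase"),
    }


# Trimmed copy of the HTML printed in the question (image removed).
PAGE_SNIPPET = """
<section class="main-content list-content">
 <ul class="evaluation-list">
  <li>
   <ul class="highlights-list">
    <li class="eval-meta evaluator">
     <b class="uppercase heading">Evaluated By</b>
     <div class="evaluator">
      <b class="text">Andrew Ivins</b>
      <span class="uppercase">Southeast Recruiting Analyst</span>
     </div>
    </li>
    <li class="eval-meta projection">
     <b class="uppercase heading">Projection</b>
     <b class="text">First Round</b>
    </li>
    <li class="eval-meta">
     <b class="uppercase heading">Comparison</b>
     <a href="https://247sports.com/Player/Charles-Woodson-76747/">Charles Woodson</a>
     <span class="uppercase">Oakland Raiders</span>
    </li>
   </ul>
  </li>
 </ul>
</section>
"""

soup = BeautifulSoup(PAGE_SNIPPET, "html.parser")
rows = [parse_highlights(ul) for ul in soup.select(
    ".main-content.list-content > .evaluation-list > li > .highlights-list")]
print(rows)
```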
