Unable to scrape website titles - Python Bs4



I am trying to get the titles of the games, but along with each title I also get the span text.

Here is my code

import time
import requests,pandas
from bs4 import BeautifulSoup

r = requests.get("https://www.pocketgamer.com/android/best-horror-games/?page=1", headers=
{'User-agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'})
c = r.content
bs4 = BeautifulSoup(c,"html.parser")
all = bs4.find_all("h3",{"class":"indent"}) 
print(all)

Output

[<h3 class="indent">
<div><span>1</span></div>
Fran Bow </h3>, <h3 class="indent">
<div><span>2</span></div>
Bendy and the Ink Machine </h3>, <h3 class="indent">
<div><span>3</span></div>
Five Nights at Freddy's </h3>, <h3 class="indent">
<div><span>4</span></div>
Sanitarium </h3>, <h3 class="indent">
<div><span>5</span></div>
OXENFREE </h3>, <h3 class="indent">
<div><span>6</span></div>
Thimbleweed Park </h3>, <h3 class="indent">
<div><span>7</span></div>
Samsara Room </h3>, <h3 class="indent">

I also tried this code, but it does not work

#all = all.find_all("h3")[0].text

How can I fix this?

Because the text you want is always the last child of the <h3>, you can extract it through the tag's .contents list.

element.contents[-1]
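To see why the last entry is the title, here is a minimal offline sketch. It assumes the markup looks like the single heading shown in the question's output (the live page may differ), and it uses plain str.strip() so it does not depend on the installed bs4 version.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question's output (one heading only).
html = """<h3 class="indent">
<div><span>1</span></div>
Fran Bow </h3>"""

soup = BeautifulSoup(html, "html.parser")
h3 = soup.find("h3", {"class": "indent"})

# .contents holds the direct children in document order:
# a whitespace string, the <div>, and the string with the title.
children = h3.contents
title = str(children[-1]).strip()

print(len(children))
print(title)
```

The nested <span> lives inside the <div>, so it never appears in the last child, which is why indexing with [-1] skips the number.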

To get the text, iterate over the result set:

for x in bs4.find_all("h3",{"class":"indent"}):
    print(x.contents[-1].get_text(strip=True))
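For comparison, the following sketch illustrates what probably goes wrong in the question. It assumes the same sample markup as the question's output: taking the text of the whole <h3> pulls in the number from the nested <span>, and calling find_all on the result list itself raises an error because a ResultSet is a list of tags, not a tag.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question's output (one heading only).
html = """<h3 class="indent">
<div><span>1</span></div>
Fran Bow </h3>"""

soup = BeautifulSoup(html, "html.parser")
results = soup.find_all("h3", {"class": "indent"})

# Text of the whole tag includes the number from the nested <span>.
whole_text = results[0].get_text(strip=True)

# find_all is a method of a single tag, not of the ResultSet.
try:
    results.find_all("h3")
    raised = False
except AttributeError:
    raised = True

print(repr(whole_text))
print(raised)
```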

Example

import requests,pandas
from bs4 import BeautifulSoup

r = requests.get("https://www.pocketgamer.com/android/best-horror-games/?page=1", 
headers={'User-agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'})
c = r.content
bs4 = BeautifulSoup(c,"html.parser")
all = [x.contents[-1].get_text(strip=True) for x in bs4.find_all("h3",{"class":"indent"})]
print(all)

Output

['Fran Bow', 'Bendy and the Ink Machine', "Five Nights at Freddy's", 'Sanitarium', 'OXENFREE', 'Thimbleweed Park', 'Samsara Room', 'Into the Dead 2', 'Slayaway Camp', 'Eyes - the horror game', 'Slendrina:The Cellar', 'Hello Neighbor', 'Alien: Blackout', 'Rest in Pieces', 'Friday the 13th: Killer Puzzle', 'I Am Innocent', 'Detention', 'Limbo', 'Knock-Knock', 'Sara Is Missing', 'Death Park: Scary Horror Clown', 'Horror Hospital 2', 'Horrorfield - Multiplayer Survival Horror Game', 'Erich Sann: Horror in the scary Academy', 'The Innsmouth Case']
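If the title were not guaranteed to be the last child, one alternative is to collect only the strings that sit directly under each <h3>. This is a sketch rather than part of the original answer: direct_text is a hypothetical helper name, and the inline HTML is a two-heading sample taken from the question's output instead of the live page.

```python
from bs4 import BeautifulSoup

# Two headings copied from the question's output, used as offline sample data.
html = """
<h3 class="indent">
<div><span>1</span></div>
Fran Bow </h3>
<h3 class="indent">
<div><span>2</span></div>
Bendy and the Ink Machine </h3>
"""

def direct_text(tag):
    """Join only the strings that are direct children of `tag` (hypothetical helper)."""
    parts = tag.find_all(string=True, recursive=False)
    return " ".join(p.strip() for p in parts if p.strip())

soup = BeautifulSoup(html, "html.parser")
titles = [direct_text(h3) for h3 in soup.find_all("h3", class_="indent")]
print(titles)
```

With recursive=False the search stops at direct children, so the text inside the nested <div> and <span> is ignored regardless of where it appears in the heading.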
