I am trying to get the game titles, but along with each title I also get the span text.
Here is my code:
import time
import requests,pandas
from bs4 import BeautifulSoup
r = requests.get("https://www.pocketgamer.com/android/best-horror-games/?page=1", headers=
{'User-agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'})
c = r.content
bs4 = BeautifulSoup(c,"html.parser")
all = bs4.find_all("h3",{"class":"indent"})
print(all)
Output:
[<h3 class="indent">
<div><span>1</span></div>
Fran Bow </h3>, <h3 class="indent">
<div><span>2</span></div>
Bendy and the Ink Machine </h3>, <h3 class="indent">
<div><span>3</span></div>
Five Nights at Freddy's </h3>, <h3 class="indent">
<div><span>4</span></div>
Sanitarium </h3>, <h3 class="indent">
<div><span>5</span></div>
OXENFREE </h3>, <h3 class="indent">
<div><span>6</span></div>
Thimbleweed Park </h3>, <h3 class="indent">
<div><span>7</span></div>
Samsara Room </h3>, <h3 class="indent">
I also tried this code, but it does not work:
#all = all.find_all("h3")[0].text
How can I fix this?
Because the text you want is always the last child of the <h3>, you can extract it through the <h3>'s contents list:
element.contents[-1]
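To see why the last entry of contents is the title, here is a minimal offline sketch. It assumes the markup looks like the first <h3> in the question's output; the html string is copied from there rather than fetched, so the real page may differ.

```python
from bs4 import BeautifulSoup

# Inline sample copied from the question's output (assumption: the live
# page still uses this structure). No network request is made.
html = """<h3 class="indent">
<div><span>1</span></div>
Fran Bow </h3>"""

h3 = BeautifulSoup(html, "html.parser").find("h3", {"class": "indent"})

# .contents lists the tag's direct children in document order:
# a whitespace string, the <div> holding the rank, then the title string.
for child in h3.contents:
    print(type(child).__name__, repr(str(child)))

# The last child is a NavigableString, which subclasses str,
# so plain str.strip() is enough to trim the surrounding whitespace.
title = h3.contents[-1].strip()
print(title)
```

The rank number lives inside the <div>, which is a separate child, so picking the last child skips it without any string slicing.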
To get the text, iterate over the result set:
for x in bs4.find_all("h3",{"class":"indent"}):
    print(x.contents[-1].get_text(strip=True))
Example
import requests,pandas
from bs4 import BeautifulSoup
# Send a browser-like User-agent header with the request
r = requests.get("https://www.pocketgamer.com/android/best-horror-games/?page=1",
headers={'User-agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'})
c = r.content
bs4 = BeautifulSoup(c,"html.parser")
# Keep only the last child of each <h3 class="indent">, which is the title text
all = [x.contents[-1].get_text(strip=True) for x in bs4.find_all("h3",{"class":"indent"})]
print(all)
Output:
['Fran Bow', 'Bendy and the Ink Machine', "Five Nights at Freddy's", 'Sanitarium', 'OXENFREE', 'Thimbleweed Park', 'Samsara Room', 'Into the Dead 2', 'Slayaway Camp', 'Eyes - the horror game', 'Slendrina:The Cellar', 'Hello Neighbor', 'Alien: Blackout', 'Rest in Pieces', 'Friday the 13th: Killer Puzzle', 'I Am Innocent', 'Detention', 'Limbo', 'Knock-Knock', 'Sara Is Missing', 'Death Park: Scary Horror Clown', 'Horror Hospital 2', 'Horrorfield - Multiplayer Survival Horror Game', 'Erich Sann: Horror in the scary Academy', 'The Innsmouth Case']
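If the title were not guaranteed to be the last child, one possible variant is to join only the strings that sit directly inside the <h3> and ignore anything nested in child tags. The sketch below is an alternative to contents[-1], not part of the answer above. own_text is a hypothetical helper name, the sample markup is copied from the question's output, and the string= argument of find_all assumes Beautiful Soup 4.4.0 or later.

```python
from bs4 import BeautifulSoup

# Two entries copied from the question's output (assumed page structure).
html = """
<h3 class="indent">
<div><span>1</span></div>
Fran Bow </h3>
<h3 class="indent">
<div><span>2</span></div>
Bendy and the Ink Machine </h3>
"""


def own_text(tag):
    """Hypothetical helper: text of the tag's direct string children only."""
    # string=True matches any string; recursive=False restricts the search
    # to direct children, so the "1" inside <div><span> is never visited.
    return "".join(tag.find_all(string=True, recursive=False)).strip()


soup = BeautifulSoup(html, "html.parser")
titles = [own_text(h3) for h3 in soup.find_all("h3", {"class": "indent"})]
print(titles)
```

This does not depend on where the title string sits among the children, at the cost of a second find_all call per heading.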