BeautifulSoup: get only the first row among rows that share the same tr class



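I am scraping this website: https://www.resultados-futbol.com/premier/grupo1/jornada1

I am trying to get the information from one specific row per match (only the first one). The rows are the events of each game (mostly goals). This is the structure of the page:

Each tr.vevent is a match, and the events of each match are in tr.league-match-events rows.

I only need to scrape the first event of each game. If the first goal is "1-0" I need to capture that, or "0-1", along with the minute in which it was scored. This is the code I tried, but it picks up all the events: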

for row in soup.select('tr.league-match-events'):
    minute = row.select_one('.lme-minute').get_text()
    gol = row.select_one('.url')
    minutos= []

    minutos.append({
        'minutos':minute,
        'goles':gol
    })

This is what I get in the minutos list:

[{'minutos': "23'", 'goles': <span class="url">1-0</span>}, {'minutos': "38'", 'goles': <span class="url">2-0</span>}, {'minutos': "44'", 'goles': <span class="url">3-0</span>}, {'minutos': "72'", 'goles': <span class="url">4-0</span>}, {'minutos': "78'", 'goles': <span class="url">0-1</span>}, {'minutos': "8'", 'goles': <span class="url">1-0</span>}, {'minutos': "28'", 'goles': <span class="url">2-0</span>}, {'minutos': "36'", 'goles': None}, {'minutos': "41'", 'goles': <span class="url">4-0</span>}, {'minutos': "54'", 'goles': <span class="url">4-1</span>}, {'minutos': "60'", 'goles': <span class="url">5-1</span>}, {'minutos': "71'", 'goles': <span class="url">6-1</span>}, {'minutos': "73'", 'goles': <span class="url">7-1</span>}, {'minutos': "76'", 'goles': <span class="url">8-1</span>}, {'minutos': "90'", 'goles': <span class="url">9-1</span>}, {'minutos': "64'", 'goles': <span class="url">1-0</span>}, {'minutos': "72'", 'goles': <span class="url">2-0</span>}, {'minutos': "29'", 'goles': <span class="url">1-0</span>}, {'minutos': "34'", 'goles': <span class="url">2-0</span>}, {'minutos': "48'", 'goles': <span class="url">3-0</span>}, {'minutos': "68'", 'goles': <span class="url">4-0</span>}, {'minutos': "90'", 'goles': <span class="url">4-1</span>}, {'minutos': "25'", 'goles': <span class="url">1-0</span>}, {'minutos': "64'", 'goles': <span class="url">1-1</span>}, {'minutos': "66'", 'goles': <span class="url">2-1</span>}, {'minutos': "71'", 'goles': <span class="url">3-1</span>}, {'minutos': "94'", 'goles': <span class="url">1-0</span>}, {'minutos': "9'", 'goles': <span class="url">0-1</span>}, {'minutos': "39'", 'goles': <span class="url">0-2</span>}, {'minutos': "61'", 'goles': <span class="url">1-2</span>}, {'minutos': "68'", 'goles': <span class="url">2-2</span>}]

It contains every goal, and each goal also still includes the <span class="url"> tag. I can't strip it with get_text() because that raises an error.

The result I want is:

[{'minutos': "23'", 'goles': '1-0'}, {'minutos': "78'", 'goles': '0-1'}, {'minutos': "8'", 'goles': '1-0'}, {'minutos': "64'", 'goles': '1-0'}, {'minutos': "29'", 'goles': '1-0'}, {'minutos': "25'", 'goles': '1-0'}, {'minutos': "94'", 'goles': '1-0'}, {'minutos': "9'", 'goles': '0-1'}]

Thanks in advance for your help.

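I'm not sure how you get the full list, because in your example the list overwrites itself on every iteration; that is probably a typo.

How to get only the first goal of a match?

A very simple way to get those goals is to check whether the result contains 0-1 or 1-0, and append only those to the list: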
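
To illustrate the overwriting issue: when the list is created inside the loop, every iteration replaces it with a new empty list, so only the last row survives. Here is a minimal sketch with made-up data (the `rows` list is hypothetical and only stands in for the parsed table rows):

```python
# Hypothetical stand-in for the (minute, score) pairs parsed from the page.
rows = [("23'", "1-0"), ("38'", "2-0"), ("44'", "3-0")]

# Buggy: the list is re-created on every iteration, so earlier rows are lost.
for minute, gol in rows:
    overwritten = []
    overwritten.append({'minutos': minute, 'goles': gol})

# Fixed: create the list once, before the loop.
minutos = []
for minute, gol in rows:
    minutos.append({'minutos': minute, 'goles': gol})

print(len(overwritten))  # 1
print(len(minutos))      # 3
```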

if '0-1' in gol or '1-0' in gol:
    minutos.append({
        'minutos':minute,
        'goles':gol
    })
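
One caveat with the substring test: `'0-1' in gol` is also true for a score such as `10-1`, and it raises a `TypeError` when `gol` is `None`, which happens when `select_one` finds no `.url` element in a row (as in the 36' entry of the question's output). A slightly stricter sketch compares the whole string instead. The helper name `is_first_goal` and the sample `events` data are assumptions made for illustration:

```python
def is_first_goal(score):
    """Return True only for the scores that mark a match's first goal."""
    if score is None:  # no score element was found in the row
        return False
    return score.strip() in ('1-0', '0-1')


# Hypothetical sample of (minute, score) pairs, including a missing score.
events = [
    ("23'", '1-0'),
    ("38'", '2-0'),
    ("36'", None),
    ("90'", '10-1'),
    ("78'", ' 0-1 '),
]

minutos = [
    {'minutos': minute, 'goles': score.strip()}
    for minute, score in events
    if is_first_goal(score)
]
print(minutos)
```

With real rows, the score would come from `row.select_one('.url')`, calling `get_text()` only after checking that the element is not `None`.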

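Another option is to select the tr that is the next sibling of the tr holding the match data, which is probably what you are asking about: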

soup.select('tr.vevent + tr.league-match-events')
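
To see what the `+` (adjacent sibling) combinator does, here is a small self-contained sketch. The HTML is made up to mimic the structure described in the question and is not the site's real markup, so treat the result as an illustration only:

```python
from bs4 import BeautifulSoup

# Made-up markup that mimics the structure described in the question.
html = """
<table>
  <tr class="vevent"><td>Match A</td></tr>
  <tr class="league-match-events"><td class="lme-minute">23'</td><td><span class="url">1-0</span></td></tr>
  <tr class="league-match-events"><td class="lme-minute">38'</td><td><span class="url">2-0</span></td></tr>
  <tr class="vevent"><td>Match B</td></tr>
  <tr class="league-match-events"><td class="lme-minute">9'</td><td><span class="url">0-1</span></td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')

# "A + B" matches a B element only when it comes directly after an A sibling,
# so only the first events row after each match row is selected.
first_events = soup.select('tr.vevent + tr.league-match-events')

minutos = [
    {
        'minutos': row.select_one('.lme-minute').get_text(),
        'goles': row.select_one('.url').get_text(),
    }
    for row in first_events
]
print(minutos)
```

The second events row of Match A (38', 2-0) is skipped because its previous sibling is another events row, not a `tr.vevent`.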

from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd

# executable_path is the Selenium 3 style; newer Selenium releases take a Service object instead
driver = webdriver.Chrome(executable_path=r'C:\Program Files\ChromeDriver\chromedriver.exe')
url = 'https://www.resultados-futbol.com/premier/grupo1/jornada1'
driver.get(url)
# Selenium is used first because some buttons on the page have to be expanded
soup = BeautifulSoup(driver.page_source, 'html.parser')

minutos = []
# Select only the events row that directly follows each match row
for row in soup.select('tr.vevent + tr.league-match-events'):
    minute = row.select_one('.lme-minute').get_text()
    gol = row.select_one('.url').get_text()

    minutos.append({
        'minutos': minute,
        'goles': gol
    })
minutos

[{'minutos': "9'", 'goles': '0-1'},
{'minutos': "13'", 'goles': '1-0'},
{'minutos': "4'", 'goles': '1-0'},
{'minutos': "56'", 'goles': '0-1'},
{'minutos': "57'", 'goles': '0-1'},
{'minutos': "55'", 'goles': '0-1'},
{'minutos': "3'", 'goles': '0-1'},
{'minutos': "23'", 'goles': '0-1'},
{'minutos': "71'", 'goles': '0-1'},
{'minutos': "79'", 'goles': '1-0'}]
