Web scraping with Python BeautifulSoup: looping through HTML tags



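I am using Python and BeautifulSoup to scrape the URL 'https://www.pro-football-reference.com/teams/nwe/2013_injuries.htm'. I want to scrape each player's name, their injury, and the week of the injury. I can scrape the week 1 information, which gives the following results: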

[['Danny Amendola'], 'Questionable: hamstring', 'week_1']
[['Armond Armstead'], 'Out: infection', 'week_1']
[['Kyle Arrington'], 'NA', 'week_1']
[['Brandon Bolden'], 'Questionable: knee', 'week_1']
... and so on for all the week 1 injuries.

But it stops once all of the week 1 injuries have been displayed.

I would like the results to continue on through week 2, week 3, week 4, and so on.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://www.pro-football-reference.com/teams/nwe/2013_injuries.htm'
# opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
# html parsing
page_soup = soup(page_html, "html.parser")
containers = page_soup.find("tbody")
head = page_soup.find("thead")

player = containers.find_all("tr")
for tr in player:
    th = tr.find_all("th")
    name = [i.text for i in th]
    week = tr.td["data-stat"]
    try:
        injury = tr.td["data-tip"]
        print([name, injury, week])
    except KeyError:
        injury = "NA"
        print([name, injury, week])

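The result I am looking for is for the code to print each player's name, their injury, and the week of the injury for every week shown in the table at the URL. For example, once all of the week 1 injuries have been printed, I would like it to show all of the week 2 injuries, then week 3, and so on. So it would look like this: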

[['Adrian Wilson'], 'Injured Reserve: hamstring', 'week_1']
[['Tavon Wilson'], 'NA', 'week_1']
[['Markus Zusevics'], 'Injured Reserve: undisclosed', 'week_1']
[['Danny Amendola'], 'Questionable: groin', 'week_2']
...

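You are only reading the first instance of data-tip: tr.td returns just the first td in each row. Loop over every td in the row instead; this should work: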

player = containers.find_all("tr")
for tr in player:
    th = tr.find_all("th")
    name = [i.text for i in th]
    # visit every week cell in the row, not just the first one
    for td in tr.find_all("td"):
        week = td["data-stat"]
        # cells without a data-tip attribute have no listed injury
        injury = td.get("data-tip", "NA")
        print([name, injury, week])
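
The loop above prints every week for one player before moving on to the next player. If the output should instead be grouped by week, as the question asks, one option is to collect the rows first and sort them by week number. This is a minimal sketch, assuming each row is a (name, injury, week) tuple and that week labels always look like 'week_N'; the helper week_number and the sample records list are illustrative and not part of the original code:

```python
def week_number(record):
    """Return the numeric part of a label such as 'week_12'."""
    return int(record[2].split("_")[1])


# Sample rows in the player-by-player order the loop above produces.
records = [
    ("Danny Amendola", "Questionable: hamstring", "week_1"),
    ("Danny Amendola", "Questionable: groin", "week_2"),
    ("Danny Amendola", "Probable: groin", "week_11"),
    ("Armond Armstead", "Out: infection", "week_1"),
    ("Armond Armstead", "Out: infection", "week_2"),
]

# sorted() is stable, so players keep their original order within each week.
by_week = sorted(records, key=week_number)

for row in by_week:
    print(list(row))
```

Sorting on the integer rather than the raw string keeps week_11 after week_2; a plain string sort would place 'week_11' before 'week_2'.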

Code:

import re
import requests
from bs4 import BeautifulSoup as soup
html = requests.get('https://www.pro-football-reference.com/teams/nwe/2013_injuries.htm').text
overall = []
page_soup = soup(html, 'html.parser')
containers = page_soup.find('tbody')
players = containers.find_all('tr')
for player in players:
    th = player.find_all('th')
    name = [i.text for i in th]
    # keep only the cells that carry an injury tooltip
    tds = player.find_all('td', {'class': re.compile('^center poptip')})
    # one "injury, week_N" entry per cell, joined into a single string
    weeklyInjuries = ', '.join(f"{td['data-tip']}, {td['data-stat']}" for td in tds)
    if not weeklyInjuries:
        weeklyInjuries = 'N/A'
    print([name, weeklyInjuries])

Output:

[['Danny Amendola'], 'Questionable: hamstring, week_1, Questionable: groin, week_2, Doubtful: groin, week_3, Questionable: groin, week_4, Questionable: groin, week_5, Probable: groin, week_6, Out: concussion, week_7, Questionable: concussion, week_8, Questionable: groin, week_9, Probable: groin, week_11, Probable: groin, week_12, Probable: groin, week_13, Probable: groin, week_14, Probable: groin, week_15, Questionable: groin, week_16, Probable: groin, week_17']
[['Armond Armstead'], 'Out: infection, week_1, Out: infection, week_2, Out: infection, week_3, Out: infection, week_4, Out: infection, week_5, Out: infection, week_6, Out: infection, week_7, Out: infection, week_8, Out: infection, week_9, Out: infection, week_11, Out: infection, week_12, Out: infection, week_13, Out: infection, week_14, Out: infection, week_15, Out: infection, week_16, Out: infection, week_17, Out: infection, week_19, Out: infection, week_20']
[['Kyle Arrington'], 'Questionable: groin, week_4, Questionable: groin, week_5, Probable: groin, week_6, Probable: groin, week_7, Probable: groin, week_8, Questionable: groin, week_9, Questionable: groin, week_11, Probable: groin, week_12, Questionable: groin, week_13, Questionable: groin, week_14, Questionable: groin, week_15, Questionable: groin, week_16, Questionable: groin, week_17']
[['Brandon Bolden'], 'Questionable: knee, week_1, Questionable: knee, week_2, Questionable: knee, week_3, Questionable: knee, week_4, Questionable: knee, week_5, Probable: knee, week_6, Questionable: knee, week_7, Questionable: knee, week_8, Questionable: knee, week_9, Questionable: knee, week_11']
[['Josh Boyce'], 'Doubtful: hip, week_16, Questionable: hip, week_17']
[['Tom Brady'], 'Probable: right shoulder, week_8, Probable: right shoulder, week_9, Probable: right shoulder, week_11, Probable: right shoulder, week_12, Probable: shoulder, week_13, Probable: right shoulder, week_14, Questionable: shoulder, week_15, Probable: right shoulder, week_16, Probable: right shoulder, week_17']
[['Marcus Cannon'], 'Questionable: shoulder, week_7, Questionable: shoulder, week_8, Questionable: shoulder, week_9, Questionable: ankle, week_13, Questionable: ankle, week_14, Questionable: ankle, week_15, Questionable: ankle, week_16, Questionable: ankle, week_17']
[['Marquice Cole'], 'Probable: hamstring, week_2, Questionable: hamstring, week_4, Questionable: hamstring, week_5, Questionable: leg, week_13, Questionable: shin, week_14, Questionable: shin, week_15']
[['Austin Collie'], 'N/A']
[['Dan Connolly'], 'Questionable: finger, week_3, Questionable: head, week_7']
[['Alfonzo Dennard'], 'Probable: ankle, week_2, Questionable: leg, week_11, Questionable: knee, week_13, Questionable: knee, week_14, Questionable: knee/shoulder, week_15, Questionable: knee/shoulder, week_16, Questionable: knee/shoulder, week_17']
[['Aaron Dobson'], 'Questionable: hamstring, week_1, Questionable: hamstring, week_2, Doubtful: shoulder, week_4, Questionable: neck, week_5, Questionable: neck, week_6, Questionable: undisclosed, week_13, Questionable: foot, week_14, Questionable: foot, week_15, Questionable: foot, week_16, Questionable: foot, week_17']
[['Nate Ebner'], 'Questionable: ankle, week_1, Questionable: ankle, week_2, Questionable: ankle, week_3, Questionable: ankle, week_4, Questionable: ankle, week_5, Probable: ankle, week_6']
[['Julian Edelman'], 'Questionable: thigh, week_7, Questionable: thigh, week_8, Probable: thigh, week_9']
[['Dane Fletcher'], 'Questionable: groin, week_16, Questionable: groin, week_17']
[['Tyronne Green'], 'Injured Reserve: undisclosed, week_1, Injured Reserve: undisclosed, week_2, Injured Reserve: undisclosed, week_3, Injured Reserve: undisclosed, week_4, Injured Reserve: undisclosed, week_5, Injured Reserve: undisclosed, week_6, Injured Reserve: undisclosed, week_7, Injured Reserve: undisclosed, week_8, Injured Reserve: undisclosed, week_9, Injured Reserve: undisclosed, week_11, Injured Reserve: undisclosed, week_12, Injured Reserve: undisclosed, week_13, Injured Reserve: undisclosed, week_14, Injured Reserve: undisclosed, week_15, Injured Reserve: undisclosed, week_16, Injured Reserve: undisclosed, week_17, Injured Reserve: undisclosed, week_19, Injured Reserve: undisclosed, week_20']
[['Steve Gregory'], 'Out: thumb, week_11, Questionable: finger, week_12, Questionable: finger, week_13, Questionable: finger, week_14, Questionable: finger, week_15, Questionable: finger, week_16, Questionable: knee/finger, week_17']
[['Cory Grissom'], 'Injured Reserve: knee, week_1, Injured Reserve: knee, week_2, Injured Reserve: knee, week_3, Injured Reserve: knee, week_4, Injured Reserve: knee, week_5, Injured Reserve: knee, week_6, Injured Reserve: knee, week_7, Injured Reserve: knee, week_8, Injured Reserve: knee, week_9, Injured Reserve: knee, week_11, Injured Reserve: knee, week_12, Injured Reserve: knee, week_13, Injured Reserve: knee, week_14, Injured Reserve: knee, week_15, Injured Reserve: knee, week_16, Injured Reserve: knee, week_17, Injured Reserve: knee, week_19, Injured Reserve: knee, week_20']
[['Rob Gronkowski'], 'Doubtful: arm/back, week_1, Questionable: arm/back, week_2, Doubtful: arm/back, week_3, Questionable: arm/back, week_4, Doubtful: arm/back, week_5, Probable: arm/back, week_6, Questionable: arm/back, week_7, Probable: back/forearm, week_8, Probable: back/forearm/hamstring, week_9, Probable: back/forearm/hamstring, week_11, Probable: back/forearm/hamstring, week_12, Probable: hamstring, week_13, Questionable: ankle, week_14, Injured Reserve: torn right ACL/MCL, week_15, Injured Reserve: torn right ACL/MCL, week_16, Injured Reserve: torn right ACL/MCL, week_17, Injured Reserve: torn right ACL/MCL, week_19, Injured Reserve: torn right ACL/MCL, week_20']
[['Duron Harmon'], 'Questionable: hamstring, week_1, Questionable: hamstring, week_2']
[['Mark Harrison'], 'Out: foot, week_1, Out: foot, week_2, Out: foot, week_3, Out: foot, week_4, Out: foot, week_5, Out: foot, week_6, Out: foot, week_7, Out: foot, week_8, Out: foot, week_9, Out: foot, week_11, Out: foot, week_12, Out: foot, week_13, Out: foot, week_14, Out: foot, week_15, Out: foot, week_16, Out: foot, week_17, Out: foot, week_19, Out: foot, week_20']
[["Dont'a Hightower"], 'Questionable: knee, week_5, Probable: knee, week_6']
[['Michael Hoomanawanui'], 'Questionable: knee, week_7, Questionable: knee, week_8, Questionable: knee, week_9, Questionable: knee, week_12, Questionable: knee, week_13, Probable: knee, week_14, Questionable: knee, week_15, Questionable: knee, week_16, Probable: knee, week_17']
[['Tommy Kelly'], 'Questionable: knee, week_6, Questionable: knee, week_7, Questionable: knee, week_8, Questionable: knee, week_9, Injured Reserve: knee, week_11, Injured Reserve: knee, week_12, Injured Reserve: knee, week_13, Injured Reserve: knee, week_14, Injured Reserve: knee, week_15, Injured Reserve: knee, week_16, Injured Reserve: knee, week_17, Injured Reserve: knee, week_19, Injured Reserve: knee, week_20']
[['Jerod Mayo'], 'Questionable: ankle, week_4, Questionable: ankle, week_5, Probable: ankle, week_6, Injured Reserve: shoulder, week_7, Injured Reserve: shoulder, week_8, Injured Reserve: shoulder, week_9, Injured Reserve: shoulder, week_11, Injured Reserve: shoulder, week_12, Injured Reserve: shoulder, week_13, Injured Reserve: shoulder, week_14, Injured Reserve: shoulder, week_15, Injured Reserve: shoulder, week_16, Injured Reserve: shoulder, week_17, Injured Reserve: shoulder, week_19, Injured Reserve: shoulder, week_20']
[['Devin McCourty'], 'Questionable: shoulder, week_7, Probable: shoulder, week_8, Questionable: head, week_17']
[['T.J. Moe'], 'Injured Reserve: Achilles, week_1, Injured Reserve: Achilles, week_2, Injured Reserve: Achilles, week_3, Injured Reserve: Achilles, week_4, Injured Reserve: Achilles, week_5, Injured Reserve: Achilles, week_6, Injured Reserve: Achilles, week_7, Injured Reserve: Achilles, week_8, Injured Reserve: Achilles, week_9, Injured Reserve: Achilles, week_11, Injured Reserve: Achilles, week_12, Injured Reserve: Achilles, week_13, Injured Reserve: Achilles, week_14, Injured Reserve: Achilles, week_15, Injured Reserve: Achilles, week_16, Injured Reserve: Achilles, week_17, Injured Reserve: Achilles, week_19, Injured Reserve: Achilles, week_20']
[['Rob Ninkovich'], 'Probable: groin, week_6, Probable: groin, week_7, Probable: groin, week_8, Questionable: foot, week_11, Questionable: ankle, week_17']
[['Stevan Ridley'], 'Probable: shoulder, week_2, Questionable: knee, week_5, Questionable: knee, week_6']
[['Matt Slater'], 'Questionable: knee, week_2, Out: wrist, week_3, Out: wrist, week_4, Out: wrist, week_5, Out: wrist, week_6, Questionable: wrist, week_8, Probable: wrist, week_9, Probable: wrist, week_11, Probable: wrist, week_12, Probable: wrist, week_13, Probable: right shoulder, week_14, Probable: wrist, week_15']
[['Nate Solder'], 'Probable: back, week_7, Questionable: concussion, week_15, Questionable: concussion, week_16, Questionable: concussion, week_17']
[['Brandon Spikes'], 'Questionable: knee, week_12, Probable: knee, week_13, Questionable: knee, week_14, Questionable: knee, week_15, Questionable: knee, week_16, Questionable: knee, week_17']
[['Zach Sudfeld'], 'Questionable: hamstring, week_2, Probable: hamstring, week_3, Probable: hamstring, week_4, Questionable: hamstring, week_5']
[['Will Svitek'], 'Questionable: knee, week_1, Questionable: knee, week_2, Questionable: knee, week_3, Questionable: knee, week_4, Questionable: knee, week_5, Questionable: ankle, week_14, Questionable: ankle, week_15, Questionable: ankle, week_16, Questionable: ankle, week_17']
[['Aqib Talib'], 'Questionable: hip, week_6, Questionable: hip, week_7, Questionable: hip, week_8, Questionable: hip, week_9, Questionable: hip, week_11, Questionable: hip, week_12, Questionable: hip, week_13, Questionable: hip, week_14, Questionable: hip, week_15, Questionable: hip, week_16, Probable: hip, week_17']
[['Kenbrell Thompkins'], 'Questionable: shoulder, week_5, Questionable: hip, week_14, Questionable: hip, week_15, Questionable: hip, week_16, Questionable: hip, week_17']
[['Shane Vereen'], 'Out: wrist, week_2, Injured Reserve: wrist, week_3, Injured Reserve: wrist, week_4, Injured Reserve: wrist, week_5, Injured Reserve: wrist, week_6, Injured Reserve: wrist, week_7, Injured Reserve: wrist, week_8, Injured Reserve: wrist, week_9, Injured Reserve: wrist, week_11, Probable: wrist, week_12, Probable: wrist, week_13, Probable: wrist, week_14, Probable: wrist, week_15, Questionable: groin, week_16, Probable: groin, week_17']
[['Sebastian Vollmer'], 'Questionable: foot, week_4, Questionable: foot, week_5, Injured Reserve: leg, week_9, Injured Reserve: leg, week_11, Injured Reserve: leg, week_12, Injured Reserve: leg, week_13, Injured Reserve: leg, week_14, Injured Reserve: leg, week_15, Injured Reserve: leg, week_16, Injured Reserve: leg, week_17, Injured Reserve: leg, week_19, Injured Reserve: leg, week_20']
[['Leon Washington'], 'Questionable: thigh, week_2, Questionable: thigh, week_3, Questionable: thigh, week_4, Questionable: thigh, week_5, Questionable: ankle, week_6, Questionable: ankle, week_7, Questionable: ankle, week_8, Questionable: ankle, week_9, Questionable: ankle, week_11, Questionable: ankle, week_12']
[['Ryan Wendell'], 'Questionable: concussion, week_6']
[['Chris White'], 'Questionable: back, week_13']
[['Vince Wilfork'], 'Probable: foot, week_4, Out: Achilles, week_5, Injured Reserve: Achilles, week_6, Injured Reserve: Achilles, week_7, Injured Reserve: Achilles, week_8, Injured Reserve: Achilles, week_9, Injured Reserve: Achilles, week_11, Injured Reserve: Achilles, week_12, Injured Reserve: Achilles, week_13, Injured Reserve: Achilles, week_14, Injured Reserve: Achilles, week_15, Injured Reserve: Achilles, week_16, Injured Reserve: Achilles, week_17, Injured Reserve: Achilles, week_19, Injured Reserve: Achilles, week_20']
[['Adrian Wilson'], 'Injured Reserve: hamstring, week_1, Injured Reserve: hamstring, week_2, Injured Reserve: hamstring, week_3, Injured Reserve: hamstring, week_4, Injured Reserve: hamstring, week_5, Injured Reserve: hamstring, week_6, Injured Reserve: hamstring, week_7, Injured Reserve: hamstring, week_8, Injured Reserve: hamstring, week_9, Injured Reserve: hamstring, week_11, Injured Reserve: hamstring, week_12, Injured Reserve: hamstring, week_13, Injured Reserve: hamstring, week_14, Injured Reserve: hamstring, week_15, Injured Reserve: hamstring, week_16, Injured Reserve: hamstring, week_17, Injured Reserve: hamstring, week_19, Injured Reserve: hamstring, week_20']
[['Tavon Wilson'], 'Questionable: hamstring, week_5, Questionable: hamstring, week_6, Questionable: hamstring, week_7, Questionable: hamstring, week_8, Questionable: hamstring, week_9']
[['Markus Zusevics'], 'Injured Reserve: undisclosed, week_1, Injured Reserve: undisclosed, week_2, Injured Reserve: undisclosed, week_3, Injured Reserve: undisclosed, week_4, Injured Reserve: undisclosed, week_5, Injured Reserve: undisclosed, week_6, Injured Reserve: undisclosed, week_7, Injured Reserve: undisclosed, week_8, Injured Reserve: undisclosed, week_9, Injured Reserve: undisclosed, week_11, Injured Reserve: undisclosed, week_12, Injured Reserve: undisclosed, week_13, Injured Reserve: undisclosed, week_14, Injured Reserve: undisclosed, week_15, Injured Reserve: undisclosed, week_16, Injured Reserve: undisclosed, week_17, Injured Reserve: undisclosed, week_19, Injured Reserve: undisclosed, week_20']
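
Alternatively, to print the results week by week, read the week columns from the table header and loop over the weeks first:

from urllib.request import urlopen as uReq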
from bs4 import BeautifulSoup as soup
my_url = 'https://www.pro-football-reference.com/teams/nwe/2013_injuries.htm'
# opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
# html parsing
page_soup = soup(page_html, "html.parser")
containers = page_soup.find("tbody")
head = page_soup.find("thead")

player = containers.find_all("tr")
weeks = head.find_all('th')
week_list = [i['data-stat'] for i in weeks][1:]
for week in week_list:
    for tr in player:
        th = tr.find_all("th")
        name = [i.text for i in th]
        td = tr.find('td', {'data-stat': week})
        if td is None:
            # skip any row that has no cell for this week
            continue
        # cells without a data-tip attribute have no listed injury
        injury = td.get("data-tip", "NA")
        print([name, injury, week])
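
The week-first lookup can be tried without hitting the site. The sketch below runs the same idea against a small inline table; the HTML is a hypothetical stand-in that only mimics the data-stat and data-tip attributes used above, and the real page may be structured differently:

```python
from bs4 import BeautifulSoup

# Hypothetical two-week table that mimics the data-stat / data-tip layout.
html = """
<table>
  <thead><tr><th data-stat="player">Player</th>
             <th data-stat="week_1">1</th>
             <th data-stat="week_2">2</th></tr></thead>
  <tbody>
    <tr><th data-stat="player">Danny Amendola</th>
        <td data-stat="week_1" data-tip="Questionable: hamstring">Q</td>
        <td data-stat="week_2" data-tip="Questionable: groin">Q</td></tr>
    <tr><th data-stat="player">Kyle Arrington</th>
        <td data-stat="week_1"></td>
        <td data-stat="week_2"></td></tr>
  </tbody>
</table>
"""

page = BeautifulSoup(html, "html.parser")
# Skip the first header cell, which labels the player column.
weeks = [th["data-stat"] for th in page.find("thead").find_all("th")][1:]
rows = page.find("tbody").find_all("tr")

results = []
for week in weeks:
    for tr in rows:
        td = tr.find("td", {"data-stat": week})
        if td is None:  # row has no cell for this week
            continue
        name = tr.find("th").get_text(strip=True)
        results.append([name, td.get("data-tip", "NA"), week])

for row in results:
    print(row)
```

Because the outer loop runs over weeks, every week 1 row is produced before any week 2 row, which is the ordering the question asks for.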
