Is there a way to filter or remove data with BeautifulSoup?



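I'm trying to build a website that shows games and their schedule information. At first I was able to import all the relevant data into my program, but that changed once the games started. The site removes the "Time" column from its display for those games, so the data I import ends up with an uneven number of columns: one fewer than before, because the "Time" column is gone. This is a problem because when I try to build a DataFrame from the collected information, it fails since the rows no longer have the same number of entries. I want to import only the games that have not been played yet.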

import requests
from bs4 import BeautifulSoup

link = "https://www.espn.com/nfl/schedule/_/week/1/year/2022/seasontype/3"
page = requests.get(link)
soup = BeautifulSoup(page.content, "html.parser")
nfl_resp = soup.find_all('div', class_='ResponsiveTable')

nfl_list = []
nfl_time_list = []
nfl_location_list = []
visit_list = []

for i in nfl_resp:
    location = i.find_all(class_='location__col Table__TD')
    for team in location:
        nfl_location_list.append(team.text)
# I get all the correct stadiums

for i in nfl_resp:
    times = i.find_all(class_='date__col Table__TD')
    for hour in times:
        nfl_time_list.append(hour.text)
# I get all the correct times

# NOTE: as posted, this loop repeats the stadium selector from the first loop;
# the selector actually used for the dates is not shown in the post.
for i in nfl_resp:
    location = i.find_all(class_='location__col Table__TD')
    for team in location:
        nfl_location_list.append(team.text)
# I get all dates correctly

for i in nfl_resp:
    visit = i.find_all(class_="events__col Table__TD")
    for team in visit:
        visit_list.append(team.text)
# Here's the problem: I get all the games regardless of whether they have started or not.
# It only works if the games are yet to start; I need to run it while the games are in progress or over too.
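
The uneven-row symptom described above can be reproduced without touching the network. The following is only a minimal sketch under stated assumptions: the rows are made up, and `EXPECTED_COLUMNS` and `split_rows` are hypothetical names that do not come from the original code. It separates rows that have the expected number of fields from rows that have lost one.

```python
# Hypothetical column layout, used only for this illustration.
EXPECTED_COLUMNS = ["Team", "Time", "Stadium"]


def split_rows(rows, width=len(EXPECTED_COLUMNS)):
    """Separate rows with the expected number of fields from the rest."""
    complete = [row for row in rows if len(row) == width]
    incomplete = [row for row in rows if len(row) != width]
    return complete, incomplete


# Made-up rows: the second one has lost its "Time" field.
rows = [
    ["Seattle", "4:30 PM", "Levi's Stadium, Santa Clara, CA"],
    ["Miami", "Highmark Stadium, Orchard Park, NY"],
]

complete, incomplete = split_rows(rows)
print(complete)    # rows that can safely go into a fixed-width table
print(incomplete)  # rows that would break it
```

Filtering on row length is a blunt check: it cannot tell which field is missing, only that the row no longer fits the table.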

You can use the following example to parse the various pieces of information from the ESPN site:

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://www.espn.com/nfl/schedule/_/week/1/year/2022/seasontype/3"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

all_data = []
# Only table rows that contain at least one link are processed.
for row in soup.select(".Table__TR:has(.AnchorLink)"):
    # Text of every link that does not wrap an image.
    data = [t.text for t in row.select(".AnchorLink:not(:has(img))")]
    # A network is either an image (use its alt text) or a plain-text name.
    networks = [
        n["alt"] if n.name == "img" else n.text
        for n in row.select(".network-container img, .network-container .network-name")
    ]
    # The nearest preceding table title holds the date.
    date = row.find_previous(class_="Table__Title").text.strip()
    all_data.append([*data, networks, date])

df = pd.DataFrame(
    all_data,
    columns=["Team 1", "Team 2", "Time", "Tickets", "Stadium", "Networks", "Date"],
)
print(df)

Prints:

        Team 1         Team 2     Time                 Tickets                             Stadium            Networks                        Date
0      Seattle  San Francisco  4:30 PM  Tickets as low as $138     Levi's Stadium, Santa Clara, CA               [FOX]  Saturday, January 14, 2023
1  Los Angeles   Jacksonville  8:15 PM  Tickets as low as $138   TIAA Bank Field, Jacksonville, FL               [NBC]  Saturday, January 14, 2023
2        Miami        Buffalo  1:00 PM  Tickets as low as $114  Highmark Stadium, Orchard Park, NY               [CBS]    Sunday, January 15, 2023
3     New York      Minnesota  4:30 PM  Tickets as low as $116  U.S. Bank Stadium, Minneapolis, MN               [FOX]    Sunday, January 15, 2023
4    Baltimore     Cincinnati  8:15 PM  Tickets as low as $171      Paycor Stadium, Cincinnati, OH               [NBC]    Sunday, January 15, 2023
5       Dallas      Tampa Bay  8:15 PM  Tickets as low as $163    Raymond James Stadium, Tampa, FL  [ESPN, ABC, ESPN+]    Monday, January 16, 2023
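
To keep only the games that have not been played yet, one option is to skip rows that no longer contain a time cell. The following is a minimal sketch, not ESPN's real markup: `SAMPLE_HTML` and `upcoming_games` are hypothetical names, and the sample only imitates the `events__col`, `date__col` and `location__col` classes from the question. It assumes that a started or finished game simply lacks the `date__col` cell, which should be checked against the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only imitates the class names from the question;
# the real ESPN page may be structured differently.
SAMPLE_HTML = """
<table>
  <tr class="Table__TR">
    <td class="events__col Table__TD">Seattle</td>
    <td class="date__col Table__TD">4:30 PM</td>
    <td class="location__col Table__TD">Levi's Stadium, Santa Clara, CA</td>
  </tr>
  <tr class="Table__TR">
    <td class="events__col Table__TD">Miami</td>
    <td class="location__col Table__TD">Highmark Stadium, Orchard Park, NY</td>
  </tr>
</table>
"""


def upcoming_games(html):
    """Return [team, time, stadium] for rows that still have a time cell."""
    soup = BeautifulSoup(html, "html.parser")
    games = []
    for row in soup.select("tr.Table__TR"):
        time_cell = row.select_one(".date__col")
        if time_cell is None:
            # Assumption: no time cell means the game has started or finished.
            continue
        team = row.select_one(".events__col").get_text(strip=True)
        stadium = row.select_one(".location__col").get_text(strip=True)
        games.append([team, time_cell.get_text(strip=True), stadium])
    return games


print(upcoming_games(SAMPLE_HTML))
```

Because the rows are filtered before any list is built, every remaining row has the same number of fields and can be passed to `pd.DataFrame` with a fixed `columns` list.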
