Adding a column to a sportsreference scrape to identify which team each row belongs to



[Marko prcaic] provided the following answer (Sportsipy API request):

from sportsreference.nba.schedule import Schedule
import pandas as pd

# MIL removed from league list as it is used to initiate league_schedule
league = ['CHO', 'LAL', 'LAC', 'SAC', 'ATL', 'MIA', 'DAL', 'POR',
          'HOU', 'NOP', 'PHO', 'WAS', 'MEM', 'BOS', 'DEN', 'TOR', 'SAS',
          'PHI', 'BRK', 'UTA', 'IND', 'OKC', 'ORL', 'MIN', 'DET',
          'NYK', 'CLE', 'CHI', 'GSW']
league_schedule = Schedule('MIL', year="2020").dataframe
for team in league:
    # DataFrame.append was removed in pandas 2.0, so pd.concat is used here
    league_schedule = pd.concat([league_schedule, Schedule(team, year="2020").dataframe])

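This works perfectly for my needs, with one exception: the returned DataFrame contains all the relevant information apart from the team itself. We get things like game_time, but not the team the schedule belongs to. The result is: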

              boxscore_index               date    datetime  ...  streak   time  wins
201910240HOU    201910240HOU  Thu, Oct 24, 2019  2019-10-24  ...     W 1  8:00p     1
201910260MIL    201910260MIL  Sat, Oct 26, 2019  2019-10-26  ...     L 1  5:00p     1
201910280MIL    201910280MIL  Mon, Oct 28, 2019  2019-10-28  ...     W 1  8:00p     2
201910300BOS    201910300BOS  Wed, Oct 30, 2019  2019-10-30  ...     L 1  7:30p     2
201911010ORL    201911010ORL   Fri, Nov 1, 2019  2019-11-01  ...     W 1  7:00p     3

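What I want, however, is an additional column that holds MIL for every row belonging to MIL, CHO for every row belonging to CHO, and so on. The resulting DataFrame would look like this: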

              boxscore_index  team               date    datetime  ...  streak   time  wins
201910240HOU    201910240HOU   MIL  Thu, Oct 24, 2019  2019-10-24  ...     W 1  8:00p     1
201910260MIL    201910260MIL   MIL  Sat, Oct 26, 2019  2019-10-26  ...     L 1  5:00p     1
201910280MIL    201910280MIL   MIL  Mon, Oct 28, 2019  2019-10-28  ...     W 1  8:00p     2
201910300BOS    201910300BOS   MIL  Wed, Oct 30, 2019  2019-10-30  ...     L 1  7:30p     2
201911010ORL    201911010ORL   MIL   Fri, Nov 1, 2019  2019-11-01  ...     W 1  7:00p     3

With a bit more work and a little help, this can be done:

from sportsreference.nba.schedule import Schedule
import pandas as pd

league = ['MIL', 'CHO', 'LAL', 'LAC', 'SAC', 'ATL', 'MIA', 'DAL',
          'POR', 'HOU', 'NOP', 'PHO', 'WAS', 'MEM', 'BOS', 'DEN',
          'TOR', 'SAS', 'PHI', 'BRK', 'UTA', 'IND', 'OKC', 'ORL',
          'MIN', 'DET', 'NYK', 'CLE', 'CHI', 'GSW']

frames = []
for team in league:
    # Fetch this team's schedule and tag every row with the team abbreviation
    df_ = Schedule(team, year='2020').dataframe
    df_['team'] = team
    frames.append(df_)

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
df = pd.concat(frames)

This produces:

              boxscore_index               date    datetime  ...   time  wins  team
201910240HOU    201910240HOU  Thu, Oct 24, 2019  2019-10-24  ...  8:00p     1   MIL
201910260MIL    201910260MIL  Sat, Oct 26, 2019  2019-10-26  ...  5:00p     1   MIL
201910280MIL    201910280MIL  Mon, Oct 28, 2019  2019-10-28  ...  8:00p     2   MIL
201910300BOS    201910300BOS  Wed, Oct 30, 2019  2019-10-30  ...  7:30p     2   MIL
201911010ORL    201911010ORL   Fri, Nov 1, 2019  2019-11-01  ...  7:00p     3   MIL
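
The tagging step can also be tried without network access. The following is only a minimal sketch: the helper name `combine_with_team` is hypothetical and not part of sportsreference, and the two small frames contain made-up rows standing in for what `Schedule(team, year='2020').dataframe` returns.

```python
import pandas as pd


def combine_with_team(frames_by_team):
    """Concatenate per-team frames, adding a 'team' column to each.

    `frames_by_team` maps a team abbreviation to that team's schedule frame.
    Hypothetical helper for illustration; not part of sportsreference.
    """
    tagged = [
        frame.assign(team=team)  # assign returns a copy; the input frame is not modified
        for team, frame in frames_by_team.items()
    ]
    return pd.concat(tagged)


# Made-up stand-ins for the real schedule frames (assumption: same index style)
mil = pd.DataFrame({"time": ["8:00p", "5:00p"], "wins": [1, 1]},
                   index=["201910240HOU", "201910260MIL"])
cho = pd.DataFrame({"time": ["7:00p"], "wins": [1]},
                   index=["201910230CHO"])

combined = combine_with_team({"MIL": mil, "CHO": cho})
print(combined)
```

With the real library, the dictionary could presumably be built as `{team: Schedule(team, year='2020').dataframe for team in league}`. Using `assign` rather than setting `df_['team']` in place leaves each team's original frame unchanged, which may or may not matter for your use.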
