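I mostly program on my own, so nobody reviews my code, and I suspect I have picked up a lot of bad habits.
The code I'm pasting here works, but I'd like to hear some other solutions.
I create a dictionary called teams_shots. I iterate over a pandas DataFrame in which each row holds the names of the away team and the home team. I want to keep track of the shots for every team that appears in the DataFrame. That is why I check whether home_team_name or away_team_name has no entry in the dictionary yet, and if so, I create one.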
    import numpy as np

    teams_shots = {}
    home_shots_avg = []
    away_shots_avg = []

    for index, match in df.iterrows():
        if match['home_team_name'] not in teams_shots:
            # we have to set up an entry in the dictionary
            teams_shots[match['home_team_name']] = []
            teams_shots[match['home_team_name']].append(match['home_team_shots'])
            home_shots_avg.append(None)
        else:
            home_shots_avg.append(np.mean(teams_shots[match['home_team_name']]))
            teams_shots[match['home_team_name']].append(match['home_team_shots'])
        if match['away_team_name'] not in teams_shots:
            teams_shots[match['away_team_name']] = []
            teams_shots[match['away_team_name']].append(match['away_team_shots'])
            away_shots_avg.append(None)
        else:
            away_shots_avg.append(np.mean(teams_shots[match['away_team_name']]))
            teams_shots[match['away_team_name']].append(match['away_team_shots'])
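As you can see, almost the same code is written twice, which is not a sign of good programming. I thought about using the or operator in the if statement, but one of the two entries might already exist, and I would overwrite it.
In this case I think an extra for loop should do the job: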
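    for index, match in df.iterrows():
        # Each tuple names the team column, the shots column, and the list to fill
        for name_col, shots_col, averages in (
                ('home_team_name', 'home_team_shots', home_shots_avg),
                ('away_team_name', 'away_team_shots', away_shots_avg)):
            team = match[name_col]
            if team not in teams_shots:
                # we have to set up an entry in the dictionary
                teams_shots[team] = []
                averages.append(None)
            else:
                averages.append(np.mean(teams_shots[team]))
            # record this match's shots only after the average was taken
            teams_shots[team].append(match[shots_col])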
But there may be a way to handle this in a vectorized fashion.
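One possible vectorized route is to reshape the data so that every team appearance is its own row, then take an expanding mean per team, shifted by one match so that each average covers only earlier matches. The following is a minimal sketch, not a definitive implementation: the toy values are made up, the names `side`, `long`, and `prev_avg` are hypothetical, and it assumes the rows of `df` are already in chronological order. It yields NaN rather than None for a team's first match.

```python
import pandas as pd

# Toy data: column names follow the question, values are invented.
df = pd.DataFrame({
    'home_team_name': ['A', 'B', 'A', 'C'],
    'away_team_name': ['B', 'A', 'C', 'A'],
    'home_team_shots': [10, 8, 12, 6],
    'away_team_shots': [7, 9, 5, 11],
})

def side(prefix):
    """Return one row per match for the given side ('home' or 'away')."""
    return pd.DataFrame({
        'match': df.index,
        'team': df[f'{prefix}_team_name'].to_numpy(),
        'shots': df[f'{prefix}_team_shots'].to_numpy(),
        'side': prefix,
    })

# Long format: one row per team per match, kept in match order.
long = pd.concat([side('home'), side('away')], ignore_index=True)
long = long.sort_values('match', kind='stable').reset_index(drop=True)

# Mean of all earlier shots for the same team; NaN on the first appearance.
long['prev_avg'] = (
    long.groupby('team')['shots']
        .transform(lambda s: s.expanding().mean().shift())
)

# Bring the results back to one row per match.
for prefix in ('home', 'away'):
    rows = long[long['side'] == prefix].set_index('match')
    df[f'{prefix}_shots_avg'] = rows['prev_avg']

print(df)
```

The `shift()` is what keeps the current match out of its own average, mirroring how the loop takes the mean before appending the new value.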
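I would use get as a quick lookup. It does not raise a KeyError for a missing key, and its default return value of None is falsy, so it behaves like False in a truth test.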
    for index, match in df.iterrows():
        home, away = match['home_team_name'], match['away_team_name']
        home_shots, away_shots = match['home_team_shots'], match['away_team_shots']
        if not teams_shots.get(home):
            # No need to separately allocate the list
            teams_shots[home] = [home_shots]
            home_shots_avg.append(None)
        else:
            home_shots_avg.append(np.mean(teams_shots[home]))
            teams_shots[home].append(home_shots)
        if not teams_shots.get(away):
            teams_shots[away] = [away_shots]
            away_shots_avg.append(None)
        else:
            away_shots_avg.append(np.mean(teams_shots[away]))
            teams_shots[away].append(away_shots)
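The get-based lookup can also be folded into a small helper so that the home and away branches share a single code path, which addresses the duplication the question complains about. This is only a sketch under assumptions: the helper name `average_before` is hypothetical, and the toy DataFrame values are invented for illustration.

```python
import numpy as np
import pandas as pd

def average_before(teams_shots, team, shots):
    """Return the team's mean shots so far (None if unseen), then record `shots`."""
    previous = teams_shots.get(team)  # None when the team has no entry yet
    average = np.mean(previous) if previous else None
    # setdefault creates the list on first sight and returns it either way
    teams_shots.setdefault(team, []).append(shots)
    return average

# Toy data: column names follow the question, values are invented.
df = pd.DataFrame({
    'home_team_name': ['A', 'B', 'A', 'C'],
    'away_team_name': ['B', 'A', 'C', 'A'],
    'home_team_shots': [10, 8, 12, 6],
    'away_team_shots': [7, 9, 5, 11],
})

teams_shots = {}
home_shots_avg, away_shots_avg = [], []
for _, match in df.iterrows():
    home_shots_avg.append(
        average_before(teams_shots, match['home_team_name'], match['home_team_shots']))
    away_shots_avg.append(
        average_before(teams_shots, match['away_team_name'], match['away_team_shots']))

print(home_shots_avg)
print(away_shots_avg)
```

Because the helper computes the average before appending, each value reflects only the matches a team played earlier, the same as in the loops above.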