I need to merge these two dataframes:
df_melt:
     MatchID  GameWeek        Date                      Team  Home               AgainstTeam
0      46605         1  2019-08-09                 Liverpool  Home              Norwich City
1      46605         1  2019-08-09              Norwich City  Away                 Liverpool
2      46606         1  2019-08-10           AFC Bournemouth  Home          Sheffield United
3      46606         1  2019-08-10          Sheffield United  Away           AFC Bournemouth
4      46607         1  2019-08-10                   Burnley  Home               Southampton
..       ...      ...         ...                       ...   ...                       ...
533    46871        27  2020-02-23                   Watford  Away         Manchester United
534    46872        27  2020-02-22          Sheffield United  Home  Brighton and Hove Albion
535    46872        27  2020-02-22  Brighton and Hove Albion  Away          Sheffield United
536    46873        27  2020-02-22               Southampton  Home               Aston Villa
537    46873        27  2020-02-22               Aston Villa  Away               Southampton
df_pm:
                                       Player  GameWeek  Minutes  ...  CloseShotCreated  TotalShotCreated  HeadersCreated
PlayerMatchesDetailID                                             ...
1                                     Alisson         1       90  ...                 0                 0               0
2                             Virgil van Dijk         1       90  ...                 0                 0               0
3                                Joseph Gomez         1       90  ...                 0                 1               0
4                            Andrew Robertson         1       90  ...                 0                 1               0
5                      Trent Alexander-Arnold         1       90  ...                 3                 3               1
...                                       ...       ...      ...  ...               ...               ...             ...
15053                             Matty James        22        0  ...                 0                 0               0
15054                             Matty James        23        0  ...                 0                 0               0
15055                             Matty James        24        0  ...                 0                 0               0
15056                             Matty James        25        0  ...                 0                 0               0
15057                             Matty James        26        0  ...                 0                 0               0
This is how I am trying to perform the merge:
# Instantiate empty lists
match_ids = []
home_away = []
dates = []
# For each row in the player matches dataframe...
for row in df_pm.itertuples():
    # Look up the match id from the team matches dataframe
    team = row.ForTeam
    againstteam = row.AgainstTeam
    gameweek = row.GameWeek
    match_id = df_melt.loc[(df_melt['GameWeek']==gameweek)
                           &(df_melt['Team']==team)
                           &(df_melt['AgainstTeam']==againstteam),
                           'MatchID'].item()
    print('MATCH', match_id)
    date = df_melt.loc[(df_melt['GameWeek']==gameweek)
                       &(df_melt['Team']==team)
                       &(df_melt['AgainstTeam']==againstteam),
                       'Date'].item()
    home = df_melt.loc[(df_melt['GameWeek']==gameweek)
                       &(df_melt['Team']==team)
                       &(df_melt['AgainstTeam']==againstteam),
                       'Home'].item()
    # Add them to the lists
    match_ids.append(match_id)
    home_away.append(home)
    dates.append(date)
But I get:
Traceback (most recent call last):
  File "tableau_data_generation.py", line 161, in <module>
    'MatchID'].item()
  File "/Users/me/anaconda2/envs/data_science/lib/python3.7/site-packages/pandas/core/base.py", line 652, in item
    return self.values.item()
ValueError: can only convert an array of size 1 to a Python scalar
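This suggests that the lookup may be finding no matching row. But after printing the whole dataframe, I could not find any faulty data.
However, when I check the dtypes, I see: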
df_melt:
MatchID object
GameWeek object
Date object
Team object
Home object
AgainstTeam object
df_pm:
Player object
GameWeek int64
Minutes int64
ForTeam object
AgainstTeam object
Goals int64
ShotsOnTarget int64
ShotsInBox int64
CloseShots int64
TotalShots int64
Headers int64
GoalAssists int64
ShotOnTargetCreated int64
ShotInBoxCreated int64
CloseShotCreated int64
TotalShotCreated int64
HeadersCreated int64
I suppose this mismatch must be the culprit...
What is the best way to fix this and convert the mismatched types?
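One way to check that suspicion is sketched below. It uses a tiny stand-in for df_melt (not the real data) and assumes the object columns hold strings, which the dtype listing alone does not prove. Under that assumption, comparing GameWeek with an integer selects zero rows, and .item() on an empty selection raises the same ValueError as in the traceback.

```python
import pandas as pd

# Hypothetical two-row stand-in for df_melt; every column is dtype object
# and GameWeek is assumed to hold strings rather than integers.
df_melt = pd.DataFrame({
    "MatchID": ["46605", "46605"],
    "GameWeek": ["1", "1"],
    "Team": ["Liverpool", "Norwich City"],
    "AgainstTeam": ["Norwich City", "Liverpool"],
}, dtype=object)

# Inspect which Python types actually sit inside the object column.
kinds = set(df_melt["GameWeek"].map(type))

# Comparing the string column with an int matches nothing...
mask_int = (df_melt["GameWeek"] == 1) & (df_melt["Team"] == "Liverpool")
# ...while comparing with a string finds the row.
mask_str = (df_melt["GameWeek"] == "1") & (df_melt["Team"] == "Liverpool")
n_int = int(mask_int.sum())
n_str = int(mask_str.sum())

# .item() only works on a selection of exactly one element.
try:
    df_melt.loc[mask_int, "MatchID"].item()
    item_error = None
except ValueError as exc:
    item_error = exc
```

If the same check on the real df_melt shows only int values in GameWeek, the dtype mismatch is not the cause and the empty (or duplicated) match comes from somewhere else, for example differing team names.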
Note for proposed solutions:
Later on, I need to perform the following assignments:
def pos_lookup(x):
    return df_player_basic.loc[df_player_basic['CommentName']==x,
                               'Position'].item()
# Declare the lists as columns in the player matches df
df_pm['MatchID']=match_ids
df_pm['Date']=pd.to_datetime(dates)
df_pm['Home']=home_away
df_pm['Position']=df_pm['Player'].map(pos_lookup)
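The same four columns could probably be produced without the row-by-row loop. The following is only a sketch on made-up miniature frames. It assumes that the key columns (GameWeek, team names) already share a dtype on both sides, that each (GameWeek, Team, AgainstTeam) combination occurs once in df_melt, and that CommentName is unique in df_player_basic. The Position values are invented for illustration.

```python
import pandas as pd

# Miniature stand-ins for the real frames (illustrative data only).
df_melt = pd.DataFrame({
    "MatchID": [46605, 46605],
    "GameWeek": [1, 1],
    "Date": ["2019-08-09", "2019-08-09"],
    "Team": ["Liverpool", "Norwich City"],
    "Home": ["Home", "Away"],
    "AgainstTeam": ["Norwich City", "Liverpool"],
})
df_pm = pd.DataFrame({
    "Player": ["Alisson", "Virgil van Dijk"],
    "GameWeek": [1, 1],
    "ForTeam": ["Liverpool", "Liverpool"],
    "AgainstTeam": ["Norwich City", "Norwich City"],
})
df_player_basic = pd.DataFrame({
    "CommentName": ["Alisson", "Virgil van Dijk"],
    "Position": ["GK", "DEF"],  # invented values
})

# Left merge keeps every player row; validate raises if a key combination
# appears more than once in df_melt (the case where .item() would fail).
merged = df_pm.merge(
    df_melt.rename(columns={"Team": "ForTeam"}),
    on=["GameWeek", "ForTeam", "AgainstTeam"],
    how="left",
    validate="many_to_one",
)
merged["Date"] = pd.to_datetime(merged["Date"])

# Position lookup through a Series indexed by CommentName.
positions = df_player_basic.set_index("CommentName")["Position"]
merged["Position"] = merged["Player"].map(positions)
```

Unmatched rows end up with NaN in MatchID, Date and Home instead of raising, so they are easy to inspect afterwards. Note that merge returns a fresh default index; if PlayerMatchesDetailID has to survive, call reset_index() on df_pm first and restore it afterwards.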
To convert a single dataframe column:
Df[column_name] = Df[column_name].astype(datatype)
where "datatype" is int, str, float, etc.
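Applied to the frames in the question, that might look like the sketch below. The frame is a small stand-in, and it is an assumption that MatchID and GameWeek hold clean numeric strings and that Date holds ISO-formatted date strings.

```python
import pandas as pd

# Small stand-in for df_melt with every column stored as dtype object.
df_melt = pd.DataFrame({
    "MatchID": ["46605", "46606"],
    "GameWeek": ["1", "1"],
    "Date": ["2019-08-09", "2019-08-10"],
}, dtype=object)

# Convert several columns at once by passing a column -> dtype mapping.
df_melt = df_melt.astype({"MatchID": "int64", "GameWeek": "int64"})

# Dates are usually better handled by to_datetime than by astype.
df_melt["Date"] = pd.to_datetime(df_melt["Date"])

# GameWeek now compares equal to the int64 GameWeek values from df_pm.
matches = int((df_melt["GameWeek"] == 1).sum())
```

astype raises if a value cannot be converted. If the column may contain stray non-numeric entries, pd.to_numeric(df_melt["GameWeek"], errors="coerce") turns those into NaN so they can be inspected instead.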