I have a DataFrame:
Unnamed: 0 game score home_odds draw_odds away_odds country league datetime
0 0 Sport Recife - Imperatriz 2:2 1.36 4.31 7.66 Brazil Copa do Nordeste 2020 2020-02-07 00:00:00
1 1 ABC - America RN 2:1 2.62 3.30 2.48 Brazil Copa do Nordeste 2020 2020-02-02 22:00:00
2 2 Frei Paulistano - Nautico 0:2 5.19 3.58 1.62 Brazil Copa do Nordeste 2020 2020-02-02 00:00:00
3 3 Botafogo PB - Confianca 1:1 2.06 3.16 3.5 Brazil Copa do Nordeste 2020 2020-02-02 22:00:00
4 4 Fortaleza - Ceara 1:1 2.19 2.98 3.38 Brazil Copa do Nordeste 2020 2020-02-02 22:00:00
I am running the following code:
df['game'] = df['game'].astype(str).str.replace(r'\((\w+)\)', '', regex=True)
df['league'] = df['league'].astype(str).str.replace(r'(\s\d+\S\d+)$', '', regex=True)
df['game'] = df['game'].astype(str).str.replace(r'(\s\d+\S\d+)$', '', regex=True)
df[['home_team', 'away_team']] = df['game'].str.split(' - ', expand=True, n=1)
df[['home_score', 'away_score']] = df['score'].str.split(':', expand=True)
df['away_score'] = df['away_score'].astype(str).str.replace(r'[a-zA-Z\s\D]', '', regex=True)
df['home_score'] = df['home_score'].astype(str).str.replace(r'[a-zA-Z\s\D]', '', regex=True)
df = df[df.home_score != "."]
df = df[df.home_score != ".."]
df = df[df.home_score != "."]
df = df[df.home_odds != "-"]
df = df[df.draw_odds != "-"]
df = df[df.away_odds != "-"]
m = df[['home_odds', 'draw_odds', 'away_odds']].astype(str).agg(lambda x: x.str.count('/'), 1).ne(0).all(1)
n = df[['home_score']].agg(lambda x: x.str.count('-'), 1).ne(0).all(1)
o = df[['away_score']].agg(lambda x: x.str.count('-'), 1).ne(0).all(1)
df = df[~m]
df = df[~n]
df = df[~o]
df = df[df.home_score != '']
df = df[df.away_score != '']
df = df.dropna()
However, when I do this, I get the following warnings:
UserWarning: Boolean Series key will be reindexed to match DataFrame index.
df = df[~n]
UserWarning: Boolean Series key will be reindexed to match DataFrame index.
df = df[~o]
How can I fix this?
I think you can try changing:
n = df[['home_score']].agg(lambda x: x.str.count('-'), 1).ne(0).all(1)
o = df[['away_score']].agg(lambda x: x.str.count('-'), 1).ne(0).all(1)
to:
n = df['home_score'].str.count('-').ne(0)
o = df['away_score'].str.count('-').ne(0)
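Alternatively, build the already-negated masks with str.contains; in that case filter with df[n] and df[o] instead of df[~n] and df[~o]: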
n = ~df['home_score'].str.contains('-')
o = ~df['away_score'].str.contains('-')
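For context, the warning itself appears to come from index misalignment: m, n and o are all built from the unfiltered df, so once df = df[~m] drops rows, n and o still carry the old, longer index and pandas has to reindex them. Below is a minimal sketch that combines the masks so the frame is filtered only once. It assumes a small made-up frame (the values are illustrative, not taken from the question's data), and the names odds_cols and clean are hypothetical.

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "home_odds": ["1.36", "5/2", "2.06", "2.19"],
    "draw_odds": ["4.31", "3/1", "3.16", "2.98"],
    "away_odds": ["7.66", "7/4", "3.50", "3.38"],
    "home_score": ["2", "1", "-", "1"],
    "away_score": ["2", "0", "1", "-"],
})

odds_cols = ["home_odds", "draw_odds", "away_odds"]

# True where every odds column contains a '/' (fractional odds).
m = df[odds_cols].apply(lambda s: s.str.contains("/")).all(axis=1)
# True where a score contains a '-'.
n = df["home_score"].str.contains("-")
o = df["away_score"].str.contains("-")

# All three masks share df's index, so a single combined filter
# needs no reindexing and should not trigger the warning.
clean = df[~(m | n | o)]
print(clean)
```

Another option, under the same assumptions, is to keep the three separate filters but recompute each mask from the current df right before using it.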
You should also be able to change:
df = df[df.home_score != "."]
df = df[df.home_score != ".."]
df = df[df.home_score != "."]
df = df[df.home_odds != "-"]
df = df[df.draw_odds != "-"]
df = df[df.away_odds != "-"]
to:
df = df[~df.home_score.isin([".", ".."]) &
        df[['home_odds', 'draw_odds', 'away_odds']].ne("-").all(axis=1)]
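As a quick sanity check, here is a sketch on invented data (not the question's) that compares the chained filters with a single combined mask. The chained filters keep a row only if every condition holds at once, so the combined mask in this sketch uses & and all(axis=1). The names chained, keep and combined are hypothetical.

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "home_score": ["2", ".", "1", "..", "0"],
    "home_odds": ["1.36", "2.62", "-", "2.06", "2.19"],
    "draw_odds": ["4.31", "3.30", "3.58", "3.16", "-"],
    "away_odds": ["7.66", "2.48", "1.62", "3.50", "3.38"],
})

# Chained version: one filter per condition, as in the question.
chained = df[df.home_score != "."]
chained = chained[chained.home_score != ".."]
chained = chained[chained.home_odds != "-"]
chained = chained[chained.draw_odds != "-"]
chained = chained[chained.away_odds != "-"]

# Combined version: a row is kept only if all conditions hold.
odds_cols = ["home_odds", "draw_odds", "away_odds"]
keep = ~df["home_score"].isin([".", ".."]) & df[odds_cols].ne("-").all(axis=1)
combined = df[keep]

# Compare the two results.
print(combined.equals(chained))
```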