I am trying to clean up a column:
df:
+-----+------------------+--------------------+--------------------+--------------+--------------+
| | league | home_team | away_team | home_score | away_score |
+=====+==================+====================+====================+==============+==============+
| 0 | Champions League | APOEL | Qarabag | 1 | 2 |
+-----+------------------+--------------------+--------------------+--------------+--------------+
| 1 | Champions League | FC Copenhagen | TNS | 1 | 0 |
+-----+------------------+--------------------+--------------------+--------------+--------------+
| 2 | Champions League | AIK | Maribor | 3 | 2 ET |
+-----+------------------+--------------------+--------------------+--------------+--------------+
Expected:
df:
+-----+------------------+--------------------+--------------------+--------------+--------------+
| | league | home_team | away_team | home_score | away_score |
+=====+==================+====================+====================+==============+==============+
| 0 | Champions League | APOEL | Qarabag | 1 | 2 |
+-----+------------------+--------------------+--------------------+--------------+--------------+
| 1 | Champions League | FC Copenhagen | TNS | 1 | 0 |
+-----+------------------+--------------------+--------------------+--------------+--------------+
| 2 | Champions League | AIK | Maribor | 3 | 2 |
+-----+------------------+--------------------+--------------------+--------------+--------------+
I tried:
df['away_score'] = df['away_score'].astype(str).str.replace(r'(\s?\w+)$', '', regex=True)
(this works on regex101, but not in pandas)
but all the data in the column got replaced:
+-----+------------------+--------------------+--------------------+--------------+--------------+
| | league | home_team | away_team | home_score | away_score |
+=====+==================+====================+====================+==============+==============+
| 0 | Champions League | APOEL | Qarabag | 1 | |
+-----+------------------+--------------------+--------------------+--------------+--------------+
| 1 | Champions League | FC Copenhagen | TNS | 1 | |
+-----+------------------+--------------------+--------------------+--------------+--------------+
| 2 | Champions League | AIK | Maribor | 3 | 2 |
+-----+------------------+--------------------+--------------------+--------------+--------------+
What would the correct regex be?
I tried this regex and it worked:
df['away_score'] = df['away_score'].astype(str).str.replace('[a-zA-Z]', '', regex=True)
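A minimal sketch of why the letter-only class behaves differently, using Python's standard `re` module on single values rather than a DataFrame. It assumes the pattern in the question was meant as `(\s?\w+)$` with the backslashes lost in formatting; the list `values` is illustrative sample data, not taken from a real dataset.

```python
import re

# Illustrative values modeled on the away_score column.
values = ["2", "0", "2 ET"]

# Assumed form of the question's pattern. \w matches digits as well as
# letters, so a value made only of digits is matched in full and removed.
word_suffix = [re.sub(r"(\s?\w+)$", "", v) for v in values]

# [a-zA-Z] matches letters only, so the digits are left untouched.
letters_only = [re.sub(r"[a-zA-Z]", "", v) for v in values]

print(word_suffix)   # ['', '', '2']
print(letters_only)  # ['2', '0', '2 ']
```

Note that the letter-only pattern still leaves the space that preceded `ET` in the last value.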
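To clean up the text completely (including spaces), you should use:
df['away_score'] = df['away_score'].astype(str).str.replace(r'[a-zA-Z\s]', '', regex=True)
This way you also remove the space before the letters, for example the space before ET in 2 ET.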
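The difference between the two character classes can be checked on a single string. This is a small sketch with the standard `re` module; `value` is a made-up sample matching the third row of the table.

```python
import re

value = "2 ET"

# Letters only: the space before "ET" stays in the result.
letters_only = re.sub(r"[a-zA-Z]", "", value)

# Letters and whitespace: only the digit remains.
letters_and_space = re.sub(r"[a-zA-Z\s]", "", value)

print(repr(letters_only))       # '2 '
print(repr(letters_and_space))  # '2'
```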
If you want to remove not only text but also any non-digit symbols (leaving only digits), you can use:
df['away_score'] = df['away_score'].astype(str).str.replace(r'\D', '', regex=True)
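For completeness, here is a hedged end-to-end sketch of the digits-only approach on a small DataFrame, followed by a conversion to integers. The frame is rebuilt by hand from the score columns shown in the question, and the `astype(int)` step is an addition that assumes every cleaned value still contains at least one digit.

```python
import pandas as pd

# Hand-built sample with the score columns from the question.
df = pd.DataFrame(
    {
        "home_score": [1, 1, 3],
        "away_score": ["2", "0", "2 ET"],
    }
)

# \D matches every character that is not a digit, so letters, spaces
# and punctuation are all removed in one pass.
digits = df["away_score"].astype(str).str.replace(r"\D", "", regex=True)

# Assumption: no value becomes an empty string, otherwise int conversion fails.
df["away_score"] = digits.astype(int)

print(df["away_score"].tolist())  # [2, 0, 2]
```

Keep in mind that `\D` also strips minus signs and decimal points, which is fine for football scores but would change negative or fractional values.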