What is the Python regex to remove all text from a column?

I'm trying to clean up the away_score column:

df:
+-----+------------------+--------------------+--------------------+--------------+--------------+
|     | league           | home_team          | away_team          | home_score   | away_score   |
+=====+==================+====================+====================+==============+==============+
|   0 | Champions League | APOEL              | Qarabag            | 1            | 2            |
+-----+------------------+--------------------+--------------------+--------------+--------------+
|   1 | Champions League | FC Copenhagen      | TNS                | 1            | 0            |
+-----+------------------+--------------------+--------------------+--------------+--------------+
|   2 | Champions League | AIK                | Maribor            | 3            | 2 ET         |
+-----+------------------+--------------------+--------------------+--------------+--------------+

Expected:

df:
+-----+------------------+--------------------+--------------------+--------------+--------------+
|     | league           | home_team          | away_team          | home_score   | away_score   |
+=====+==================+====================+====================+==============+==============+
|   0 | Champions League | APOEL              | Qarabag            | 1            | 2            |
+-----+------------------+--------------------+--------------------+--------------+--------------+
|   1 | Champions League | FC Copenhagen      | TNS                | 1            | 0            |
+-----+------------------+--------------------+--------------------+--------------+--------------+
|   2 | Champions League | AIK                | Maribor            | 3            | 2            |
+-----+------------------+--------------------+--------------------+--------------+--------------+

What I tried:

df['away_score'] = df['away_score'].astype(str).str.replace(r'(\s?\w+)$', '', regex=True)

(It works on regex101, but not in pandas.)

Instead, it also wipes out the actual scores in the column:

+-----+------------------+--------------------+--------------------+--------------+--------------+
|     | league           | home_team          | away_team          | home_score   | away_score   |
+=====+==================+====================+====================+==============+==============+
|   0 | Champions League | APOEL              | Qarabag            | 1            |              |
+-----+------------------+--------------------+--------------------+--------------+--------------+
|   1 | Champions League | FC Copenhagen      | TNS                | 1            |              |
+-----+------------------+--------------------+--------------------+--------------+--------------+
|   2 | Champions League | AIK                | Maribor            | 3            | 2            |
+-----+------------------+--------------------+--------------------+--------------+--------------+
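One way to see what is going on: Series.str.replace with regex=True applies the pattern to each element much like Python's re.sub. The key point is that \w matches digits as well as letters, so a value that is just "2" is itself a complete match ending at $ and is deleted. The sketch below uses only the standard re module and the away_score values from the table above.

```python
import re

# The pattern from the question, with the backslashes written out.
pattern = r'(\s?\w+)$'

# \w matches letters, digits and underscore, so a bare "2" matches the
# whole pattern and is removed; in "2 ET" only the trailing " ET" matches.
results = {value: re.sub(pattern, '', value) for value in ['2', '0', '2 ET']}

for value, cleaned in results.items():
    print(repr(value), '->', repr(cleaned))
```

This reproduces the table above: the plain scores become empty strings, and only "2 ET" keeps its digit.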

What should the correct regex be?

I tried this regex and it works:

df['away_score'] = df['away_score'].astype(str).str.replace('[a-zA-Z]', '', regex=True)
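A minimal sketch of that line on sample data reconstructed from the question's table (the values are assumed to be strings). The character class only covers ASCII letters, so the space that separated the score from "ET" is left behind.

```python
import pandas as pd

# away_score values taken from the question's table.
df = pd.DataFrame({'away_score': ['2', '0', '2 ET']})

# Remove every ASCII letter; digits and whitespace are left untouched.
df['away_score'] = df['away_score'].astype(str).str.replace('[a-zA-Z]', '', regex=True)

print(df['away_score'].tolist())  # ['2', '0', '2 ']
```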

To remove the text completely, including the whitespace, use:

df['away_score'] = df['away_score'].astype(str).str.replace(r'[a-zA-Z\s]', '', regex=True)

This also removes the whitespace in front of the letters, e.g. the space before ET in "2 ET".
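A short sketch of the whitespace-aware version on the same reconstructed sample data. Once only digits remain, the column can be turned into real numbers; pd.to_numeric is used here as an illustrative follow-up step, not something from the original answer.

```python
import pandas as pd

# away_score values taken from the question's table.
df = pd.DataFrame({'away_score': ['2', '0', '2 ET']})

# [a-zA-Z\s] matches ASCII letters and any whitespace character,
# so " ET" is removed as a whole.
df['away_score'] = df['away_score'].astype(str).str.replace(r'[a-zA-Z\s]', '', regex=True)

# Convert the cleaned strings to integers.
scores = pd.to_numeric(df['away_score'])

print(df['away_score'].tolist())  # ['2', '0', '2']
```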

If you want to remove not only letters but also any other non-digit characters (keeping only the digits), you can use:

df['away_score'] = df['away_score'].astype(str).str.replace(r'\D', '', regex=True)
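A sketch of the \D variant. Besides the question's values, it includes one hypothetical value, '1 (p)', that is not from the question, to show that punctuation is stripped as well.

```python
import pandas as pd

# Question's values plus one hypothetical value containing punctuation.
df = pd.DataFrame({'away_score': ['2', '0', '2 ET', '1 (p)']})

# \D matches any character that is not a decimal digit, so letters,
# spaces and punctuation are all removed.
df['away_score'] = df['away_score'].astype(str).str.replace(r'\D', '', regex=True)

print(df['away_score'].tolist())  # ['2', '0', '2', '1']
```

Note that \D also removes minus signs and decimal points, so a value like "1.5" would become "15". That is harmless for whole-number scores but not for general numeric data.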
