Extracting with a regular expression that contains special characters



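I want to create a column in a pandas DataFrame df based on another column, ID. For IDs that contain the string SAT, I want to extract the numbers joined by the special character "-" and put the extracted result in a new column named new_col. If the ID does not contain the string SAT, the new column should stay NaN.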

The df looks like this:

    Date        ID                   Time
0   2007-01-10  SAT 1 HHSP           900
1   2007-01-10  DOUBLE 7 HHSP        900
2   2007-01-10  SAT GF-06-5CSBG.431  1000
3   2007-01-10  MA HYDRO HHSP        900
4   2007-01-10  2.233 HHSP           900
5   2007-01-10  SAT L2-15-3CSB1.252  1000
6   2007-01-10  SECTION 6 HHSP       900

Expected output:

    Date        ID                   Time     new_col
0   2007-01-10  SAT 1 HHSP           900      NaN
1   2007-01-10  DOUBLE 7 HHSP        900      NaN
2   2007-01-10  SAT GF-06-5CSBG.431  1000     06-5
3   2007-01-10  MA HYDRO HHSP        900      NaN
4   2007-01-10  2.233 HHSP           900      NaN
5   2007-01-10  SAT L2-15-3CSB1.252  1000     15-3  * In this case 15-3 is extracted instead of 2-15, because L2 is not purely numeric.
6   2007-01-10  SECTION 6 HHSP       900      NaN

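Use Series.str.extract to capture the digits joined by -, requiring a - or whitespace right before them, and apply it only to the rows that Series.str.contains flags as containing SAT: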

m = df['ID'].str.contains('SAT')
df['new_col'] = df.loc[m, 'ID'].str.extract(r'[-\s](\d+-\d+)', expand=False)
print(df)
         Date                   ID  Time new_col
0  2007-01-10           SAT 1 HHSP   900     NaN
1  2007-01-10        DOUBLE 7 HHSP   900     NaN
2  2007-01-10  SAT GF-06-5CSBG.431  1000    06-5
3  2007-01-10        MA HYDRO HHSP   900     NaN
4  2007-01-10           2.233 HHSP   900     NaN
5  2007-01-10  SAT L2-15-3CSB1.252  1000    15-3
6  2007-01-10       SECTION 6 HHSP   900     NaN
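To see what each part of the pattern does, here is a minimal sketch that applies the same regular expression with Python's built-in re module to three of the ID strings from the question. The helper name extract_pair is hypothetical and exists only for this illustration.

```python
import re

# [-\s]      a hyphen or a whitespace character must sit right before the digits
# (\d+-\d+)  capture group: one or more digits, a hyphen, one or more digits
PATTERN = re.compile(r'[-\s](\d+-\d+)')


def extract_pair(text):
    """Return the first 'digits-digits' run preceded by '-' or whitespace, else None.

    Hypothetical helper for illustration; pandas does the equivalent per row
    in Series.str.extract.
    """
    match = PATTERN.search(text)
    return match.group(1) if match else None


ids = ['SAT 1 HHSP', 'SAT GF-06-5CSBG.431', 'SAT L2-15-3CSB1.252']
results = [extract_pair(s) for s in ids]
print(results)  # [None, '06-5', '15-3']
```

In 'SAT L2-15-3CSB1.252' the candidate 2-15 is skipped because the 2 is preceded by L, not by a hyphen or whitespace, so the first position that satisfies the pattern yields 15-3.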

If the values of interest always start with SAT, the check can go into the pattern itself:

df['new_col'] = df['ID'].str.extract(r'^SAT.*[-\s](\d+-\d+)', expand=False)
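For anyone who wants to try this end to end, the following sketch rebuilds the sample frame from the question and runs both variants side by side. It assumes the backslashes in the pattern are written out in a raw string, and the variable names masked and anchored are made up for this example.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Date': ['2007-01-10'] * 7,
    'ID': ['SAT 1 HHSP', 'DOUBLE 7 HHSP', 'SAT GF-06-5CSBG.431',
           'MA HYDRO HHSP', '2.233 HHSP', 'SAT L2-15-3CSB1.252',
           'SECTION 6 HHSP'],
    'Time': [900, 900, 1000, 900, 900, 1000, 900],
})

# Variant 1: build a boolean mask first, then extract only from the SAT rows.
m = df['ID'].str.contains('SAT')
masked = df.loc[m, 'ID'].str.extract(r'[-\s](\d+-\d+)', expand=False)

# Variant 2: require SAT at the start of the string inside the pattern itself.
anchored = df['ID'].str.extract(r'^SAT.*[-\s](\d+-\d+)', expand=False)

# Assignment aligns on the index, so rows outside the mask become NaN.
df['new_col'] = masked
df['new_col_anchored'] = anchored
print(df)
```

The two variants agree on this data. They can differ when SAT appears somewhere other than the start of the ID: str.contains('SAT') accepts it anywhere, while ^SAT only accepts it at the beginning.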
