Extracting a project number with a Python regular expression



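Hi everyone. I want to extract the trailing number after the last slash in the project_name column. I have been working on it, but I have run into the following problem:

  1. How can I extract the number after the last slash without getting the result wrapped in square brackets? My code almost works, but the result always comes back in square brackets.

My code: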

import re

def project_name(name):
    return re.findall(r'\d{3}$', name)
data['project_name'] = data['project_name'].apply(project_name)

Data:

project_name    
----------
ASAHI,PT-PRO/PTN/06-2012/192          
CIMB NIAGA-PRO/PTN/06-2012/174        
FRAMAS INDONESIA-PRO/PTN/06-2012/210    
DM STOCK 2015   
PERBAIKAN OH TM 366 PLANT DAWUAN 
Ruko-PRO/PTN/03-2012/47

My expected output:

(Expected)project_name   
----------     
192            
174            
210            
NaN
NaN            
NaN            
47            

Any suggestions and comments are appreciated. Thank you all.

Use Series.str.extract and add / to the regular expression:

data['project_name'] = data['project_name'].str.extract(r'/(\d{3}$)')
print(data)
project_name
0          192
1          174
2          210
3          NaN
4          NaN
5          NaN
6          NaN
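
To make this easy to try, here is a minimal self-contained sketch that rebuilds the six sample rows from the question (with trailing whitespace stripped) and applies the same idea. It keeps the results in separate variables rather than overwriting the column. The `\d+` variant rests on an assumption: that the two-digit 47 in the expected output should be captured too, which `\d{3}` cannot do because it requires exactly three digits.

```python
import pandas as pd

# Sample rows copied from the question.
data = pd.DataFrame({
    "project_name": [
        "ASAHI,PT-PRO/PTN/06-2012/192",
        "CIMB NIAGA-PRO/PTN/06-2012/174",
        "FRAMAS INDONESIA-PRO/PTN/06-2012/210",
        "DM STOCK 2015",
        "PERBAIKAN OH TM 366 PLANT DAWUAN",
        "Ruko-PRO/PTN/03-2012/47",
    ]
})

# Exactly three digits after a slash, at the end of the string.
three_digits = data["project_name"].str.extract(r"/(\d{3})$", expand=False)

# Assumption: any run of digits after the final slash should match.
any_digits = data["project_name"].str.extract(r"/(\d+)$", expand=False)

print(three_digits.tolist())  # last row is NaN: "47" has only two digits
print(any_digits.tolist())    # last row is '47'
```

Passing `expand=False` returns a Series instead of a one-column DataFrame, which is convenient when there is a single capture group.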

A solution with findall:

data['project_name'] = data['project_name'].str.findall(r'/(\d{3}$)').str[0]
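
The square brackets in the question come from findall returning a list for every row. The sketch below, using a made-up two-row Series, shows the intermediate lists and how `.str[0]` picks the first element, giving NaN where the list is empty.

```python
import pandas as pd

# Hypothetical two-row sample: one row matches, one does not.
s = pd.Series(["ASAHI,PT-PRO/PTN/06-2012/192", "DM STOCK 2015"])

matches = s.str.findall(r"/(\d{3})$")  # one list per row, possibly empty
first = matches.str[0]                 # first element, NaN for an empty list

print(matches.tolist())  # [['192'], []]
print(first.tolist())
```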

Your own solution should be changed to use next with iter, which returns the default value np.nan when there is no match:

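import re
import numpy as np

def project_name(name):
    # next() yields the first match, or np.nan if findall returned an empty list
    return next(iter(re.findall(r'/(\d{3})$', name)), np.nan)
data['project_name'] = data['project_name'].apply(project_name)
print(data)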
project_name
0          192
1          174
2          210
3          NaN
4          NaN
5          NaN
6          NaN
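
The next(iter(...), default) idiom can be checked without pandas. This is a small standard-library sketch; the function name `last_number` is invented for illustration, and `math.nan` stands in for `np.nan` only to avoid the NumPy import.

```python
import math
import re

def last_number(name, default=math.nan):
    """Return the three digits after the final slash, or `default`."""
    found = re.findall(r"/(\d{3})$", name)  # [] when nothing matches
    return next(iter(found), default)       # first item, else the default

print(last_number("ASAHI,PT-PRO/PTN/06-2012/192"))  # 192
print(last_number("DM STOCK 2015"))                 # nan
```

Because next only falls back to the default when the iterator is exhausted, no explicit length check or try/except is needed.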

Instead of

def project_name(name):
    return re.findall(r'\d{3}$', name)

use

def project_name(name):
    return re.findall(r'\d{3}$', name)[0]

Since the list contains only one value, we can return the element at index 0:

import re

def project_name(name):
    return re.findall(r'\d{3}$', name)[0]
data['project_name'] = data['project_name'].apply(project_name)
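
Two caveats are worth checking against the sample data. Indexing with [0] raises IndexError for rows where nothing matches. And without the leading slash, the pattern also picks up the last three digits of "DM STOCK 2015". The sketch below demonstrates both and offers one possible guarded variant; the name `project_number` is hypothetical, and `math.nan` stands in for `np.nan`.

```python
import math
import re

def project_number(name):
    """Hypothetical guarded variant: NaN instead of IndexError on no match."""
    match = re.search(r"/(\d{3})$", name)
    return match.group(1) if match else math.nan

# Plain [0] indexing fails when findall returns an empty list.
try:
    re.findall(r"\d{3}$", "PERBAIKAN OH TM 366 PLANT DAWUAN")[0]
    raised = False
except IndexError:
    raised = True

print(raised)                                  # True
print(re.findall(r"\d{3}$", "DM STOCK 2015"))  # ['015'], an unintended match
print(project_number("DM STOCK 2015"))         # nan
```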
