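Hello everyone. I want to extract the last number after the slash from the project_name column. I am working on it, but I have the following problem:
- How can I extract the last number after the slash without getting square brackets in the result? My code almost works, but the result always comes wrapped in square brackets.
My code: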
import re
def project_name(name):
    return re.findall(r'\d{3}$', name)
data['project_name'] = data['project_name'].apply(project_name)
Data:
project_name
----------
ASAHI,PT-PRO/PTN/06-2012/192
CIMB NIAGA-PRO/PTN/06-2012/174
FRAMAS INDONESIA-PRO/PTN/06-2012/210
DM STOCK 2015
PERBAIKAN OH TM 366 PLANT DAWUAN
Ruko-PRO/PTN/03-2012/47
My expected output:
project_name
----------
192
174
210
NaN
NaN
NaN
47
Thanks for all suggestions and comments. Thank you, everyone.
Use Series.str.extract and add / to the regular expression:
data['project_name'] = data['project_name'].str.extract(r'/(\d{3}$)')
print(data)
project_name
0 192
1 174
2 210
3 NaN
4 NaN
5 NaN
6 NaN
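Note that `\d{3}` requires exactly three digits, so a row ending in `/47` comes back as NaN. If a variable number of digits is wanted, as the expected output in the question suggests, a minimal sketch with `\d+` could look like the following. The sample frame is rebuilt from the rows shown in the question, the variable name `extracted` is only illustrative, and `expand=False` is passed so that a Series is returned:

```python
import pandas as pd

# Sample rows copied from the question.
data = pd.DataFrame({
    "project_name": [
        "ASAHI,PT-PRO/PTN/06-2012/192",
        "CIMB NIAGA-PRO/PTN/06-2012/174",
        "FRAMAS INDONESIA-PRO/PTN/06-2012/210",
        "DM STOCK 2015",
        "PERBAIKAN OH TM 366 PLANT DAWUAN",
        "Ruko-PRO/PTN/03-2012/47",
    ]
})

# /(\d+)$ : one or more digits that follow a slash at the end of the string.
# expand=False returns a Series rather than a one-column DataFrame.
extracted = data["project_name"].str.extract(r"/(\d+)$", expand=False)
print(extracted)
```

Rows without a trailing `/digits` part stay NaN, and the extracted values are strings, so a numeric conversion would be a separate step.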
Solution with findall:
data['project_name'] = data['project_name'].str.findall(r'/(\d{3}$)').str[0]
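Your own solution should be changed to use next with iter, which returns the default value np.nan if there is no match:
import re
import numpy as np
def project_name(name):
    return next(iter(re.findall(r'/(\d{3})$', name)), np.nan)
data['project_name'] = data['project_name'].apply(project_name)
print(data)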
project_name
0 192
1 174
2 210
3 NaN
4 NaN
5 NaN
6 NaN
Instead of
def project_name(name):
    return re.findall(r'\d{3}$', name)
use
def project_name(name):
    return re.findall(r'\d{3}$', name)[0]
Since the list contains only one value, we can return the value at index 0:
def project_name(name):
    return re.findall(r'\d{3}$', name)[0]
data['project_name'] = data['project_name'].apply(project_name)
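One caveat: indexing with `[0]` raises an IndexError for rows where the pattern finds nothing, such as `PERBAIKAN OH TM 366 PLANT DAWUAN`, because `re.findall` returns an empty list there. Also, without the slash in the pattern, `DM STOCK 2015` would yield `015`. A minimal sketch of a guarded variant, using `re.search` instead of `findall`, is shown here. The helper name `last_number` is made up for illustration, and `\d+` is an assumption so that two-digit values like 47 also match:

```python
import math
import re

def last_number(name):
    # Look for digits that follow a slash at the very end of the string.
    match = re.search(r"/(\d+)$", name)
    # Return the captured digits, or NaN when there is no match.
    return match.group(1) if match else math.nan

print(last_number("ASAHI,PT-PRO/PTN/06-2012/192"))
print(last_number("PERBAIKAN OH TM 366 PLANT DAWUAN"))
```

It can be applied the same way as before, with data['project_name'].apply(last_number).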