I'm looking for a way to extract job numbers from text using regular expressions and Python.

If the text is "Job 45, job 32 and then job 15", I would like to get ['job 45', 'job 32', 'job 15'] or ['45', '32', '15'].

I tried r'[job]\d+', which returns an empty list.

re.findall(r'[job]\d+', 'Job 45, job 32 and then job 15'.lower())
[]

I tried splitting on "job".

re.split(r'job','Job 45, job 32 and then job 15'.lower())
['', ' 45, ', ' 32 and then ', ' 15']

I tried splitting the text into words.

re.findall(r'\w+', 'Job 45, job 32 and then job 15'.lower())
['job', '45', 'job', '32', 'and', 'then', 'job', '15']

This is workable: I could check whether an element is 'job' and whether the element after it can be converted to a number.
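A minimal sketch of that token-based check, assuming the number always appears as the very next word after "job" (the variable names are illustrative only):

```python
import re

text = 'Job 45, job 32 and then job 15'
tokens = re.findall(r'\w+', text.lower())

# Pair each token with its successor and keep the digits that follow 'job'.
numbers = [nxt for cur, nxt in zip(tokens, tokens[1:])
           if cur == 'job' and nxt.isdigit()]
print(numbers)  # ['45', '32', '15']
```

It works, but it takes two passes over the data where a single pattern would do.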

What regular expression would get ['job 45', 'job 32', 'job 15'] or ['45', '32', '15'] from "Job 45, job 32 and then job 15"?

Your regex [job]\d+ has several problems,

[job] is a character set, which means it matches only a single character: j, o, or b.

The second problem is that your regex does not account for the space between job and the number.

The third problem: because your input text contains both Job and job, you need the (?i) flag to match case-insensitively.
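Each of the three problems can be seen in isolation with a short experiment. This is only an illustrative sketch; the extra sample string 'b45 o7 job 45' is made up for the demonstration:

```python
import re

text = 'Job 45, job 32 and then job 15'

# 1. [job] matches exactly one of j, o, b, so 'b45' matches but 'job 45' does not.
charset_hits = re.findall(r'[job]\d+', 'b45 o7 job 45')
print(charset_hits)  # ['b45', 'o7']

# 2. The literal word 'job' with no whitespace allowed before the digits still finds nothing.
no_space_hits = re.findall(r'job\d+', text.lower())
print(no_space_hits)  # []

# 3. Allowing whitespace but staying case-sensitive misses the capitalized 'Job 45'.
case_sensitive_hits = re.findall(r'job\s+\d+', text)
print(case_sensitive_hits)  # ['job 32', 'job 15']
```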

So the correct form of your regex becomes this,

(?i)job\s+\d+


Sample Python code

import re
s = 'Job 45, job 32 and then job 15'
# Use a raw string so \s and \d reach the regex engine; avoid shadowing the built-in str
matches = re.findall(r'(?i)job\s+\d+', s)
print(matches)

This produces the following output,

['Job 45', 'job 32', 'job 15']

Or, more simply, use the expression 'job (\d+)':

>>> re.findall(r'job (\d+)', s.lower())
['45', '32', '15']
>>>
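If the numbers are needed as integers rather than strings, the captured group can be converted directly. A small sketch, assuming the same sample string; it uses re.IGNORECASE instead of lower() and \s+ instead of a single literal space, which are substitutions made here and not part of the answer above:

```python
import re

s = 'Job 45, job 32 and then job 15'

# findall returns only the captured group (the digits) when the pattern has one group.
job_numbers = [int(n) for n in re.findall(r'job\s+(\d+)', s, flags=re.IGNORECASE)]
print(job_numbers)  # [45, 32, 15]
```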

One approach is to use the following pattern, which uses a positive lookbehind:

(?<=\bjob )\d+

This captures any group of digits that is immediately preceded by the text job (case-insensitive) followed by a space.

import re
text = "Job 45, job 32 and then job 15"
res = re.findall(r'(?<=\bjob )\d+', text, re.I)
print(res)
['45', '32', '15']
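The \b in the lookbehind matters when "job" can appear as the tail of a longer word. A short sketch, using a made-up sample string containing "oddjob" purely for illustration:

```python
import re

sample = 'oddjob 7, Job 45 and job 32'

# With \b, 'job' must start at a word boundary, so 'oddjob 7' is skipped.
with_boundary = re.findall(r'(?<=\bjob )\d+', sample, re.I)
print(with_boundary)  # ['45', '32']

# Without \b, the digits after 'oddjob ' are picked up as well.
without_boundary = re.findall(r'(?<=job )\d+', sample, re.I)
print(without_boundary)  # ['7', '45', '32']
```

Note that Python's re module requires lookbehind patterns to be fixed-width, which is why the pattern uses a single literal space and not \s+.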
