Extracting the text between strings with a regular expression



I am trying to extract, from the text below, the value that follows the word number together with the text up to the next number:

The conditions are: number 1, the patient is allergic to dust, number next, the patient has bronchitis, number 4, The patient heart rate is high.

I want to extract the following values from this text:

  • 1, the patient is allergic to dust,
  • next, the patient has bronchitis,
  • 4, The patient heart rate is high

I have a pattern that gets me the value next to number and the first word of the sentence:

(numbers? (\d+|next)[,.]?\s?(\w+))

This is the result of using re.findall:
[('number 1, the', '1', 'the'),
('number next, the', 'next', 'the'),
('number 4, The', '4', 'The')]
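
For reference, the attempt above can be reproduced as a short script. This is only a sketch: the names text, attempt and first_try are illustrative, and the escapes \d, \s and \w are written out in full as Python's re module expects.

```python
import re

text = ("The conditions are: number 1, the patient is allergic to dust, "
        "number next, the patient has bronchitis, "
        "number 4, The patient heart rate is high.")

# Outer group: the whole match. Inner groups: the number (or "next")
# and the first word after it. \w+ stops at the first space, which is
# why only one word of each sentence is captured.
attempt = re.compile(r"(numbers? (\d+|next)[,.]?\s?(\w+))")

first_try = attempt.findall(text)
print(first_try)
```

Because \w+ cannot cross a space, the third group never grows past the first word, however long the sentence is.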

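As you can see, the groups let me extract the number or the next value from the text, but I have not been able to pull out the whole sentence.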

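Since the period or comma and the whitespace character after the digits or next are optional, you can capture the sentence with a non-greedy dot and use a lookahead that asserts either the next number to the right or the end of the string.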

\bnumbers? (\d+|next)[,.]?\s?(\w.*?)(?= numbers?\b|\.?$)

Regex demo

import re

pattern = r"\bnumbers? (\d+|next)[,.]?\s?(\w.*?)(?= numbers?\b|\.?$)"

s = "The conditions are: number 1, the patient is allergic to dust, number next, the patient has bronchitis, number 4, The patient heart rate is high."

print(re.findall(pattern, s))

Output
[
('1', 'the patient is allergic to dust,'),
('next', 'the patient has bronchitis,'),
('4', 'The patient heart rate is high')
]
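
If the trailing commas are not wanted, the matches from the lookahead pattern can be cleaned up afterwards. The following is a minimal sketch, not part of the answer itself: the helper name extract_conditions and the constant CONDITION are hypothetical, and stripping commas, periods and spaces from the right is an assumption about the desired output.

```python
import re

# Lookahead pattern: lazily capture the sentence until the next
# " number" or the (optionally period-terminated) end of the string.
CONDITION = re.compile(
    r"\bnumbers? (\d+|next)[,.]?\s?(\w.*?)(?= numbers?\b|\.?$)"
)

def extract_conditions(text):
    """Return (label, sentence) pairs with trailing punctuation removed."""
    return [
        (m.group(1), m.group(2).rstrip(",. "))
        for m in CONDITION.finditer(text)
    ]

s = ("The conditions are: number 1, the patient is allergic to dust, "
     "number next, the patient has bronchitis, "
     "number 4, The patient heart rate is high.")

conditions = extract_conditions(s)
print(conditions)
```

Using re.finditer here instead of re.findall gives access to each match object, which is convenient if the positions of the matches are needed later.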

Try (regex101):

import re
s = "The conditions are: number 1, the patient is allergic to dust, number next, the patient has bronchitis, number 4, The patient heart rate is high."
pat = re.compile(r"numbers? (\d+|next)[,.]?\s?([^,.]+)")
print(pat.findall(s))

Prints:

[
("1", "the patient is allergic to dust"),
("next", "the patient has bronchitis"),
("4", "The patient heart rate is high"),
]
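
The two approaches give the same sentences on the sample text, but they can differ when a sentence contains a comma of its own. The sketch below illustrates this with a made-up input (text2 is not from the question): the negated character class stops at the first comma or period, while the lazy dot with a lookahead keeps going until the next number.

```python
import re

# Hypothetical input with commas inside the first sentence.
text2 = ("number 2, the patient, aged 40, has asthma, "
         "number next, the patient has a fever.")

# Approach 1: lazy dot, bounded by a lookahead for the next "number"
# or the end of the string.
lookahead = re.compile(
    r"\bnumbers? (\d+|next)[,.]?\s?(\w.*?)(?= numbers?\b|\.?$)"
)

# Approach 2: negated character class, bounded by the first comma or period.
negated = re.compile(r"numbers? (\d+|next)[,.]?\s?([^,.]+)")

with_lookahead = lookahead.findall(text2)
with_negated = negated.findall(text2)

print(with_lookahead)
print(with_negated)
```

Which behaviour is preferable depends on the data: the negated class is simpler and drops the trailing comma, whereas the lookahead version tolerates punctuation inside a sentence.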