I am trying to extract, from the text below, the value next to "number" and the text between it and the next "number":
The conditions are: number 1, the patient is allergic to dust, number next, the patient has bronchitis, number 4, The patient heart rate is high.
I want to extract the following values from this text:
1, the patient is allergic to dust,
next, the patient has bronchitis,
4, The patient heart rate is high
I have a pattern that lets me get the value next to "number" and the first word of the sentence:
(number\s? (\d+|next)[,.]?\s?(\w+))
This is the result of re.findall:
[('number 1, the', '1', 'the'),
('number next, the', 'next', 'the'),
('number 4, The', '4', 'The')]
As you can see, the groups let me extract the number or "next" value from the text, but I have not been able to extract the whole sentence.
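Since the "." or "," and the whitespace character after the number or "next" are optional, you can match the sentence with a non-greedy dot and then assert, to the right, either another "number" or an optional period at the end of the string.
\bnumber\s? (\d+|next)[,.]?\s?(\w.*?)(?= number\s?\b|\.?$)
Regex demo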
import re
pattern = r"\bnumber\s? (\d+|next)[,.]?\s?(\w.*?)(?= number\s?\b|\.?$)"
s = "The conditions are: number 1, the patient is allergic to dust, number next, the patient has bronchitis, number 4, The patient heart rate is high."
print(re.findall(pattern, s))
Output:
[
('1', 'the patient is allergic to dust,'),
('next', 'the patient has bronchitis,'),
('4', 'The patient heart rate is high')
]
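For readability, the same pattern can be written with re.VERBOSE so that each piece carries a comment. This is a sketch, not part of the answer itself: the helper name extract_conditions is hypothetical, and the rstrip(",") step that drops the trailing comma kept by the lazy match is an addition.

```python
import re

# Same pattern as above, written with re.VERBOSE; literal spaces must be escaped as "\ ".
PATTERN = re.compile(
    r"""
    \bnumber\s?\            # the word "number", optional whitespace, then a literal space
    (\d+|next)              # group 1: digits or the word "next"
    [,.]?\s?                # optional comma/period, then optional whitespace
    (\w.*?)                 # group 2: the sentence, matched lazily (as short as possible)
    (?=\ number\s?\b|\.?$)  # stop before the next " number" or an optional final period at the end
    """,
    re.VERBOSE,
)


def extract_conditions(text):
    """Hypothetical helper: return (label, sentence) pairs without the trailing comma."""
    return [(label, sentence.rstrip(",")) for label, sentence in PATTERN.findall(text)]


s = "The conditions are: number 1, the patient is allergic to dust, number next, the patient has bronchitis, number 4, The patient heart rate is high."
print(extract_conditions(s))
# [('1', 'the patient is allergic to dust'), ('next', 'the patient has bronchitis'), ('4', 'The patient heart rate is high')]
```

The lazy .*? is what keeps each match from running past the next "number"; the lookahead only checks what follows, so the next "number" is still available to start the following match.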
Try (regex101):
import re
s = "The conditions are: number 1, the patient is allergic to dust, number next, the patient has bronchitis, number 4, The patient heart rate is high."
pat = re.compile(r"number\s? (\d+|next)[,.]?\s?([^,.]+)")
print(pat.findall(s))
Prints:
[
('1', 'the patient is allergic to dust'),
('next', 'the patient has bronchitis'),
('4', 'The patient heart rate is high')
]
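The two answers only differ when a condition itself contains a comma or period. The input below is made up for illustration (it is not from the question) and compares both patterns on a condition with an internal comma, using [^,.] as the character class for the second one.

```python
import re

# Pattern from the first answer: lazy match up to the next " number" or the end.
LOOKAHEAD = re.compile(r"\bnumber\s? (\d+|next)[,.]?\s?(\w.*?)(?= number\s?\b|\.?$)")
# Pattern from the second answer: take everything up to the first comma or period.
NEGATED_CLASS = re.compile(r"number\s? (\d+|next)[,.]?\s?([^,.]+)")

# Hypothetical input: condition 1 contains a comma of its own.
text = "number 1, the patient is allergic to dust, pollen and cats, number 2, the patient has asthma."

print(LOOKAHEAD.findall(text))
# [('1', 'the patient is allergic to dust, pollen and cats,'), ('2', 'the patient has asthma')]
print(NEGATED_CLASS.findall(text))
# [('1', 'the patient is allergic to dust'), ('2', 'the patient has asthma')]
```

The negated character class silently drops "pollen and cats", while the lookahead version keeps the whole clause (plus a trailing comma that can be stripped afterwards). Which behaviour is correct depends on whether the conditions in the real data can contain commas.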