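I am trying to fix this regular expression: Lung[ ]+Rads[ ]+Catagory[ ]+(\d+[A-Z]*)
It should extract 1, 1S, 2, 2S, 3, 3S, 4A, 4AS, 4B, 4BS, 4X, 4XS, plus a possible additional C modifier on any of these, from the Lung RADS category in unstructured radiology reports, where the wording may look like: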
LUNG RADS CATEGORY 2
Lung RADS category 1,
Lung RADS category 2.
LUNG RADS CATEGORY 1
Lung RADS 3S.
Lung RADS Category 1:
Lung RADS Category 1 (S):
LUNG RADS CATEGORY 1 S
Lung RADS category: 2S.
Lung RADS: 2C
Lung RADS category 4B,
Lung RADS category 1S.
Lung RADS: 3.
Lung RADS category I
Lung RADS 2
LUNG RADS CATEGORY:I
LUNGRADS 2
LUNGRAD 2
LUNG-RAD 3
You can use the following regular expression to match the part of the string you are interested in.
(?<=^(?:(?:Lung|LUNG) RADS:?(?: CATEGORY:?| [cC]ategory:?)? ?|LUNGRADS? |LUNG-RAD ))(?:\d[A-Z]?|[A-Z])(?![A-Za-z])
JavaScript demo
JavaScript's regex engine performs the following operations.
(?<=                          begin positive lookbehind
  ^                           match beginning of line
  (?:                         begin non-cap grp
    (?:Lung|LUNG)[ ]RADS:?    match 'Lung' or 'LUNG' followed by a space,
                              then 'RADS', opt followed by ':'
    (?:                       begin non-cap grp
      [ ]CATEGORY:?           match space, then 'CATEGORY', then opt ':'
    |                         or
      [ ][cC]ategory:?        match space, then 'category' or 'Category',
                              then opt ':'
    )                         end non-cap grp
    ?                         opt match non-cap grp
    [ ]?                      opt match a space
  |                           or
    LUNGRADS?[ ]              match 'LUNGRAD ' or 'LUNGRADS '
  |                           or
    LUNG-RAD[ ]               match 'LUNG-RAD '
  )                           end non-cap grp
)                             end positive lookbehind
(?:                           begin non-cap grp
  \d[A-Z]?                    match digit then opt cap letter
|                             or
  [A-Z]                       match one cap letter
)                             end non-cap grp
(?![A-Za-z])                  negative lookahead: the next char must not be a
                              letter (so the 'C' of 'Category' is not taken
                              for a category value)
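To check the pattern against the sample lines from the question, a minimal sketch along these lines may help. It assumes the digit class is written as `\d` and that the lookahead rejects any ASCII letter (`[A-Za-z]`), so that the capital "C" in "Category" is not picked up as a value. The helper name `extractCategory` is made up for illustration, and the `m` flag is an assumption so that `^` also matches at the start of each line in a multi-line report.

```javascript
// Pattern from the answer, assuming \d for the digit and a lookahead
// that rejects any ASCII letter. The m flag lets ^ match at each line start.
const LUNG_RADS = /(?<=^(?:(?:Lung|LUNG) RADS:?(?: CATEGORY:?| [cC]ategory:?)? ?|LUNGRADS? |LUNG-RAD ))(?:\d[A-Z]?|[A-Z])(?![A-Za-z])/m;

// Hypothetical helper: returns the matched category text,
// or null when the line does not match the pattern.
function extractCategory(line) {
  const m = LUNG_RADS.exec(line);
  return m ? m[0] : null;
}

// A selection of the sample lines from the question.
const samples = [
  "LUNG RADS CATEGORY 2",
  "Lung RADS category 1,",
  "Lung RADS 3S.",
  "Lung RADS Category 1:",
  "Lung RADS Category 1 (S):",
  "Lung RADS category: 2S.",
  "Lung RADS: 2C",
  "Lung RADS category 4B,",
  "LUNG RADS CATEGORY:I",
  "LUNGRADS 2",
  "LUNGRAD 2",
  "LUNG-RAD 3",
];

for (const line of samples) {
  console.log(JSON.stringify(line), "->", extractCategory(line));
}
```

Two limits of the pattern show up quickly when it is run this way. A spaced or parenthesised modifier, as in "Lung RADS Category 1 (S):", yields only "1". And because the pattern allows at most one capital letter after the digit, a value such as "4AS" is not matched at all, so `extractCategory("Lung RADS category 4AS")` returns `null`. Covering those forms would need a wider value group than the one shown here.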