Python regex: replace the text between two strings multiple times while preserving the strings



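I have linked a document below. I want to replace all text between multiple occurrences of certain strings, while leaving the delimiter strings themselves unchanged.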

For example, in this document:

KINERET
Products Affected

PA Criteria
Criteria Details
•  Kineret
Indications
All FDA-approved Indications, Some Medically accepted Indications.
Off-Label Uses
JIA
Exclusion
Criteria
Required
Medical
Information
Age Restrictions  N/A
HYPERSENSITIVITY TO PROTEINS DERIVED FROM E.COLI
DIAGNOSIS OF CHRONIC INFANTILE NEUROLOGICAL,
CUTANEOUS AND ARTICULAR SYNDROME, RHEUMATOID
ARTHRITIS OR JIA
Prescriber
Restrictions
RHEUMATOLOGIST, DERMATOLOGIST,NEUROLOGIST OR
PEDIATRICIAN
Coverage
Duration
4 MO INITIAL, 1 YEAR ON REAPPROVAL BASED ON RESPONSE
TO TX
Other Criteria
RA CRITERIA. DOC OF INTOLERANCE OR FAILURE TO
RESPOND TO A 2MO TRIAL OF A DMARD THERAPY, SUCH AS
METHOTREXATE, ARAVA(LEFLUNOMIDE), PLAQUENIL
(HYDROXYCHLOROQUINE), OR SULFASALAZINE AND TRIAL
AND FAILURE WITH HUMIRA AND ENBREL. JIA CRITERA:
INADEQUATE RESP, INTOLERANCE, OR CONTRAINDICATION
TO CORTICOSTEROIDS AND TRIAL AND FAILURE WITH
HUMIRA.
Formulary ID 20387 Version 16

87




•  Korlym
KORLYM
Products Affected

Off-Label Uses
N/A
Exclusion
Criteria
PA Criteria
Criteria Details
Indications
All FDA-approved Indications.
TYPE 2 DIABETES MELLITUS UNRELATED TO ENDOGENOUS
CUSHINGS, PREGNANCY, USE OF SIVASTATIN OR
LOVASTATIN AND CYP3A SUBSTRATES W NARROW
THERAPEUTIC RANGE, CONCURRENT LONGTERM
CORTICOSTEROID USE, WOMEN W HX OF UNEXPLAINED
VAGINAL BLEEDING, WOMEN W ENDOMETRIAL
HYPERPLASIA W ATYPIA OR ENDOMETRIAL CARCINOMA
Required
Medical
Information
COVERED FOR INDICATION OF CONTROLLING
HYPERGLYCEMIA SECONDARY TO HYPERCORTISOLISM IN
ADULT PATIENTS WITH ENDOGENOUS CUSHINGS SYNDROME
WHO HAVE TYPE 2 DIABETES MELLITUS OR GLUCOSE
INTOLERANCE AND HAVE FAILED SURGERY OR ARE NOT
CANDIDATES FOR SURGERY.
Age Restrictions  N/A
N/A
Prescriber
Restrictions
Coverage
Duration
Other Criteria
N/A
6MO AT A TIME
Formulary ID 20387 Version 16

88




•  Krystexxa
KRYSTEXXA
Products Affected

Off-Label Uses
N/A
Exclusion
Criteria
PA Criteria
Criteria Details
Indications
All FDA-approved Indications.
ANAPHYLAXIS AND INFUSIONS REACTIONS (BOXED
WARNING), CONTRAINDICATED IN PT  W/G6PD DEFICIENCY
DUE TO RISK OF HEMOLYSIS AND
METHEMOGLOBINEMIA,GOUT FLARES DURING INITIATION OF
TX
Required
Medical
Information
DOCUMENTATION OF CHRONIC GOUT IN ADULT PATIENTS
REFRACTORY TO CONVENTIONAL THERAPY AND 3MO TRIAL
OF XO INHIBITOR (ALLOPURINOL ,ULORIC).
Age Restrictions  N/A
N/A
Prescriber
Restrictions
Coverage
Duration
Other Criteria
N/A
6MO AT A TIME
Formulary ID 20387 Version 16

89




KUVAN
Products Affected

•  Kuvan
PA Criteria
Criteria Details
Indications
All FDA-approved Indications.
Off-Label Uses
N/A
Age Restrictions  N/A
N/A
N/A
N/A
Exclusion
Criteria
Required
Medical
Information
Prescriber
Restrictions
Coverage
Duration
Other Criteria
2MO AT A TIME INITIAL 3 MO THEREAFTER
PRIOR AUTHORIZATION IS TO MONITOR IF PATIENT IS A
RESPONDER OR NONRESPONDER AFTER THERAPY HAS BEEN
INITIATED FOR  2MONTHS. IF PHENYLALANINE LEVELS HAVE
DECREASED AFTER THE 2 MONTHS, THEN AUTHORIZATION
WILL CONTINUE.
Formulary ID 20387 Version 16

90




KYNAMRO
Products Affected

•  Kynamro
PA Criteria
Criteria Details
Indications
All Medically-accepted Indications.
Off-Label Uses
N/A
Exclusion
Criteria
Required
Medical
Information
Prescriber
Restrictions
Coverage
Duration
HEPATIC IMPAIRMENT, MOD OR SEV (CHILD-PUGH CAT B OR
C), LIVER DISEASE, ACTIVE, INCLUDING UNEXPLAINED
PERSISTANT ELEVATIONS OF SERUM TRANSAMINASES
DIAGNOSIS OF HOMOZYGOUS FAMILIAL
HYPERCHOLESTEROLEMIA, LIVER FUNCTION TESTS
Age Restrictions  N/A
ENDOCRINOLOGIST OR CARDIOLOGIST
6 MO WITH DOCUMENTED CLINICAL RESP TO THERAPY FOR
RENEWAL
Other Criteria  MAY ALSO COVER HETEROZYGOUS FAMILIAL
HYPERCHOLESTEROLEMIA WHEN CORONARY
ARTERIOSCLEROSIS IS PRESENT AND UNCONTROLLED
HYPERCHOLESTEROLEMIA WHEN ALL FORMULARY AGENTS
HAVE BEEN TRIED AND FAILED AT MAXIMUM TOLERATED
DOSES.
Formulary ID 20387 Version 16

91

LAZANDA
Products Affected

•  Lazanda
PA Criteria
Criteria Details
Indications
All FDA-approved Indications.
Off-Label Uses
N/A
OPIOD NON-TOLERANT PATIENTS
Age Restrictions  N/A
N/A
N/A
Exclusion
Criteria
Required
Medical
Information
Prescriber
Restrictions
Coverage
Duration
1YR AT A TIME
Other Criteria
N/A




Formulary ID 20387 Version 16

92




•  Lemtrada
PA Criteria
Criteria Details
Indications
All FDA-approved Indications.
LEMTRADA
Products Affected

Off-Label Uses
N/A
N/A
Exclusion
Criteria
Required
Medical
Information
Multiple Sclerosis (MS): Diagnosis of a relapsing form of MS (eg,
relapsing-remitting MS, secondary-progressive MS with relapses,
progressive-relapsing MS with relapses). One of the following: 1) Patient
has not been previously treated with alemtuzumab, and patient has history
of failure following a trial for at least 4 weeks or history of intolerance or
contraindication to 2 of the following: interferon beta-1a (Avonex or
Rebif), interferon beta-1b (Betaseron, Extavia), glatiramer acetate
(Copaxone or Glatopa), dimethyl fumarate (Tecfidera), teriflunomide
(Aubagio), fingolimod (Gilenya), peginterferon beta-1a (Plegridy),
natalizumab (Tysabri), or 2) Patient has previously received treatment
with alemtuzumab, and at least 12 months have or will have elapsed since
the first treatment with alemtuzumab, and patient has not already received
the FDA-recommended lifetime limit of two (2) treatment courses of
alemtuzumab. Patient is not receiving alemtuzumab in combination with
another disease modifying agent (eg, interferon beta preparations,
glatiramer acetate, natalizumab, fingolimod, or teriflunomide).
Age Restrictions  N/A
N/A
Prescriber
Restrictions
Coverage
Duration
Other Criteria
N/A
MS: 12 MONTHS, MAX 2 YRS OF THERAPY
Formulary ID 20387 Version 16

93




LETAIRIS (AMBRISENTAN)
Products Affected

•  Ambrisentan
PA Criteria
Criteria Details
Indications
All FDA-approved Indications.
Off-Label Uses
N/A
PATIENTS WITH SEVERE HEPATIC DISEASE, SEVERE ANEMIA.
PREGNANT PATIENTS
PREVIOUS MEDICATIONS USED, RESULTS OF ACUTE
VASOREACTIVITY TESTING,
Exclusion
Criteria
Required
Medical
Information
Prescriber
Restrictions
Coverage
Duration
Age Restrictions  N/A
Other Criteria
N/A
PULMONOLOGIST, CARDIOLOGIST
1 YR AT A TIME
Formulary ID 20387 Version 16

94




LEUKINE
Products Affected

PA Criteria
Criteria Details
Indications
All FDA-approved Indications.
Off-Label Uses
N/A
•  Leukine INJ 250MCG
CONCOMITANT USE WITH CHEMOTHERAPY OR
RADIOTHERAPY OR USE WITHIN 24 HOURS
Age Restrictions  N/A
N/A
N/A
Exclusion
Criteria
Required
Medical
Information
Prescriber
Restrictions
Coverage
Duration
Other Criteria
6MO AT A TIME
ALLOGENEIC BONE MARROW TRANSPLANTATION, MYELOID
RECONSTITUTION IN HLA-MATCHED RELATED DONORS,
AUTOLOGOUS BONE MARROW TRANSPLANT, MYELOID
RECONSTITUTION FOLLOWING TRANSPLANT IN PATIENTS
WITH NON-HODGKIN'S LYMPHOMA, HODGKIN'S DISEASE, AND
ACUTE LYMPHOBLASTIC LEUKEMIA, BONE MARROW
TRANSPLANT, DELAY OR FAILURE OF MYELOID
ENGRAFTMENT, FEBRILE NEUTROPENIA, IN ACUTE
MYELOGENOUS LEUKEMIA FOLLOWING INDUCTION
CHEMOTHERAPY, PROPHYLAXIS HARVESTING OF
PERIPHERAL BLOOD STEM CELLS, PERIPHERAL BLOOD STEM
CELL GRAFT, AUTOLOGOUS, MYELOID RECONSTITUTION
FOLLOWING TRANSPLANT IN PATIENTS MOBILIZED WITH
GRANULOCYTE MACROPHAGE COLONY STIMULATING
FACTOR.  ALSO BVD DECISIONS
Formulary ID 20387 Version 16

95





LEUPROLIDE
Products Affected
•  Eligard
•  Leuprolide Acetate INJ
•  Lupron Depot (1-month)
•  Lupron Depot (3-month)
•  Lupron Depot (6-month)
•  Lupron Depot-ped (1-month)
•  Lupron Depot-ped (3-month)
PA Criteria
Criteria Details
Indications
All FDA-approved Indications.
Off-Label Uses
N/A
Age Restrictions  N/A
N/A
N/A
N/A
Exclusion
Criteria
Required
Medical
Information
Prescriber
Restrictions
Coverage
Duration
Other Criteria
1YR AT A TIME
LEUPROLIDE ACETATE INJECTION IS INDICATED IN THE
PALLIATIVE TX OF ADVANCED PROSTATE CANCER, TX OF
CHILDREN WITH CENTRAL PRECOCIOUS
PUBERTY,ENDOMETRIOSIS AND UTERINE
LEIOMYOMATA(FIBROIDS).  ALSO BVD DECISIONS
Formulary ID 20387 Version 16

96

LIDOCAINE
Products Affected

PA Criteria
Criteria Details
Indications
All FDA-approved Indications.
Off-Label Uses
N/A
•  Lidocaine PTCH 5%
DOCUMENTATION OF POST HERPETIC NEUROPATHY
N/A
N/A
Exclusion
Criteria
Required
Medical
Information
Prescriber
Restrictions
Coverage
Duration
Age Restrictions  N/A
1 YR AT A TIME
Other Criteria
N/A




Formulary ID 20387 Version 16

97




LUMIZYME
Products Affected

PA Criteria
Criteria Details
Indications
All FDA-approved Indications.
Off-Label Uses
N/A
•  Lumizyme
N/A
N/A
Exclusion
Criteria
Required
Medical
Information
Prescriber
Restrictions
Coverage
Duration
Age Restrictions  N/A
1YR AT A TIME
Other Criteria
N/A
ENZYME TESTING THAT DEMONSTRATES REDUCED GAA
ENZYME ACTIVITY OR BY DNA TESTING FOR MUTATIONS IN
THE GAA GENE
Formulary ID 20387 Version 16

98

I want to remove all text between the "Required Medical Information" and "Other Criteria" sections.

So the sample output (for the first two drugs, Kineret and Korlym) would be:

KINERET
Products Affected

PA Criteria
Criteria Details
•  Kineret
Indications
All FDA-approved Indications, Some Medically accepted Indications.
Off-Label Uses
JIA
Exclusion
Criteria
Required
Medical
Information
Other Criteria
RA CRITERIA. DOC OF INTOLERANCE OR FAILURE TO
RESPOND TO A 2MO TRIAL OF A DMARD THERAPY, SUCH AS
METHOTREXATE, ARAVA(LEFLUNOMIDE), PLAQUENIL
(HYDROXYCHLOROQUINE), OR SULFASALAZINE AND TRIAL
AND FAILURE WITH HUMIRA AND ENBREL. JIA CRITERA:
INADEQUATE RESP, INTOLERANCE, OR CONTRAINDICATION
TO CORTICOSTEROIDS AND TRIAL AND FAILURE WITH
HUMIRA.
Formulary ID 20387 Version 16

87




•  Korlym
KORLYM
Products Affected

Off-Label Uses
N/A
Exclusion
Criteria
PA Criteria
Criteria Details
Indications
All FDA-approved Indications.
TYPE 2 DIABETES MELLITUS UNRELATED TO ENDOGENOUS
CUSHINGS, PREGNANCY, USE OF SIVASTATIN OR
LOVASTATIN AND CYP3A SUBSTRATES W NARROW
THERAPEUTIC RANGE, CONCURRENT LONGTERM
CORTICOSTEROID USE, WOMEN W HX OF UNEXPLAINED
VAGINAL BLEEDING, WOMEN W ENDOMETRIAL
HYPERPLASIA W ATYPIA OR ENDOMETRIAL CARCINOMA
Required
Medical
Information
Other Criteria
N/A
6MO AT A TIME
Formulary ID 20387 Version 16

88

My code so far looks like this:

import re

# Remove all text between "Required Medical Information" and "Other Criteria"
start_section_header_regexes = [
    r'Required[\s]Medical[\s]Information',
    # These sections may appear before Required Medical Information but are inconsistent
    #r'Off[\s]Label[\s]Uses',
    #r'Exclusion[\s]Criteria',
]
end_section_header_regexes = [
    r'Other[\s]Criteria',
]
print(input_text)
for start_section in start_section_header_regexes:
    for end_section in end_section_header_regexes:
        pattern = re.compile(rf'({start_section})[\s\S]*?({end_section})', re.IGNORECASE)
        input_text = re.sub(pattern, r'\1 \2', input_text)

I tested this in an online regex tester and saw that the regex does correctly detect the two capture groups containing the text I want to keep: https://regex101.com/r/vgV8ky/2

However, when I run the code, input_text still contains the text between "Required Medical Information" and "Other Criteria". What am I doing wrong?

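I should point out that the document listed above may not exactly represent the actual text. I am using pdf2text to convert the PDF to text, and I think some of the line breaks were not pasted into the browser exactly as they appear in the PDF.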

Thanks

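I figured it out: the words in "Required Medical Information" and "Other Criteria" are separated by more than one \s character, so a single [\s] between them never matched. I adjusted the start and end section regexes to r'Required[\s]*?Medical[\s]*?Information' and r'Other[\s]*?Criteria' respectively.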
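The fix can be checked on a small input. The following is a minimal sketch, not the original script: sample_text is a made-up fragment that imitates the pdf2text output, and strip_between is a hypothetical helper name. It uses \s+ (one or more whitespace characters) instead of the [\s]*? from the fix above, on the assumption that the header words are always separated by at least one whitespace character. The replacement joins the two headers with a newline rather than a space.

```python
import re

# Hypothetical sample imitating the pdf2text output: the header words are
# separated by several whitespace characters, not exactly one.
sample_text = (
    "Exclusion\nCriteria\n"
    "Required\n \nMedical\n\nInformation\n"
    "Age Restrictions  N/A\n"
    "Prescriber\nRestrictions\n"
    "Other  Criteria\n"
    "N/A\n"
)


def strip_between(text, start_regex, end_regex):
    """Delete everything between the two headers but keep the headers."""
    pattern = re.compile(
        rf"({start_regex})[\s\S]*?({end_regex})", re.IGNORECASE
    )
    # \1 and \2 re-insert the captured headers, separated by a newline.
    return pattern.sub(r"\1\n\2", text)


# Exactly one whitespace character between words: no match, text unchanged.
strict = strip_between(
    sample_text, r"Required\sMedical\sInformation", r"Other\sCriteria"
)

# One or more whitespace characters between words: the section is removed.
relaxed = strip_between(
    sample_text, r"Required\s+Medical\s+Information", r"Other\s+Criteria"
)

print(strict == sample_text)
print(relaxed)
```

Printing repr(input_text) around one of the headers is a quick way to see which whitespace characters pdf2text actually produced.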
