Batch file to extract the values of XML tags from many .xml files



I need help optimizing a batch file that pulls several XML tags out of more than a thousand XML files into a .txt or .csv file.

The .xml files all have the same format. They are clinical studies and look like this:

<?xml version="1.0" encoding="UTF-8"?>
<clinical_study rank="373">
<!-- This xml conforms to an XML Schema at:
https://clinicaltrials.gov/ct2/html/images/info/public.xsd -->
<required_header>
<download_date>ClinicalTrials.gov processed this data on May 25, 2017</download_date>
<link_text>Link to the current ClinicalTrials.gov record.</link_text>
<url>https://clinicaltrials.gov/show/NCT00146471</url>
</required_header>
<id_info>
<org_study_id>Kep-F10.3.01</org_study_id>
<nct_id>NCT00146471</nct_id>
</id_info>
<brief_title>Efficacy and Safety of Levetiracetam in the Inpatient Treatment of Alcohol Withdrawal Syndrome</brief_title>
<official_title>Efficacy and Safety of Levetiracetam in the Inpatient Treatment of Alcohol Withdrawal Syndrome [Sicherheit Und Wirksamkeit Von Levetiracetam (Keppra) für Die Behandlung Des stationären Alkoholentzugsyndroms]</official_title>
<sponsors>
<lead_sponsor>
<agency>Charite University, Berlin, Germany</agency>
<agency_class>Other</agency_class>
</lead_sponsor>
</sponsors>
<source>Charite University, Berlin, Germany</source>
<oversight_info>
<has_dmc>Yes</has_dmc>
</oversight_info>
<brief_summary>
<textblock>
The purpose of this study is to evaluate the efficacy and safety of levetiracetam for
treating alcohol withdrawal syndrome (AWS) in inpatients (vs. placebo). The primary come-out
parameter is the reduction of the total needed amount of diazepam for add-on treatment of
acute alcohol withdrawal symptoms. The secondary come-out parameter are - safety criteria
(AE) - reduction of alcohol withdrawal score over the days.
</textblock>
</brief_summary>
<overall_status>Completed</overall_status>
<start_date>January 2006</start_date>
<completion_date type="Actual">September 2007</completion_date>
<primary_completion_date type="Actual">July 2007</primary_completion_date>
<phase>Phase 3</phase>
<study_type>Interventional</study_type>
<has_expanded_access>No</has_expanded_access>
<study_design_info>
<allocation>Randomized</allocation>
<intervention_model>Parallel Assignment</intervention_model>
<primary_purpose>Treatment</primary_purpose>
<masking>Double Blind (Participant, Care Provider, Investigator)</masking>
</study_design_info>
<primary_outcome>
<measure>To evaluate the efficacy and safety of levetiracetam for treating alcohol withdrawal syndrome in inpatients. The primary come-out parameter is the reduction of the amount of diazepam for add-on treatment of acute alcohol withdrawal</measure>
<time_frame>during trial</time_frame>
</primary_outcome>
<secondary_outcome>
<measure>Secondary come-out parameters are - safety criteria (AE) - reduction of alcohol withdrawal score over the days</measure>
<time_frame>during trial</time_frame>
</secondary_outcome>
<number_of_arms>2</number_of_arms>
<enrollment type="Actual">120</enrollment>
<condition>Alcohol Withdrawal Syndrome</condition>
<arm_group>
<arm_group_label>2</arm_group_label>
<arm_group_type>Active Comparator</arm_group_type>
</arm_group>
<arm_group>
<arm_group_label>1: Diazepam plus Placebo</arm_group_label>
<arm_group_type>Placebo Comparator</arm_group_type>
</arm_group>
<intervention>
<intervention_type>Drug</intervention_type>
<intervention_name>Levetiracetam</intervention_name>
<description>1500-2000 mg daily add-on or Placebo Diazepam as needed</description>
<arm_group_label>2</arm_group_label>
<other_name>KEPPRA</other_name>
</intervention>
<intervention>
<intervention_type>Drug</intervention_type>
<intervention_name>Placebo</intervention_name>
<description>1500-2000 mg daily add-on or Placebo Diazepam as needed</description>
<arm_group_label>1: Diazepam plus Placebo</arm_group_label>
</intervention>
<eligibility>
<criteria>
<textblock>
Inclusion Criteria:
-  Ages eligible for study: 18-65 years.
-  Meets criteria for alcohol dependence according to DSM-IV/ICD-10
-  Known withdrawal symptoms in the past in case of discontinuation of alcohol
consumption
-  Hospital admission for alcohol detoxification
-  Able to provide a written informed consent.
-  Able to follow verbal and written instructions (incl. a sufficient knowledge of
German language).
-  Must be medically acceptable for study treatment. No past or present physical
disorder that is likely to deteriorate during participation. No ECG abnormality which
would likely worsen during participation and no clinical laboratory abnormality that
would also suggest deterioration during treatment.
-  Have a negative urine drug screen for benzodiazepines or heroine or methadone
Exclusion Criteria:
-  Current diagnosis of any other substance dependence syndrome other than alcohol
dependence (excluding nicotine and caffeine dependence).
-  History of idiopathic epilepsy.
-  Patient with any current clinically significant psychiatric disorder (acute
suiciality) or developmental disorder (including organic mental disorder), like
psychotic disorders.
-  Patients with the following complications of alcoholism (lifetime): acute delirium
tremens, hallucinatory alcoholic state, Korsakoff`s syndrome, Wernicke
encephalopathy, decomposed liver cirrhosis (Child B, C), suspected cirrhosis with the
following clinical symptoms detected at clinical exam: signs of portal hypertension
and signs of hepato-cellular failure, thrombocytopenia.
-  Subjects with known sensitivity of previous adverse reaction to levetiracetam
-  Contra-indication (hypersensitivity to levetiracetam or pyrrolidone derivatives) or
known non-response to levetiracetam.
-  History of severe GI disease which might render absorption of the medication
difficult or produce medical instability of the patient which would include active
peptic ulcer disease, ulcerative colitis, regional colitis, or evidence by history or
physical exam of GI bleeding.
-  Patients with any clinically significant acute or chronic progressive neurological,
gastrointestinal, cardiovascular, hepatic, renal, haematological, endocrine,
dermatological or respiratory disease, such as diabetes, severe infection, acute
alcoholic hepatitis, or any other medical condition with significant worsening of the
clinical situation of the patient that might interfere with the evaluation of study
medication.
-  Female patients pregnant, breast-feeding or of child bearing age and not protected by
effective contraceptive such as implants, injectables, combined oral contraceptives,
some IUDS, sexual abstinence, sterilization or vasectomized partner.
-  Actually continuous use of pharmacological agents that are known to lower the seizure
threshold or augment or decrease the alcohol withdrawal syndrome.
-  Subjects with known sensitivity of previous adverse reaction to diazepam or clonidine
-  Contra-indication or known non-response to diazepam or clonidine
</textblock>
</criteria>
<gender>All</gender>
<minimum_age>18 Years</minimum_age>
<maximum_age>65 Years</maximum_age>
<healthy_volunteers>No</healthy_volunteers>
</eligibility>
<overall_official>
<last_name>Martin Schaefer, MD</last_name>
<role>Principal Investigator</role>
<affiliation>Charité Campus Mitte, Klinik für Psychiatrie und Psychotherapie</affiliation>
</overall_official>
<location>
<facility>
<name>MLU Halle-Wittenberg</name>
<address>
<city>Halle</city>
<state>Sachen/Anhalt</state>
<zip>06097</zip>
<country>Germany</country>
</address>
</facility>
</location>
<location>
<facility>
<name>Charité - Universitätsmedizin Berlin, Campus Charité Mitte, Klinik für Psychiatrie und Psychotherapie</name>
<address>
<city>Berlin</city>
<zip>10117</zip>
<country>Germany</country>
</address>
</facility>
</location>
<location>
<facility>
<name>Psychiatrische Klinik der Charité im St.-Hedwig Krankenhaus</name>
<address>
<city>Berlin</city>
<zip>10559</zip>
<country>Germany</country>
</address>
</facility>
</location>
<location>
<facility>
<name>Klinik für Psychiatrie und Suchtmedizin, Kliniken Essen Mitte</name>
<address>
<city>Essen</city>
<zip>45136</zip>
<country>Germany</country>
</address>
</facility>
</location>
<location>
<facility>
<name>Zentrum für Seelische Gesundheit</name>
<address>
<city>Rhede</city>
<zip>46414</zip>
<country>Germany</country>
</address>
</facility>
</location>
<location_countries>
<country>Germany</country>
</location_countries>
<reference>
<citation>Krebs M, Leopold K, Richter C, Kienast T, Hinzpeter A, Heinz A, Schaefer M. Levetiracetam for the treatment of alcohol withdrawal syndrome: an open-label pilot trial. J Clin Psychopharmacol. 2006 Jun;26(3):347-9.</citation>
<PMID>16702910</PMID>
</reference>
<verification_date>September 2008</verification_date>
<lastchanged_date>December 29, 2009</lastchanged_date>
<firstreceived_date>September 6, 2005</firstreceived_date>
<responsible_party>
<name_title>Martin Schaefer, MD</name_title>
<organization>Charite University, Berlin, Germany</organization>
</responsible_party>
<keyword>alcohol withdrawal</keyword>
<keyword>detoxification</keyword>
<keyword>Inpatients</keyword>
<keyword>alcohol dependence according to DSM-IV/ICD-10</keyword>
<keyword>withdrawal symptoms</keyword>
<condition_browse>
<!-- CAUTION:  The following MeSH terms are assigned with an imperfect algorithm  -->
<mesh_term>Syndrome</mesh_term>
<mesh_term>Substance Withdrawal Syndrome</mesh_term>
</condition_browse>
<intervention_browse>
<!-- CAUTION:  The following MeSH terms are assigned with an imperfect algorithm  -->
<mesh_term>Ethanol</mesh_term>
<mesh_term>Diazepam</mesh_term>
<mesh_term>Etiracetam</mesh_term>
<mesh_term>Piracetam</mesh_term>
</intervention_browse>
<!-- Results have not yet been posted for this study                                -->
</clinical_study>

So they all use the same tags, and I need some of them, for example:

  • overall_official
  • lead_sponsor
  • official_title
  • results_reference
  • overall_status

So far I have tried the following code:

@echo off
setlocal enabledelayedexpansion
for %%a in (*.xml) do (
call :XMLExtract "%%a" "<results_reference>" location
echo.!location!,%%~na
)
exit /b
:XMLExtract file keystart location
@echo off & setlocal
for /f "tokens=3 delims=<>" %%a in ('Findstr /i /c:%2 "%~1"') do (
set "loc=%%a" & goto :endloop
)
:endLoop
ENDLOCAL & IF "%~3" NEQ "" (SET %~3=%loc%) ELSE echo.%loc%
exit /b

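I run the batch file from the command line and append its output with >> to output.txt or output.csv. It works perfectly for overall_status, but all the other tags have problems, for example:

  • overall_official: stops after about 10 entries
  • other tags: the file name is listed (as always), but no information follows it.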

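I would be very grateful for any help on how to fix this, or for another way to accomplish the task efficiently. I have only a basic understanding of programming, but I am confident I can work my way into any simple solution. The most helpful answer would be a way to adapt the batch code to this situation. I apologize if some information is missing; I will provide it.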

@ECHO Off
SETLOCAL
SET "sourcedir=U:\sourcedir"
SET "destdir=U:\destdir"
:: SET "tags=overall_official lead_sponsor official_title results_reference overall_status"
SET "tags=%*"
FOR /f "tokens=1delims=" %%a IN (
'dir /b /a-d "%sourcedir%\*.xml" '
) DO (
REM Clear detected-tags flags for each file "%%a"
FOR %%t IN (%tags% malformed) DO SET "%%t="
REM remove "rem" from following line to delete any existing result file
REM del "%destdir%\%%~na.txt" >nul 2>nul
REM Read each line to %%L - usebackq to allow "quoted filenames"
FOR /f "usebackqdelims=" %%L IN ("%sourcedir%\%%a") DO (
REM remove leading spaces from %%L into %%P
FOR /f "tokens=*" %%P IN ("%%L") DO (
REM tokenise on "<>"
FOR /f "tokens=1-3*delims=<>" %%w IN ("%%P") DO (
IF "%%z" neq "" SET "malformed=%%z"
FOR %%t IN (%tags%) DO IF "%%w"=="%%t" (SET "%%t=Y") else IF "%%w"=="/%%t" (SET "%%t=") 
SET "report="
FOR %%t IN (%tags%) DO IF DEFINED %%t SET "report=Y"
REM (1 of 2) un-rem this to deposit in individual filenames
REM (
IF DEFINED report (
REM we may have 1,2 or 3 tokens
REM if 3, output token 2
REM if 2, output token 1 if token 2 starts "/", token 2 otherwise
REM if only 1, output entire line unless it is a target token
IF "%%y" equ "" (
IF "%%x" equ "" (
REM only one token
FOR %%t IN (%tags%) DO IF "%%w"=="%%t" (SET "report=") else IF "%%w"=="/%%t" (SET "report=") 
IF DEFINED report ECHO %%L
) ELSE (
REM two tokens
ECHO %%x|FINDSTR /b "/">NUL 2>NUL
IF ERRORLEVEL 1 (ECHO %%x) ELSE (ECHO %%w)
)
) ELSE (ECHO %%x)
)
REM (2 of 2) un-rem this to deposit in individual filenames
REM )>>"%destdir%\%%~na.txt"
FOR %%t IN (%tags%) DO IF "%%y"=="/%%t" (SET "%%t=") 
FOR %%t IN (%tags%) DO IF "%%x"=="/%%t" (SET "%%t=") 
)
REM pause
)
)
)
GOTO :EOF

You will need to change the settings of sourcedir and destdir to suit your circumstances.

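This may give you some ideas. You have not provided a sample of the output you want, so you may wish to prefix each output line with the source file name (available as %%~na) in the corresponding echo.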

The expected syntax to run it:

thisbatchname tag tag tag

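My approach is that %%a contains the name of the file being processed, %%L the raw line data from the file, and %%P the raw line data with leading spaces stripped.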

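Tokenising %%P on the delimiters < and > yields %%w through %%z, since each line contains 1-3 possible elements, each either a tag or data. If there is a fourth, something is wrong (the flag malformed is set for the file, although I do nothing with it; it will contain the text where the problem lies [it could also be set to the entire line %%P...])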
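To make the tokenising step concrete, here is a rough sketch in Python rather than batch. The function name tokenize is my own, and it only approximates FOR /F "tokens=1-3* delims=<>": batch leaves the remainder (%%z) as one unsplit string, whereas this returns it as a list. It also shows that a container tag such as overall_official sits alone on its line, so that line has no value token to pick up.

```python
import re


def tokenize(line):
    """Approximate FOR /F "tokens=1-3* delims=<>" on one line.

    Returns (first_three_tokens, leftover_tokens). A non-empty leftover
    corresponds to the 'malformed' case in the batch code.
    """
    # Strip leading/trailing whitespace (the %%L -> %%P step), then split on
    # runs of '<' and '>' and drop the empty strings at either end.
    parts = [p for p in re.split(r"[<>]+", line.strip()) if p]
    return parts[:3], parts[3:]


if __name__ == "__main__":
    samples = [
        "<overall_status>Completed</overall_status>",  # tag, data, closing tag
        "<overall_official>",                          # container tag only
        "<last_name>Martin Schaefer, MD</last_name>",  # nested tag with data
        "</overall_official>",                         # closing tag only
    ]
    for sample in samples:
        print(tokenize(sample))
```

A line that carries tag, data and closing tag gives three tokens with the data in the middle; a line holding only an opening or closing tag gives a single token.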

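So, using each required tag as a variable name, the code simply sets that variable to something when the opening tag is seen and clears it at the closing tag, then uses if defined to test the state. The flags change as the data is read line by line.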
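The flag idea can be sketched in Python as follows. This is an illustration in the same spirit, not a port of the batch file: the names extract and TOKEN are hypothetical, a set stands in for the per-tag variables, and XML comments, CDATA and tags split across lines are not handled.

```python
import re

# Matches either a complete <...> tag (group 1) or a run of plain text (group 2).
TOKEN = re.compile(r"<([^<>]*)>|([^<>]+)")


def extract(lines, wanted):
    """Yield every piece of text that lies inside one of the wanted tags."""
    active = set()  # wanted tags that are currently open (the "flags")
    for raw in lines:
        for match in TOKEN.finditer(raw.strip()):
            tag, text = match.group(1), match.group(2)
            if tag is not None:
                words = tag.split()
                name = words[0] if words else ""  # drop attributes
                if name in wanted:
                    active.add(name)              # opening tag: set the flag
                elif name.startswith("/") and name[1:] in wanted:
                    active.discard(name[1:])      # closing tag: clear the flag
            elif active and text.strip():
                yield text.strip()                # data inside a wanted tag


if __name__ == "__main__":
    sample = [
        "<overall_status>Completed</overall_status>",
        "<overall_official>",
        "<last_name>Martin Schaefer, MD</last_name>",
        "<role>Principal Investigator</role>",
        "</overall_official>",
        "<phase>Phase 3</phase>",
    ]
    for value in extract(sample, {"overall_status", "overall_official"}):
        print(value)
```

Because the flag stays set between the opening and closing tag, the values of nested elements such as last_name and role are reported for overall_official even though the tag itself carries no text.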

Note that since the entire active part of the code is one huge code block, rem must be used rather than :: to supply comments.

Also note that

(
commands
)>file

will redirect the output of commands according to the redirector specified (if required).

Try using xpath.bat:

for /f "tokens=* delims=" %%# in ('xpath.bat "study.xml" "//reference/citation"') do set "reference_citation=%%#"
echo %reference_citation%
for /f "tokens=* delims=" %%# in ('xpath.bat "study.xml" "//official_title"') do set "official_title=%%#"
echo %official_title%
for /f "tokens=* delims=" %%# in ('xpath.bat "study.xml" "//lead_sponsor/agency"') do set "lead_sponsor=%%#"
echo %lead_sponsor%
for /f "tokens=* delims=" %%# in ('xpath.bat "study.xml"  "//overall_official"') do set  "overall_official=%%#"
echo %overall_official%
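If a tool other than batch is acceptable, the same path-based lookups can be done with a real XML parser. Below is a minimal sketch in Python using only the standard library. The names FIELDS, study_row and write_csv are my own, the paths are taken from the sample file above, and taking last_name as the value for overall_official is an assumption about what you want in that column.

```python
import csv
import glob
import os
import xml.etree.ElementTree as ET

# CSV column name -> element path relative to the <clinical_study> root.
# These paths are assumptions based on the sample study; adjust as needed.
FIELDS = {
    "official_title": "official_title",
    "lead_sponsor": "sponsors/lead_sponsor/agency",
    "overall_official": "overall_official/last_name",
    "overall_status": "overall_status",
    "reference_citation": "reference/citation",
}


def study_row(xml_path):
    """Return one dict of extracted values for a single study file."""
    root = ET.parse(xml_path).getroot()
    row = {"file": os.path.basename(xml_path)}
    for column, path in FIELDS.items():
        # findtext returns None when the element is absent; store "" instead.
        row[column] = (root.findtext(path) or "").strip()
    return row


def write_csv(xml_dir, out_path):
    """Write one CSV line per .xml file found in xml_dir."""
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=["file", *FIELDS])
        writer.writeheader()
        for xml_path in sorted(glob.glob(os.path.join(xml_dir, "*.xml"))):
            writer.writerow(study_row(xml_path))
```

Calling write_csv with the folder that holds the studies and an output file name produces one row per file. The csv module quotes titles that contain commas, and a tag that is missing from a study yields an empty cell rather than stopping the run.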
