Using a regex to capture text between SGM tags



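I am trying to use a regular expression to capture the text between the last </tabmat> tag and the last </para> tag. I tried </tgroup></tbody></tabmat>.*?</para> and (<tabmat frame=".*?".*?>(?:[\s\S](?!</tabmat))+?Input Conditions[\s\S]+?)[</tabmat>]*(?=</para>), but it did not work. It selects all the text from the first tabmat tag and continues to the end of the first </para> tag. If you look at the XML test sample, you will see that a <warning> tag is opened and that it contains several <para> tags. The regex selects the text up to the first </para> tag, but it does not reach the last </para>.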

Example: </tbody></tgroup></tabmat>End Text</para>

Regex: (<tabmat frame=".*?".*?>(?:[\s\S](?!</tabmat))+?Input Conditions[\s\S]+?)[</tabmat>]*(?=</para>)

I don't know which regex I should use. Thanks for your help.

XML sample:

<tabmat frame="none" colsep="0" pgwide="0">
<tgroup cols="2" align="left">
<colspec colname="col1" align="left" colwidth="0.99in">
<colspec colname="col2" colwidth="0.78*">
<spanspec namest="col1" nameend="col2" spanname="span1">
<tbody>
<row>
<entry spanname="span1" colsep="0" align="left">
<emphasis type="u"> Input Conditions</emphasis>.</entry></row></tbody></tgroup></tabmat>
<tabmat frame="none" colsep="0" pgwide="0">
<tgroup cols="2" align="left">
<colspec colname="col1" align="left" colwidth="1.0in">
<colspec colname="col2" colwidth="0.78*">
<spanspec namest="col1" nameend="col2" spanname="span1">
<tbody>
<row>
<entry spanname="span1" colsep="1" align="left">
<emphasis type="b">Applicability: </emphasis> All</entry></row></tbody></tgroup></tabmat>
<tabmat frame="none" colsep="0" pgwide="0">
<tgroup cols="1">
<colspec colname="col1" colwidth="1.00*">
<tbody>
<row>
<entry>
<emphasis type="b">Required Conditions:</emphasis></entry></row>
<row valign="top">
<entry>
<randlist>
<item>r  oGiyelC rfreSponguiVdyeps rd (reef rt o<emphasis type="u" color="blue">
<xref xref="test ref"></emphasis>).</item>
<item> ScfChosmer kC Geot nsuren ono ctiawdt ill n soexisipvrtt e hnethe  nctneqeucarnttaeironedi am inogmfr p ibereofmrde.</item>
<item>erh prossa  nipEsvunvbla rieaaetaxle eitinaedthte er gfn inumia .aecenist </item></randlist></entry></row></tbody></tgroup></tabmat>
<tabmat frame="none" colsep="0" pgwide="0">
<tgroup cols="1">
<colspec colname="col1" colwidth="1.00*">
<tbody>
<row>
<entry>
<emphasis type="b">Personnel Recommended:</emphasis> 2 </entry></row>
<row>
<entry>
<randlist>
<item>cihTnieca  enAprformGs Ca oSprteoirnnuo fct</item>
<item>teisTchncsii eca TBnnsaihsa inc A</item></randlist></entry></row></tbody></tgroup></tabmat>
<tabmat frame="none" colsep="0" pgwide="0">
<tgroup cols="2">
<colspec colname="colspec0" align="left">
<colspec colname="colspec1">
<spanspec namest="colspec0" nameend="colspec1" spanname="span1">
<tbody>
<row>
<entry spanname="span1" colsep="1" align="left">
<emphasis type="b">Support Equipment:</emphasis> None</entry></row></tbody></tgroup></tabmat>
<tabmat frame="none" colsep="0" pgwide="0">
<tgroup cols="2">
<colspec colname="colspec0" align="left">
<colspec colname="colspec1">
<spanspec namest="colspec0" nameend="colspec1" spanname="span1">
<tbody>
<row>
<entry spanname="span1" colsep="1" align="left">
<emphasis type="b">Consumables: </emphasis> None</entry></row></tbody></tgroup></tabmat>
<tabmat frame="none" pgwide="0">
<tgroup cols="1">
<colspec colname="col1">
<tbody>
<row>
<entry>
<emphasis type="b">Safety Conditions:</emphasis>
<warning>
<para>ctdesajTehu inshsevtetnarroorohfedln yl crt . oswoi uano ipko .reuFacg.ielat   drzetod tl ciT  inedcuseer uc lori epiu elyosyloif mropsh,l  sonawart elmr o eccoaolhrimmc edzhpu orodmpnyennhatsd a rn neeergi</para>
<para>ePfrrom pproer <acronym><def>Oto  Luk cTt ag uO</def><term>OLTO</term></acronym>yn  ncsaerse oictrbsr er ueicoakr <acronym>
<def>Emergecny oP ewrOff</def><term>EPO</term></acronym>rc pew ricsploihe npct lcdcioba.uaadehatn rscel a lwi</para><para> rnieidoFoa.to o tpalnueenyengstp ecm rsiorulol  oro cee qeod tdrtrydeacma ero /nirlcac  ursenlsnpdeoalr howpgtnirtr nre  upprori etf mm onoiena rtmii vjauaitellsat</para></warning></entry></row></tbody></tgroup></tabmat>TEST</para>

Perl regexp, tested against your sample in UE 28.20.0.70:

(?s).*</tabmat>\K.*(?=</para>)
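
For engines without \K, the same idea can be sketched with a capturing group. The Python snippet below is only an illustration, not part of the tested answer: Python's built-in re module does not support \K, so the text after the last </tabmat> is captured in group 1 instead. The function name text_after_last_tabmat and the shortened sample string are hypothetical, made up for this example.

```python
import re

# The leading greedy ".*" runs to the end of the input and backtracks to the
# LAST "</tabmat>". Group 1 is greedy too, so it extends up to the LAST
# "</para>", which the lookahead requires but leaves out of the capture.
# (?s) lets "." match newlines as well.
LAST_TEXT = re.compile(r"(?s).*</tabmat>(.*)(?=</para>)")


def text_after_last_tabmat(sgml):
    """Return the text between the last </tabmat> and the last </para>, or None."""
    m = LAST_TEXT.match(sgml)
    return m.group(1) if m else None


# Shortened stand-in for the sample above (hypothetical data).
sample = (
    '<tabmat frame="none">\n<tgroup><tbody><row><entry>A</entry></row>'
    '</tbody></tgroup></tabmat>\n'
    '<tabmat frame="none">\n<tgroup><tbody><row><entry>\n'
    '<warning>\n<para>first</para>\n<para>second</para></warning>'
    '</entry></row></tbody></tgroup></tabmat>TEST</para>'
)

print(text_after_last_tabmat(sample))
```

Both quantifiers are deliberately greedy: a lazy .*? stops at the first </para>, which is the behaviour described in the question.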
