Task to remove strings that occur after the first occurrence of a string

I want to remove any string that occurs more than once in a text file, leaving only the first instance.

Starting point:

<topichead navtitle="AAAA"><topicref href="____"/></topichead>
<topichead navtitle="AAAA"><topicref href="____"/></topichead>
<topichead navtitle="AAAA"><topicref href="____"/></topichead>
<topichead navtitle="AAAA"><topicref href="____"/></topichead>
<topichead navtitle="BBBB"><topicref href="____"/></topichead>
<topichead navtitle="BBBB"><topicref href="____"/></topichead>
<topichead navtitle="BBBB"><topicref href="____"/></topichead>
<topichead navtitle="CCCC"><topicref href="____"/></topichead>
<topichead navtitle="CCCC"><topicref href="____"/></topichead>
<topichead navtitle="CCCC"><topicref href="____"/></topichead>
<topichead navtitle="CCCC"><topicref href="____"/></topichead>
<topichead navtitle="CCCC"><topicref href="____"/></topichead>

Expected result:

<topichead navtitle="AAAA"><topicref href="____"/></topichead>
                           <topicref href="____"/></topichead>
                           <topicref href="____"/></topichead>
                           <topicref href="____"/></topichead>
<topichead navtitle="BBBB"><topicref href="____"/></topichead>
                           <topicref href="____"/></topichead>
                           <topicref href="____"/></topichead>
<topichead navtitle="CCCC"><topicref href="____"/></topichead>
                           <topicref href="____"/></topichead>
                           <topicref href="____"/></topichead>
                           <topicref href="____"/></topichead>
                           <topicref href="____"/></topichead>

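After that I have to strip most of the </topichead> instances, but once I have the first part, those will be easy to match and remove.

Based on what I have seen on this page, I wrote this:

    <replaceregexp byline="false" flags="g">
      <regexp pattern="(&lt;topichead.*&gt;)(\r?\n\1)+"/>
      <substitution expression="\1"/>
      <fileset dir=".">
        <include name="*.txt"/>
      </fileset>
    </replaceregexp>

However, it does not work. As a test, if I remove (\r?\n\1)+ from the regexp pattern, match every instance of (<topichead.*>) and simply replace it with XXX or anything else, that works. So I know everything is wired up correctly. I also tried (\1)+ for the second group, but so far nothing achieves the goal above. Any suggestions are welcome.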

Updating this with a better example, since the one I gave was a bit imprecise. What I need to do is exactly this:

Starting point:

<topichead navtitle="AAAA"><topicref href="XYZ"/></topichead>
<topichead navtitle="AAAA"><topicref href="ZYX"/></topichead>
<topichead navtitle="AAAA"><topicref href="XXYYZZ"/></topichead>
<topichead navtitle="AAAA"><topicref href="YYYY"/></topichead>
<topichead navtitle="BBBB"><topicref href="ZZZYXZ"/></topichead>
<topichead navtitle="BBBB"><topicref href="yyYYZZXX"/></topichead>
<topichead navtitle="BBBB"><topicref href="XX"/></topichead>
<topichead navtitle="CCCC"><topicref href="YYZ"/></topichead>
<topichead navtitle="CCCC"><topicref href="ZZY"/></topichead>
<topichead navtitle="CCCC"><topicref href="XXZZY></topichead>
<topichead navtitle="CCCC"><topicref href="ZZZ"/></topichead>
<topichead navtitle="CCCC"><topicref href="YYYZZXX"/></topichead>

Expected result:

<topichead navtitle="AAAA">
<topicref href="XYZ"/>
<topicref href="ZYX"/>
<topicref href="XXYYZZ"/>
<topicref href="YYYY"/></topichead>
<topichead navtitle="BBBB">
<topicref href="ZZZYXZ"/>
<topicref href="yyYYZZXX"/>
<topicref href="XX"/></topichead>
<topichead navtitle="CCCC">
<topicref href="YYZ"/>
<topicref href="ZZY"/>
<topicref href="XXZZY>
<topicref href="ZZZ"/>
<topicref href="YYYZZXX"/></topichead>

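The href values such as "XXYYZZ" are links that are all different (or may be different), and they need to be kept.

The hard part is getting rid of the duplicates after the first instance of, for example, <topichead navtitle="AAAA">

If I could get this result as a first step: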

<topichead navtitle="AAAA"><topicref href="XYZ"/></topichead>
                           <topicref href="ZYX"/></topichead>
                           <topicref href="XXYYZZ"/></topichead>
                           <topicref href="YYYY"/></topichead>
<topichead navtitle="BBBB"><topicref href="ZZZYXZ"/></topichead>
                           <topicref href="yyYYZZXX"/></topichead>
                           <topicref href="XX"/></topichead>
<topichead navtitle="CCCC"><topicref href="YYZ"/></topichead>
                           <topicref href="ZZY"/></topichead>
                           <topicref href="XXZZY></topichead>
                           <topicref href="ZZZ"/></topichead>
                           <topicref href="YYYZZXX"/></topichead>

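Then I can easily get rid of the unwanted trailing </topichead> entries, using:

    <replaceregexp byline="false" flags="gs">
      <regexp pattern="&lt;/topichead&gt;\r\n&lt;topicref"/>
      <substitution expression="${line.separator}&lt;topicref"/>
      <fileset dir=".">
        <include name="*.txt"/>
      </fileset>
    </replaceregexp>

...and get the expected result shown above.

Right now I do the first step with search and replace, and then run that replaceregexp. I have a lot of these to do, so it would be great to automate them.

I have looked at many suggestions, which are basically all variations that use (\r?\n\1) as their core in different ways, but I have had no luck getting any of them to do what I need.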

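After your update, I see what you mean. It seems that this line of your original input:

<topichead navtitle="CCCC"><topicref href="XXZZY></topichead>

should probably be:

<topichead navtitle="CCCC"><topicref href="XXZZY"/></topichead>

In that case the solution is as follows: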

    <target name="test2">
      <!-- Pass 1: where the same opening topichead appears again later in the file, keep only the topicref -->
      <replaceregexp byline="false" flags="gs">
        <regexp pattern="(&lt;topichead\s+navtitle=&quot;[^&quot;]*&quot;&gt;)(&lt;topicref\s+href=&quot;[^&quot;]*&quot;/&gt;)&lt;/topichead&gt;(?=.*\1)"/>
        <substitution expression="\2"/>
        <fileset dir=".">
          <include name="*.txt"/>
        </fileset>
      </replaceregexp>
    </target>
    <target name="test" depends="test2">
      <!-- Pass 2: move the remaining topichead opening tag in front of the bare topicref lines that precede it -->
      <replaceregexp byline="false" flags="gs">
        <regexp pattern="(&lt;topicref.*?)(&lt;topichead\s+navtitle=&quot;[^&quot;]*&quot;&gt;)(&lt;topicref\s+href=&quot;[^&quot;]*&quot;/&gt;&lt;/topichead&gt;)"/>
        <substitution expression="\2${line.separator}\1\3"/>
        <fileset dir=".">
          <include name="*.txt"/>
        </fileset>
      </replaceregexp>
    </target>

After running ant test, you will get the result you want, as shown below:

<topichead navtitle="AAAA">
<topicref href="XYZ"/>
<topicref href="ZYX"/>
<topicref href="XXYYZZ"/>
<topicref href="YYYY"/></topichead>
<topichead navtitle="BBBB">
<topicref href="ZZZYXZ"/>
<topicref href="yyYYZZXX"/>
<topicref href="XX"/></topichead>
<topichead navtitle="CCCC">
<topicref href="YYZ"/>
<topicref href="ZZY"/>
<topicref href="XXZZY"/>
<topicref href="ZZZ"/>
<topicref href="YYYZZXX"/></topichead>

An example:

    <replaceregexp byline="false" flags="g">
      <regexp pattern="(&lt;topichead.*&gt;)(?=\r?\n\1)"/>
      <substitution expression="&lt;topicref href=&quot;____&quot;/&gt;&lt;/topichead&gt;"/>
      <fileset dir=".">
        <include name="*.txt"/>
      </fileset>
    </replaceregexp>

The output is:

<topicref href="____"/></topichead>
<topicref href="____"/></topichead>
<topicref href="____"/></topichead>
<topichead navtitle="AAAA"><topicref href="____"/></topichead>
<topicref href="____"/></topichead>
<topicref href="____"/></topichead>
<topichead navtitle="BBBB"><topicref href="____"/></topichead>
<topicref href="____"/></topichead>
<topicref href="____"/></topichead>
<topicref href="____"/></topichead>
<topicref href="____"/></topichead>
<topichead navtitle="CCCC"><topicref href="____"/></topichead>

Note that this keeps only the last instance of each line, not the first. Just so you know.
