Perl regex combining capture groups and the nth occurrence of a string



I have a file like this:

<div title="alpha" Mauris eu justo sed nisi aliquet blandit. <span name="ll">beta</span> Fusce in pharetra nisi. <span name="ll">gamma</span> Aliquam vehicula imperdiet turpis et rhoncus. <span name="ll">delta</span> Donec faucibus augue quis neque dictum, at rutrum dolor placerat.</div>

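I am trying to take the content of the nth name="ll" span and put it in place of the title= value, while keeping the rest of the content in its original order.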

For example, picking the second name="ll" span should give me:

<div title="gamma" Mauris eu justo sed nisi aliquet blandit. <span name="ll">beta</span> Fusce in pharetra nisi. Aliquam vehicula imperdiet turpis et rhoncus. <span name="ll">delta</span> Donec faucibus augue quis neque dictum, at rutrum dolor placerat.</div>

And so on.


My attempt:

find . -type f -exec perl -pi -w -e 's/(title=)"?[^"\s]*"?(.*)((?:.*?\h+class="ll">){1}.*?)\h+class="ll">"?([^"\s]+)"?(<.*)/$1"$3"$2$4/' {} \;

Where am I going wrong?

Rather than doing everything in a single substitution, do it in steps:

perl -wpe '$n = 2;
@m = /<span name="ll">([^<]+)/g;
s/title="[^"]+"/title="$m[$n-1]"/;
s:<span name="ll">\Q$m[$n-1]\E</span> ::;'

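  1. Extract all the strings that are candidates for moving
  2. Replace the title with the wanted string
  3. Remove the span that contains the wanted string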

This perl solution should work for you:

# matching 2nd <span name="ll">
perl -pe 's~(title=)"?[^"\s]*"?((?:.*?\h+<span name="ll">){1}.*?)\h+<span name="ll">([^<]+)</span>~$1"$3"$2~' file
<div title="gamma" Mauris eu justo sed nisi aliquet blandit. <span name="ll">beta</span> Fusce in pharetra nisi. Aliquam vehicula imperdiet turpis et rhoncus. <span name="ll">delta</span> Donec faucibus augue quis neque dictum, at rutrum dolor placerat.</div>
# matching 3rd <span name="ll">
perl -pe 's~(title=)"?[^"\s]*"?((?:.*?\h+<span name="ll">){2}.*?)\h+<span name="ll">([^<]+)</span>~$1"$3"$2~' file
<div title="delta" Mauris eu justo sed nisi aliquet blandit. <span name="ll">beta</span> Fusce in pharetra nisi. <span name="ll">gamma</span> Aliquam vehicula imperdiet turpis et rhoncus. Donec faucibus augue quis neque dictum, at rutrum dolor placerat.</div>

Regex explanation:

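  • (title=): matches title= and captures it in group #1
  • "?[^"\s]*"?: matches the old title value, an (optionally) quoted run of characters that are neither " nor whitespace
  • (: starts capture group #2
    • (?:: starts a non-capturing group
      • .*?: matches any text (lazy match)
      • \h+: matches 1+ horizontal whitespace characters
      • <span name="ll">: matches the literal text <span name="ll">
    • ){1}: ends the non-capturing group and repeats it {1} time; use {n-1} to reach the nth span, as with {2} for the 3rd
    • .*?: matches any text (lazy match)
  • ): ends capture group #2
  • \h+: matches 1+ horizontal whitespace characters
  • <span name="ll">: matches the literal text <span name="ll">
  • ([^<]+): matches 1+ characters that are not < and captures them in group #3
  • </span>: matches the literal text </span>
  • $1"$3"$2: the replacement: group #1, then group #3 in quotes, then group #2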
