I have the following file:
<div title="alpha" Mauris eu justo sed nisi aliquet blandit. <span name="ll">beta</span> Fusce in pharetra nisi. <span name="ll">gamma</span> Aliquam vehicula imperdiet turpis et rhoncus. <span name="ll">delta</span> Donec faucibus augue quis neque dictum, at rutrum dolor placerat.</div>
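I am trying to take the text of the nth <span name="ll"> and use it to replace the title= value, removing that span and keeping the rest of the content in its original order.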
For example, the second name="ll" span should give me:
<div title="gamma" Mauris eu justo sed nisi aliquet blandit. <span name="ll">beta</span> Fusce in pharetra nisi. Aliquam vehicula imperdiet turpis et rhoncus. <span name="ll">delta</span> Donec faucibus augue quis neque dictum, at rutrum dolor placerat.</div>
And so on.
My attempt:
find . -type f -exec perl -pi -w -e 's/(title=)"?[^"\s]*"?(.*)((?:.*?\h+class="ll">){1}.*?)\h+class="ll">"?([^"\s]+)"?(<.*)/$1"$3"$2$4/' {} \;
Where am I going wrong?
Rather than doing everything in a single substitution, do it in steps:
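A minimal Python sketch (an illustrative port, not part of the original question) that checks one concrete reason the attempt changes nothing: the pattern looks for class="ll">, but every span in the sample uses name="ll". Python's re module has no \h, so [ \t] is assumed to be an adequate stand-in; LINE, ATTEMPT and diagnose are hypothetical names.

```python
import re

# Sample line from the question.
LINE = (
    '<div title="alpha" Mauris eu justo sed nisi aliquet blandit. '
    '<span name="ll">beta</span> Fusce in pharetra nisi. '
    '<span name="ll">gamma</span> Aliquam vehicula imperdiet turpis et rhoncus. '
    '<span name="ll">delta</span> Donec faucibus augue quis neque dictum, '
    'at rutrum dolor placerat.</div>'
)

# Python port of the attempted Perl pattern. Python's re has no \h, so
# [ \t] stands in for it (assumption: only spaces and tabs occur).
ATTEMPT = re.compile(
    r'(title=)"?[^"\s]*"?(.*)((?:.*?[ \t]+class="ll">){1}.*?)'
    r'[ \t]+class="ll">"?([^"\s]+)"?(<.*)'
)


def diagnose(line):
    """Explain why the attempted substitution leaves `line` unchanged."""
    if ATTEMPT.search(line) is None:
        # The pattern requires the literal text class="ll">, which never occurs.
        return 'no match: the pattern looks for class="ll", the data uses name="ll"'
    return "pattern matches"


print(diagnose(LINE))
```

With no match, perl -p prints every line unchanged. Reading the groups, the replacement also looks off even once name="ll" is used: $3 is the skipped text rather than the span content (that is $4), and $5 is never written back.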
perl -wpe '$n = 2;
@m = /<span name="ll">([^<]+)/g;
s/title="[^"]+"/title="$m[$n-1]"/;
s:<span name="ll">\Q$m[$n-1]\E</span> ::;'
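That is:
- extract all the strings that could be moved
- replace the title with the desired string
- delete the span containing the desired string
This Perl solution should work for you: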
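The three steps can be sketched in Python as follows. This is an illustrative port, not the answer's code: promote_nth_span is a hypothetical name, and re.escape plays the role of Perl's \Q...\E.

```python
import re

# Sample line from the question.
LINE = (
    '<div title="alpha" Mauris eu justo sed nisi aliquet blandit. '
    '<span name="ll">beta</span> Fusce in pharetra nisi. '
    '<span name="ll">gamma</span> Aliquam vehicula imperdiet turpis et rhoncus. '
    '<span name="ll">delta</span> Donec faucibus augue quis neque dictum, '
    'at rutrum dolor placerat.</div>'
)


def promote_nth_span(line, n):
    """Move the text of the n-th <span name="ll"> into title="..." (1-based n).

    Sketch of the step-by-step Perl one-liner: collect, replace title, delete span.
    """
    # Step 1: collect every string that could be moved (Perl: @m = /.../g).
    texts = re.findall(r'<span name="ll">([^<]+)', line)
    if not 1 <= n <= len(texts):
        return line  # no n-th span: leave the line untouched
    wanted = texts[n - 1]
    # Step 2: replace the title value. A function replacement keeps any
    # backslashes in `wanted` literal; count=1 mirrors s/// without /g.
    line = re.sub(r'title="[^"]+"', lambda _: f'title="{wanted}"', line, count=1)
    # Step 3: delete the span plus the space after it; re.escape plays the
    # role of Perl's \Q...\E.
    span = '<span name="ll">' + re.escape(wanted) + '</span> '
    return re.sub(span, '', line, count=1)


print(promote_nth_span(LINE, 2))
```

As in the Perl version, the span is located by its text, so if two spans held the same text the first one would be removed, and a span with no space after it (for example, directly before </div>) would be left in place.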
# matching 2nd <span name="ll">
perl -pe 's~(title=)"?[^"\s]*"?((?:.*?\h+<span name="ll">){1}.*?)\h+<span name="ll">([^<]+)</span>~$1"$3"$2~' file
<div title="gamma" Mauris eu justo sed nisi aliquet blandit. <span name="ll">beta</span> Fusce in pharetra nisi. Aliquam vehicula imperdiet turpis et rhoncus. <span name="ll">delta</span> Donec faucibus augue quis neque dictum, at rutrum dolor placerat.</div>
# matching 3rd <span name="ll">
perl -pe 's~(title=)"?[^"\s]*"?((?:.*?\h+<span name="ll">){2}.*?)\h+<span name="ll">([^<]+)</span>~$1"$3"$2~' file
<div title="delta" Mauris eu justo sed nisi aliquet blandit. <span name="ll">beta</span> Fusce in pharetra nisi. <span name="ll">gamma</span> Aliquam vehicula imperdiet turpis et rhoncus. Donec faucibus augue quis neque dictum, at rutrum dolor placerat.</div>
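The same single-substitution idea, parameterised by n, as a Python sketch. Assumptions: Python's re has no \h, so [ \t] stands in for it (narrower than Perl's \h, which also matches other horizontal whitespace such as a no-break space), and promote_nth_span_regex is a hypothetical name. The quantifier is n-1, matching the answer's {1} for the 2nd span and {2} for the 3rd.

```python
import re

# Sample line from the question.
LINE = (
    '<div title="alpha" Mauris eu justo sed nisi aliquet blandit. '
    '<span name="ll">beta</span> Fusce in pharetra nisi. '
    '<span name="ll">gamma</span> Aliquam vehicula imperdiet turpis et rhoncus. '
    '<span name="ll">delta</span> Donec faucibus augue quis neque dictum, '
    'at rutrum dolor placerat.</div>'
)


def promote_nth_span_regex(line, n):
    r"""Move the n-th <span name="ll"> text into title="..." with one re.sub.

    Sketch of the answer's Perl regex: {n-1} skips the first n-1 spans.
    [ \t] replaces Perl's \h, which Python's re does not support.
    """
    if n < 1:
        return line
    pattern = (
        r'(title=)"?[^"\s]*"?'                    # group 1, then the old value
        r'((?:.*?[ \t]+<span name="ll">){' + str(n - 1) + r'}.*?)'  # group 2
        r'[ \t]+<span name="ll">([^<]+)</span>'   # n-th span, text in group 3
    )
    # Replacement mirrors Perl's $1"$3"$2; count=1 mirrors s/// without /g.
    return re.sub(pattern, r'\1"\3"\2', line, count=1)


print(promote_nth_span_regex(LINE, 3))
```

If the line has fewer than n spans the pattern fails and the line comes back unchanged, just as perl -p prints a non-matching line as-is.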
RegEx details:
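- (title=): match title= and capture it in group #1
- "?[^"\s]*"?: match the old value, an optionally quoted string of non-whitespace characters
- (: start capture group #2
- (?:: start a non-capturing group
- .*?: match any text (lazy)
- \h+: match 1+ horizontal whitespace characters
- <span name="ll">: match the literal text <span name="ll">
- ){1}: end the non-capturing group and repeat it {1} time; use {N-1} to target the Nth span
- .*?: match any text (lazy)
- ): end capture group #2
- \h+: match 1+ horizontal whitespace characters
- <span name="ll">: match the literal text <span name="ll">
- ([^<]+): match 1+ characters that are not < and capture them in group #3
- </span>: match </span>
- $1"$3"$2: the replacement: title=, then the span text in quotes, then the text in between; the matched span itself is dropped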
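To see the groups the breakdown describes, here is the 2nd-span pattern rewritten with Python's re.VERBOSE (Perl's /x modifier allows the same layout), one token per line. This is a sketch for illustration: SECOND_SPAN is a hypothetical name, [ \t] stands in for Perl's \h, and the space inside <span name is written as [ ] because verbose mode ignores bare whitespace.

```python
import re

# Sample line from the question.
LINE = (
    '<div title="alpha" Mauris eu justo sed nisi aliquet blandit. '
    '<span name="ll">beta</span> Fusce in pharetra nisi. '
    '<span name="ll">gamma</span> Aliquam vehicula imperdiet turpis et rhoncus. '
    '<span name="ll">delta</span> Donec faucibus augue quis neque dictum, '
    'at rutrum dolor placerat.</div>'
)

# [ \t] stands in for Perl's \h (assumption: only spaces and tabs occur).
SECOND_SPAN = re.compile(r'''
    (title=)                 # group 1: literal title=
    "?[^"\s]*"?              # optionally quoted old value
    (                        # start group 2
      (?:                    #   start non-capturing group
        .*?                  #     any text, lazily
        [ \t]+               #     1+ horizontal whitespace
        <span[ ]name="ll">   #     literal span opening tag
      ){1}                   #   repeat once: skip the 1st span
      .*?                    #   any text, lazily
    )                        # end group 2
    [ \t]+                   # 1+ horizontal whitespace
    <span[ ]name="ll">       # the 2nd span opening tag
    ([^<]+)                  # group 3: its text (1+ chars other than <)
    </span>                  # literal closing tag
''', re.VERBOSE)

m = SECOND_SPAN.search(LINE)
print(m.group(1))  # title=
print(m.group(3))  # gamma
print(SECOND_SPAN.sub(r'\1"\3"\2', LINE, count=1))
```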