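sed: split a single-line file and process the resulting lines

I have an XML feed on a single line (this one), so to extract the data I need, I can do something like this: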

sed -r 's:<([^>]+)>([^<]+)</\1>:&\n: g' feed | sed -nr '
    /<item>/, $ s:.*<(title|link|description)>([^<]+)</\1>.*:\2: p'

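I do it this way because I can't find a way to get the first sed call to process its own result as separate lines.

Any suggestions?

My goal is to get all the data I need in a single sed invocation.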

sed -rn -e 's|>[[:space:]]*<|>\n<|g
/^<title>/ { bx }
/^<description>/ { b x }
/^<link>/ { bx }
D
:x
s|<([^>]*)>([^\n]*)</\1>|\1=\2|;
P
D' rss.xml

The script above is a new answer for the updated question: it now uses branching and prints all three pieces of information.
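To see what the branching script prints, here is a minimal sketch that runs it on a made-up one-line feed. The sample feed and the function name extract_fields are assumptions for illustration only, and GNU sed is assumed (for -r and for \n in the replacement and in bracket expressions). The branches are written one per line here, which should behave the same as the braces form.

```shell
#!/bin/sh
# Hypothetical sample feed on a single line (not the asker's real feed).
feed='<rss><channel><title>Feed</title><item><title>First</title><link>http://example.com/1</link><description>Hello</description></item></channel></rss>'

# Same logic as the branching script, reading from stdin (GNU sed assumed).
extract_fields() {
  sed -rn -e 's|>[[:space:]]*<|>\n<|g
/^<title>/ b x
/^<description>/ b x
/^<link>/ b x
D
:x
s|<([^>]*)>([^\n]*)</\1>|\1=\2|
P
D'
}

out=$(printf '%s\n' "$feed" | extract_fields)
printf '%s\n' "$out"
```

On this sample the channel-level title is printed as well, since nothing in the script restricts matching to the part after <item> the way the two-call version in the question does.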

sed -rn -e 's|>[[:space:]]*<|>\n<|g   # Insert newlines before each element
/^[^<]/ D                             # If not starting with <, delete until 1st \n and restart
/^<[^t]/ D                            # If not starting with <t, ""
/^<t[^i]/ D                           # If not starting with <ti, ""
/^<ti[^t]/ D
/^<tit[^l]/ D
/^<titl[^e]/ D
/^<title[^>]/ D                       # If not starting with <title>, delete until 1st \n and restart
s|^<title>||                          # Delete <title>
s|</title>[^\n]*||                    # Delete </title> and everything after it until the newline
P                                     # Print everything up to the first newline
D' rss.xml                            # Delete everything up to the first newline and restart

By "restart", I mean going back to the top of the sed script and pretending we have just read whatever is left in the pattern space.
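The P/D "restart" behavior can be seen in isolation with a tiny sketch. The input string is made up, and GNU sed is assumed so that \n in the replacement produces a newline. Each pass prints the first line of the pattern space, deletes it, and re-runs the script on the remainder without reading new input.

```shell
#!/bin/sh
# Split one line into three, then let P and D walk through them.
out=$(printf 'one two three\n' | sed -n 's/ /\n/g
P
D')
printf '%s\n' "$out"
```

When the pattern space no longer contains a newline, D simply deletes it and a normal cycle begins, which is what ends the loop here.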

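I learned a lot writing this. But without a doubt, you really should be doing this in perl (or awk, if you are old school).

In perl, this would be:

perl -pe 's%.*?<title>(.*?)</title>(?:.*?(?=<title>)|.*)%$1\n%g' rss.xml

This basically takes advantage of minimal matching (.*? is non-greedy: it matches as few characters as possible). The positive lookahead at the end is only there so that I can do it all in one expression while still removing everything after the last title. There's more than one way to do it...

If you need more than one tag from this XML file, it is probably still possible, but it would likely involve branching and so on.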

How about this:

sed -nr 's|>[[:space:]]*<|>\n<|g
    h
    /^<(title|link|description)>/ {
        s:<([^>]+)>([^<]+)</\1>:\2:
        P
    }
    g
    D
    ' feed
