How to use sed, awk, or grep in a Linux command/script to get values from a file



I have a file, file1, with the following contents:

<action>
<row>
<column name="book" label="book">stick man (2020)/</column>
<column name="referensi" label="referensi"> http://172.22.215.234/Data/Book/Journal/2016_2020/1%20Stick%20%282020%30/</column>
</row>
<row>
<column name="book" label="book">python easy (2019)/</column>
<column name="referensi" label="referensi"> http://172.22.215.234/Data/Book/Journal/2016_2020/2%20Buck%20%282019%30/</column>
</row>
</action>

I want to extract the contents of the file using a Linux script or command (sed, grep, or awk). Example output:

stick man (2020) | http://172.22.215.234/Data/Book/Journal/2016_2020/1%20Stick%20%282020%30
python easy (2019) | http://172.22.215.234/Data/Book/Journal/2016_2020/2%20Buck%20%282019%30

My code:

grep -oP 'href="([^".]*)">([^</.]*)' file1

Please help me, I'm a beginner :(

$ awk -v RS='<[^>]+>' 'NF{printf "%s", $0 (++c%2?" |":ORS)}' file
stick man (2020)/ | http://172.22.215.234/Data/Book/Journal/2016_2020/1%20Stick%20%282020%30/
python easy (2019)/ | http://172.22.215.234/Data/Book/Journal/2016_2020/2%20Buck%20%282019%30/

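Note that the trailing forward slashes are present in the original data, so they appear in the output as well.

This requires multi-character RS support (for example, GNU awk).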
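If GNU awk is not available, a rough alternative is to split each line on the angle brackets instead of relying on a regular-expression RS. The sketch below assumes every column element sits on a single line, as in the question; the function name extract_columns and the temporary file are only illustrative.

```shell
# Sample input copied from the question, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<action>
<row>
<column name="book" label="book">stick man (2020)/</column>
<column name="referensi" label="referensi"> http://172.22.215.234/Data/Book/Journal/2016_2020/1%20Stick%20%282020%30/</column>
</row>
<row>
<column name="book" label="book">python easy (2019)/</column>
<column name="referensi" label="referensi"> http://172.22.215.234/Data/Book/Journal/2016_2020/2%20Buck%20%282019%30/</column>
</row>
</action>
EOF

# extract_columns is a hypothetical helper name, not part of any tool.
# With FS set to "<" or ">", the text of a one-line element is field 3.
extract_columns() {
    awk -F'[<>]' '
        /<column / {
            val = $3
            sub(/^[ \t]+/, "", val)   # drop the leading blank before the URL
            # Odd matches end with " | ", even matches end the output line.
            printf "%s%s", val, (++c % 2 ? " | " : "\n")
        }' "$1"
}

extract_columns "$tmp"
```

As with the GNU awk version, the trailing slashes from the data are kept. This variant pairs values by counting matches, so it assumes each row has exactly two column elements.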

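The input really does look like an HTML file. If you are allowed to install utilities on your system, I suggest trying hxselect (from the html-xml-utils package), which is handy when what you want to extract can be described with a CSS selector. For example, to get the contents of every column whose label is referensi from file.html: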

cat file.html | hxselect -i -c -s '\n' 'column[label=referensi]'

With awk, you can try:

awk -F'>|/<'  '{ORS= (NR == 3 || NR == 7) ? " |" : "\n"} $2 != "" {print $2}' file
stick man (2020) | http://172.22.215.234/Data/Book/Journal/2016_2020/1%20Stick%20%282020%30
python easy (2019) | http://172.22.215.234/Data/Book/Journal/2016_2020/2%20Buck%20%282019%30
Or, shorter:
awk -F'>|/<'  '{ORS= (NR%2) ? " |" : RS} $2 != "" {print $2}' file
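Both commands above depend on line numbers, so they assume the file has exactly the layout shown, with no blank or extra lines. A sketch that counts non-empty values instead of lines might look like the following; it keeps the same field separator, and the function name extract_pairs and the temporary file are only illustrative.

```shell
# Sample input copied from the question, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<action>
<row>
<column name="book" label="book">stick man (2020)/</column>
<column name="referensi" label="referensi"> http://172.22.215.234/Data/Book/Journal/2016_2020/1%20Stick%20%282020%30/</column>
</row>
<row>
<column name="book" label="book">python easy (2019)/</column>
<column name="referensi" label="referensi"> http://172.22.215.234/Data/Book/Journal/2016_2020/2%20Buck%20%282019%30/</column>
</row>
</action>
EOF

# extract_pairs is a hypothetical helper name, not part of any tool.
# Splitting on ">" or "/<" leaves the element text, minus its trailing
# slash, in field 2; lines such as <row> have an empty field 2.
extract_pairs() {
    awk -F'>|/<' '
        $2 != "" {
            val = $2
            sub(/^ +/, "", val)       # drop the leading blank before the URL
            # Pair values by counting matches rather than by line number.
            printf "%s%s", val, (++c % 2 ? " | " : "\n")
        }' "$1"
}

extract_pairs "$tmp"
```

Because the separator consumes the final "/" before "</column>", the output has no trailing slashes, which matches the format asked for in the question.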
