我有一个值为的文件1
<action>
<row>
<column name="book" label="book">stick man (2020)/</column>
<column name="referensi" label="referensi"> http://172.22.215.234/Data/Book/Journal/2016_2020/1%20Stick%20%282020%30/</column>
</row>
<row>
<column name="book" label="book">python easy (2019)/</column>
<column name="referensi" label="referensi"> http://172.22.215.234/Data/Book/Journal/2016_2020/2%20Buck%20%282019%30/</column>
</row>
</action>
我想使用linux脚本或命令(sed、grep或awk(获取文件的内容。示例输出:
stick man (2020) | http://172.22.215.234/Data/Book/Journal/2016_2020/1%/20Stick%20%282020%30
python easy (2019) | http://172.22.215.234/Data/Book/Journal/2016_2020/%2/20Buck%20%282019%30
我的代码:
grep -oP 'href="([^".]*)">([^</.]*)' file1
请帮帮我,我是新手:(
$ awk -v RS='<[^>]+>' 'NF{printf "%s", $0 (++c%2?" |":ORS)}' file
stick man (2020)/ | http://172.22.215.234/Data/Book/Journal/2016_2020/1%20Stick%20%282020%30/
python easy (2019)/ | http://172.22.215.234/Data/Book/Journal/2016_2020/2%20Buck%20%282019%30/
注意,正斜杠在原始数据中
需要多字符RS
支持(GNU awk(。
此
<action>
<row>
<column name="book" label="book">stick man (2020)/</column>
<column name="referensi" label="referensi"> http://172.22.215.234/Data/Book/Journal/2016_2020/1%20Stick%20%282020%30/</column>
</row>
<row>
<column name="book" label="book">python easy (2019)/</column>
<column name="referensi" label="referensi"> http://172.22.215.234/Data/Book/Journal/2016_2020/2%20Buck%20%282019%30/</column>
</row>
</action>
看起来确实像一个HTML文件。如果你被允许在你的系统中安装实用程序,我建议你尝试hxselect,当你想提取一些可以用CSS语言描述的东西时,它很有用。例如,从file.html
:获取label
为referensi
的所有column
的内容
cat file.html | hxselect -i -c -s 'n' column[label=referensi]
使用awk
,您可以尝试:
awk -F'>|/<' '{ORS= (NR == 3 || NR == 7) ? " |" : "n"} $2 != "" {print $2}' file
stick man (2020) | http://172.22.215.234/Data/Book/Journal/2016_2020/1%20Stick%20%282020%30
python easy (2019) | http://172.22.215.234/Data/Book/Journal/2016_2020/2%20Buck%20%282019%30
- 或更短:
awk -F'>|/<' '{ORS= (NR%2) ? " |" : RS} $2 != "" {print $2}' file