To suit my own preferences, I wrote an awk script that converts a TeX document to HTML.
#!/bin/awk -f
BEGIN {
    FS = "\n"
    print "<html><body>"
}
# Function to print a row with one argument to handle either a 'th' tag or 'td' tag
# (i is declared as an extra parameter so that it stays local to the function)
function printRow(tag,    i) {
    for (i = 1; i <= NF; i++) print "<" tag ">" $i "</" tag ">"
}
NR>1 {
    [conditions]
    printRow("p")
}
END {
    print "</body></html>"
}
As you can see, it is still at a very early stage of development. The input looks like this:
\documentclass[a4paper, 11pt, titlepage]{article}
\usepackage{fancyhdr}
\usepackage{graphicx}
\usepackage{imakeidx}
[...]
\begin{document}
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla placerat lectus sit amet augue facilisis, eget viverra sem pellentesque. Nulla vehicula metus risus, vel condimentum nunc dignissim eget. Vivamus quis sagittis tellus, eget ullamcorper libero. Nulla vitae fringilla nunc. Vivamus id suscipit mi. Phasellus porta lacinia dolor, at congue eros rhoncus vitae. Donec vel condimentum sapien. Curabitur est massa, finibus vel iaculis id, dignissim nec nisl. Sed non justo orci. Morbi quis orci efficitur sem porttitor pulvinar. Duis consectetur rhoncus posuere. Duis cursus neque semper lectus fermentum rhoncus.
\end{document}
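What I want is for the script to process only the lines between \begin{document} and \end{document}, because everything before them is package imports, variable definitions and so on, which I am not interested in for now.
How can I make it process only the text inside that range?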
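GNU AWK has a feature called range patterns (it is also part of POSIX awk). When you give two conditions separated by a comma, the action is applied only to the lines from the one matching the first condition through the one matching the second, both included. Consider the following simple example. Let file.txt contain: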
junk
\begin{document}
desired text
more desired text
\end{document}
more junk
Then
awk '$0=="\\begin{document}",$0=="\\end{document}"{print}' file.txt
gives the output
\begin{document}
desired text
more desired text
\end{document}
(tested in gawk 4.2.1)
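If the two delimiter lines themselves are not wanted, the range pattern can be combined with a test inside the action. The following is a minimal sketch and not part of the answer as tested above; the function name body_lines and the file name sample.tex are hypothetical, and it assumes the delimiters sit alone on their lines with no trailing whitespace.

```shell
# Print only the lines strictly between \begin{document} and \end{document}.
# body_lines is a hypothetical helper name used for illustration.
body_lines() {
    awk '
    # The range pattern selects the block, delimiters included ...
    $0 == "\\begin{document}", $0 == "\\end{document}" {
        # ... and the inner test drops the two delimiter lines.
        if ($0 != "\\begin{document}" && $0 != "\\end{document}") print
    }' "$1"
}

# Build a small sample input (hypothetical file name) and run the helper on it.
printf '%s\n' 'junk' '\begin{document}' 'desired text' \
    'more desired text' '\end{document}' 'more junk' > sample.tex
body_lines sample.tex
```

String comparison is used instead of a regular expression so that the backslash and the braces need no regex escaping.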
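Use regular expressions to set a flag, then print depending on that flag:
awk '/^\\begin\{document\}/{flag=1}
flag
/^\\end\{document\}/{flag=0}' file
This prints everything between the start and end strings, inclusive:
\begin{document}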
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla placerat lectus sit amet augue facilisis, eget viverra sem pellentesque. Nulla vehicula metus risus, vel condimentum nunc dignissim eget. Vivamus quis sagittis tellus, eget ullamcorper libero. Nulla vitae fringilla nunc. Vivamus id suscipit mi. Phasellus porta lacinia dolor, at congue eros rhoncus vitae. Donec vel condimentum sapien. Curabitur est massa, finibus vel iaculis id, dignissim nec nisl. Sed non justo orci. Morbi quis orci efficitur sem porttitor pulvinar. Duis consectetur rhoncus posuere. Duis cursus neque semper lectus fermentum rhoncus.
\end{document}
If you only want the text in between, without the start and end strings:
awk '
/^\\begin\{document\}/{flag=1; next}
/^\\end\{document\}/{flag=0}
flag' file
Prints:
# leading blank line printed...
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla placerat lectus sit amet augue facilisis, eget viverra sem pellentesque. Nulla vehicula metus risus, vel condimentum nunc dignissim eget. Vivamus quis sagittis tellus, eget ullamcorper libero. Nulla vitae fringilla nunc. Vivamus id suscipit mi. Phasellus porta lacinia dolor, at congue eros rhoncus vitae. Donec vel condimentum sapien. Curabitur est massa, finibus vel iaculis id, dignissim nec nisl. Sed non justo orci. Morbi quis orci efficitur sem porttitor pulvinar. Duis consectetur rhoncus posuere. Duis cursus neque semper lectus fermentum rhoncus.
# ending blank line printed...
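To connect the flag technique back to the question's script, here is one way the HTML wrapper might be restricted to the document body. This is only a sketch under assumptions: the function name tex_body_to_html and the variable inside are hypothetical, each non-blank body line is simply wrapped in a p tag, and the placeholder conditions from the question are left out.

```shell
# Convert only the body of a TeX file to very simple HTML.
# tex_body_to_html is a hypothetical helper name used for illustration.
tex_body_to_html() {
    awk '
    BEGIN { print "<html><body>" }
    # Clear the flag first, so the \end{document} line is never printed.
    $0 == "\\end{document}"   { inside = 0 }
    # Inside the body, wrap every non-blank line (NF > 0) in a paragraph tag.
    inside && NF              { print "<p>" $0 "</p>" }
    # Set the flag last, so the \begin{document} line is not printed either.
    $0 == "\\begin{document}" { inside = 1 }
    END   { print "</body></html>" }
    ' "$1"
}

# Sample input (hypothetical file name) with blank lines around the body text.
printf '%s\n' '\usepackage{graphicx}' '\begin{document}' '' \
    'desired text' 'more desired text' '' '\end{document}' > sample2.tex
tex_body_to_html sample2.tex
```

The order of the three rules does the work that next does in the version above, and the NF test skips the leading and trailing blank lines.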