Process a specific part of a text based on a pattern in an AWK script



I have written a script in awk that converts a tex document to html according to my own preferences.

#!/bin/awk -f
BEGIN {
    FS="\n";
    print "<html><body>"
}
# Function to print a row with one argument to handle either a 'th' tag or 'td' tag
function printRow(tag) {
    for(i=1; i<=NF; i++) print "<"tag">"$i"</"tag">";
}
NR>1 {
    [conditions]
    printRow("p")
}
END {
    print "</body></html>"
}

As you can see, it is at a very early stage of development.

\documentclass[a4paper, 11pt, titlepage]{article}
\usepackage{fancyhdr}
\usepackage{graphicx}
\usepackage{imakeidx}
[...]
\begin{document}
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla placerat lectus sit amet augue facilisis, eget viverra sem pellentesque. Nulla vehicula metus risus, vel condimentum nunc dignissim eget. Vivamus quis sagittis tellus, eget ullamcorper libero. Nulla vitae fringilla nunc. Vivamus id suscipit mi. Phasellus porta lacinia dolor, at congue eros rhoncus vitae. Donec vel condimentum sapien. Curabitur est massa, finibus vel iaculis id, dignissim nec nisl. Sed non justo orci. Morbi quis orci efficitur sem porttitor pulvinar. Duis consectetur rhoncus posuere. Duis cursus neque semper lectus fermentum rhoncus.
\end{document}

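What I want is for the script to interpret only the lines between \begin{document} and \end{document}. Everything before them is imports of libraries, variables and so on, which I am not interested in for now.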

How can I make it process only the text inside that pattern?

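GNU AWK has a feature called a range pattern (it is also part of POSIX awk). When you give two conditions separated by a comma, the action is applied only to the lines from the one matching the first condition through the one matching the second, including both of those lines. Consider the following simple example. Let the content of file.txt be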

junk
\begin{document}
desired text
more desired text
\end{document}
more junk

then

awk '$0=="\\begin{document}",$0=="\\end{document}"{print}' file.txt

gives the output

\begin{document}
desired text
more desired text
\end{document}

(tested in gawk 4.2.1)
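The same range pattern can also drop the two delimiter lines themselves, which is closer to what the question asks for. The following is only a minimal sketch and not part of the original answer: the temporary sample file and the variable names sample and body are made up for illustration.

```shell
# Build a small sample file in a temporary location (hypothetical content).
sample=$(mktemp)
cat > "$sample" <<'EOF'
junk
\begin{document}
desired text
more desired text
\end{document}
more junk
EOF

# The range pattern selects the block from \begin{document} to \end{document};
# the inner test skips the two delimiter lines so only the body is printed.
# Inside an awk string, "\\" stands for one literal backslash.
body=$(awk '
$0=="\\begin{document}", $0=="\\end{document}" {
    if ($0 != "\\begin{document}" && $0 != "\\end{document}") print
}' "$sample")

printf '%s\n' "$body"
rm -f "$sample"
```

With the sample above, only the two "desired text" lines should remain. Because the conditions use exact string comparison, a delimiter line with leading or trailing whitespace would not match.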

Use a regular expression to set a flag, then print depending on that flag:

awk '/^\\begin{document}/{flag=1}
flag
/^\\end{document}/{flag=0}' file

This prints everything between the begin and end strings, including those two lines:

\begin{document}
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla placerat lectus sit amet augue facilisis, eget viverra sem pellentesque. Nulla vehicula metus risus, vel condimentum nunc dignissim eget. Vivamus quis sagittis tellus, eget ullamcorper libero. Nulla vitae fringilla nunc. Vivamus id suscipit mi. Phasellus porta lacinia dolor, at congue eros rhoncus vitae. Donec vel condimentum sapien. Curabitur est massa, finibus vel iaculis id, dignissim nec nisl. Sed non justo orci. Morbi quis orci efficitur sem porttitor pulvinar. Duis consectetur rhoncus posuere. Duis cursus neque semper lectus fermentum rhoncus.
\end{document}

If you only want the text in between, without the begin and end strings:

awk '
/^\\begin{document}/{flag=1; next}
/^\\end{document}/{flag=0}
flag' file

Prints:

# leading blank line printed...
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla placerat lectus sit amet augue facilisis, eget viverra sem pellentesque. Nulla vehicula metus risus, vel condimentum nunc dignissim eget. Vivamus quis sagittis tellus, eget ullamcorper libero. Nulla vitae fringilla nunc. Vivamus id suscipit mi. Phasellus porta lacinia dolor, at congue eros rhoncus vitae. Donec vel condimentum sapien. Curabitur est massa, finibus vel iaculis id, dignissim nec nisl. Sed non justo orci. Morbi quis orci efficitur sem porttitor pulvinar. Duis consectetur rhoncus posuere. Duis cursus neque semper lectus fermentum rhoncus.
# ending blank line printed...
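To tie the flag approach back to the script in the question, the flag can gate the rule that emits HTML, so the preamble is never converted. This is a rough sketch under stated assumptions, not the asker's actual logic: it assumes every non-empty body line becomes one <p> element (the question leaves [conditions] unspecified), and the names sample, html and in_body are invented for the example. The braces are escaped in the regular expressions so they are read literally rather than as an interval expression.

```shell
# Hypothetical input: a preamble followed by a two-paragraph body.
sample=$(mktemp)
cat > "$sample" <<'EOF'
\documentclass[a4paper, 11pt, titlepage]{article}
\usepackage{graphicx}
\begin{document}
First paragraph.

Second paragraph.
\end{document}
EOF

html=$(awk '
BEGIN { print "<html><body>" }
/^\\begin\{document\}/ { in_body = 1; next }      # body starts on the next line
/^\\end\{document\}/   { in_body = 0 }            # body ended on the previous line
in_body && NF          { print "<p>" $0 "</p>" }  # wrap non-empty body lines only
END { print "</body></html>" }
' "$sample")

printf '%s\n' "$html"
rm -f "$sample"
```

The order of the rules matters: the \end{document} rule clears the flag before the printing rule runs, so the closing delimiter is never wrapped in a tag.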
