I have a bash script that iterates over many files: f1.gz, f2.gz, .. fn.gz
Each file contains millions of lines, and each line can match one of several patterns: p1, p2, .. pn
Depending on which pattern it matches, the line should go to a specific file. The patterns are obtained through `date` operations.
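I have written several versions of this, but I am not happy with any of them. Is there a better approach/solution that does not involve rewriting everything in a compiled language?
This is what I have: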
for FILE in `ls f*.gz`
do
    echo "uncompressing only once per file -- $FILE: "
    gzcat $FILE > .myfile.txt
    while IFS='' read -r LINE || [[ -n "$LINE" ]]; do
        for DATE in "$@" # I pass to my script several dates like 20201015, 20201014, etc
        do
            for i in {0..23};
            do
                p="DATE_PATTERNS_$DATE[$i]" # I prepared these before to avoid running "date" millions of times
                echo $LINE | awk -v pat=${!p} -F '"' '$1 ~ pat {print $2" "$4" "$6}' >> $DATE.txt
            done
        done
    done < .myfile.txt
done
Thanks.
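If you don't want to replace your code with a single awk that loops over the dates, you can start by removing the `while` loop (and opening the output file less often):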
for FILE in f*.gz; do
    echo "uncompressing only once per file -- $FILE: "
    gzcat "$FILE" > .myfile.txt
    # I pass to my script several dates like 20201015, 20201014, etc
    for DATE in "$@"; do
        for i in {0..23}; do
            p="DATE_PATTERNS_$DATE[$i]"
            # awk reads the whole uncompressed file once per pattern
            awk -v pat="${!p}" -F '"' '$1 ~ pat {print $2" "$4" "$6}' .myfile.txt
        done >> "$DATE.txt" # redirect here, while $DATE is still the current date
    done
done
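If you have tried that and still need more speed, consider moving the `for DATE` and `for i` loops into awk, and/or starting with `gzcat f*gz > .mycombinedfiles.txt` (when disk space is not an issue).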
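Moving both loops into awk could look roughly like the sketch below. It is only an illustration under some assumptions: the function name `split_by_date`, the file name `.patterns.tsv` and its "DATE, tab, regex" line format are made up for this example, and `gzip -dc` is used in place of `gzcat` because it is more widely available. The data is streamed through a pipe, so no temporary uncompressed file is needed.

```shell
# split_by_date PATTERN_FILE GZ_FILE...
# Hypothetical helper: PATTERN_FILE holds one "DATE<TAB>REGEX" pair per line.
# Every data line whose first "-delimited field matches REGEX is appended
# to DATE.txt, all in a single awk process.
split_by_date() {
    # NR == FNR below only works if the pattern file has at least one line.
    [ -s "$1" ] || return 1
    pattern_file=$1
    shift
    gzip -dc -- "$@" | awk -F '"' '
        # First input (the pattern file): remember each date and regex.
        NR == FNR {
            tab = index($0, "\t")
            n++
            date_of[n] = substr($0, 1, tab - 1)
            regex[n]   = substr($0, tab + 1)
            next
        }
        # Second input (uncompressed data on stdin): try every regex.
        {
            for (k = 1; k <= n; k++)
                if ($1 ~ regex[k])
                    print $2 " " $4 " " $6 >> (date_of[k] ".txt")
        }
    ' "$pattern_file" -
}

# Possible usage from the bash script (assumes the DATE_PATTERNS_<date>
# arrays from the question are already filled in):
#
#   for DATE in "$@"; do
#       for i in {0..23}; do
#           p="DATE_PATTERNS_$DATE[$i]"
#           printf '%s\t%s\n' "$DATE" "${!p}"
#       done
#   done > .patterns.tsv
#   split_by_date .patterns.tsv f*.gz
```

Compared with the loop version, each file is uncompressed and scanned once instead of once per pattern. One difference to be aware of: lines are written in input order, not grouped by hour pattern, so sort the output afterwards if the order matters to you.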