Eliminating temp file usage from a shell script that extracts patterns from a file



I have an input text file:

EL.EEX.FRANCE.DELMONTHS.JAN2016.SPOT.VOL      15JAN2016
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.L       15JAN2016 
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.H       15JAN2016
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.S       15JAN2016 
EL.EEX.ITALY.DELMONTHS.JAN2016.FWD            15JAN2016 
EL.EEX.ITALY.DELMONTHS.JAN2016.FWD            15JAN2016

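Given the sample data, we need one representative sample (the complete line, without the date) for each type that is unique up to the last dot (.) level. So the output would be: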

EL.EEX.FRANCE.DELMONTHS.JAN2016.SPOT.VOL
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.L
EL.EEX.ITALY.DELMONTHS.JAN2016.FWD

(The order of the lines in the output does not matter.)

The program below works fine, but it generates many intermediate temp files. How can we do this in the shell without them?

#input file name and path assumed in current directory
file="./osc.txt"
resultfilepath="./OSCoutput.txt"
tmpfilepath="./OSCtempoutput.txt"
tmp1filepath="./OSCtemp1output.txt"
tmp2filepath="./OSCtemp2output.txt"

rm $resultfilepath
rm $tmpfilepath
#using awk to filter only series data without dates
awk -F' ' '{print $1}' $file >> $tmpfilepath
#getting all the unique dataclass_names at column 1
DATACLASSNAME=(`cut -f 1 -d'.' $tmpfilepath | sort | uniq`)
for i in "${DATACLASSNAME[@]}"; do
rm $tmp1filepath
#we are filtering the file with that dataclass
awk -F'.' -v awk_dataclassname="$i" '$1==awk_dataclassname' $tmpfilepath >> $tmp1filepath
#also we are calculating the number of dot-delimited fields in each filtered record and sorting it
COLCOUNT=(`awk -F'.' '{print NF}' $tmp1filepath | uniq | sort`)
for j in "${COLCOUNT[@]}"; do
rm $tmp2filepath
#in the filtered data we are taking series of a particular dimension length and dumping data
awk -F '.' -v awk_colcount="$j" '(NF==awk_colcount){print}' $tmp1filepath >> $tmp2filepath
#reducing column no by 1
newj=$(echo $((j - 1)))
#removing last column(generally observation dimension) by cut column
GREPSAMPLE=(`cut -f -$newj -d'.' $tmp2filepath | uniq`)
SAMPLELENGTH=(`wc -l $tmp2filepath`)
#we are now taking unique series sample
for k in "${GREPSAMPLE[@]}"; do
#doing grep of unique sample but taking the whole line
echo `grep $k $tmp1filepath | head -1` >> $resultfilepath
done
done
done
cat $resultfilepath
echo "processing finish"

The whole thing can be done with this single awk invocation.

awk '{
    key = $0;
    sub(/\.[^.]*$/, "", key);      # Let key be everything up to the last dot
    if (!seen[key]) { print $1 }    # If key has not been seen, print 1st col
    seen[key] = 1;                  # Mark the key as seen
}' "$file" > "$resultfilepath"
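To try this against the sample data from the question without creating any files, a sketch like the following could be used. The function name dedupe_series is an assumption for illustration only, and the input is fed through a here-document rather than read from "$file".

```shell
#!/bin/sh
# dedupe_series is a hypothetical wrapper name; it runs the awk program
# from the answer on standard input (or on any files given as arguments).
dedupe_series() {
    awk '{
        key = $0
        sub(/\.[^.]*$/, "", key)        # key: everything up to the last dot
        if (!seen[key]) { print $1 }    # first time this key appears: print col 1
        seen[key] = 1                   # remember the key
    }' "$@"
}

# Sample rows copied from the question, supplied via a here-document.
dedupe_series <<'EOF'
EL.EEX.FRANCE.DELMONTHS.JAN2016.SPOT.VOL      15JAN2016
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.L       15JAN2016
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.H       15JAN2016
EL.EEX.GERMANY.DELMONTHS.JAN2016.SPOT.S       15JAN2016
EL.EEX.ITALY.DELMONTHS.JAN2016.FWD            15JAN2016
EL.EEX.ITALY.DELMONTHS.JAN2016.FWD            15JAN2016
EOF
```

Because awk keeps the seen keys in memory, the first line for each key is printed in input order and nothing is written to disk.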

In general, when you have a script that involves a lot of awk and grep calls, you can most likely write a single awk script instead.
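As an illustration of folding several pipeline stages into one awk program, here is a more compact sketch. It is a variation, not the answer's exact code: it builds the key from the first column only (an assumption that the series name never contains whitespace) and uses the common `!seen[key]++` idiom. The name first_per_prefix and the short A.B.C sample rows are hypothetical.

```shell
#!/bin/sh
# first_per_prefix is a hypothetical helper name, not part of the original answer.
first_per_prefix() {
    awk '{
        key = $1                      # series name only; the date column is ignored
        sub(/\.[^.]*$/, "", key)      # drop the last dot-separated component
        if (!seen[key]++) print $1    # print only the first line for each prefix
    }' "$@"
}

# Usage: reads from files given as arguments, or from standard input in a pipeline.
printf '%s\n' 'A.B.C.X 01JAN2016' 'A.B.C.Y 01JAN2016' 'A.B.D 01JAN2016' | first_per_prefix
```

This one program covers the column extraction, the prefix cut, the uniqueness check and the "first match" selection that the original script spread over separate awk, cut, uniq and grep calls.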
