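I have a file that is constantly being filled with entries, as in the example below.
What I want to achieve is to match a string, store some information from that line, get more details from a certain line above it, and append it all to a new file in a given format.
The logic can be explained like this: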
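- Match the line containing "Dispatched message to ABCD"
- Store the date & time in dd.mm.yyyy hh:mi:ss format, the ID (always starting at character 64, 15 digits long) and the MSG text (always starting at character 87)
- Check the 10 lines above for an "Event to process" line with the same ID, and store num, visitedCID, visitedNID and vlr
- Concatenate them as Timestamp;ID;NUM;CID;NID;VLR;MSG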
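The character positions in the second step can be checked with a small sketch. It assumes the thread field just before the message text ("35" in the sample) is always two characters wide, because otherwise the offsets shift. extract_id and extract_msg are hypothetical helper names, not part of any existing tool:

```bash
#!/bin/bash
# Sketch: pull ID and MSG out of a "Dispatched message to ABCD" line by
# character position (ID = characters 64..78, MSG starts at character 87).
# Assumption: the thread field before the message is always 2 characters
# wide (e.g. "35"); if it varies, these offsets are wrong.

# Print the 15-digit ID (characters 64..78).
extract_id() {
    printf '%s\n' "$1" | cut -c64-78
}

# Print the MSG text: everything from character 87 up to the last " : ",
# which separates the message from the ABCD reference number.
extract_msg() {
    local rest
    rest=$(printf '%s\n' "$1" | cut -c87-)
    # ${rest% : *} removes the shortest suffix starting with " : ",
    # i.e. from the last " : " to the end of the line
    printf '%s\n' "${rest% : *}"
}
```

For the dispatch line in the example, extract_id returns 240012345678901 and extract_msg returns the "Welcome to oblivion..." sentence.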
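Some useful rules:
- The details line is not always a fixed number of lines above the "Dispatched message to ABCD" match, but it is always within the 10 lines above it. This is because the input file fills up quickly and therefore holds interleaved entries for many simultaneous users (IDs).
- The line collected from those 10 lines must have the same ID and must contain "Event to process"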
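A minimal sketch of this lookup rule, assuming the line number of the dispatch line and its ID are already known. find_event_line is a hypothetical helper name:

```bash
#!/bin/bash
# Sketch: print the "Event to process" line for ID $3 found within the
# 10 lines before line number $2 of log file $1.
# If several lines match, the one closest to the dispatch line wins;
# nothing is printed if none match.
find_event_line() {
    local logfile=$1 lineno=$2 id=$3
    local start=$(( lineno > 10 ? lineno - 10 : 1 ))
    local end=$(( lineno - 1 ))
    (( end >= start )) || return 0
    # -F: fixed-string match, so the "[" needs no escaping
    sed -n "${start},${end}p" "$logfile" \
        | grep -F "Event to process : [ID=${id}," \
        | tail -n 1
}
```

With the example log saved as inputfile, find_event_line inputfile 6 240012345678901 prints the "Event to process" line containing num=4741234567. The earlier "Got Event" lines for the same ID are skipped.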
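Other static data:
- The timestamp always has the same length
- ID is always 15 digits
- NUM is always 10 digits
- visitedCID is always 2 or 3 digits
- visitedNID is always 2 or 3 digits
- vlr is between 5 and 15 digits
- MSG is at most 250 characters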
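These static rules can be combined into one regular expression that sanity-checks each output record. This is a sketch that assumes the output format given below (dd.mm.yyyy hh:mi:ss, semicolon-separated) and a non-empty MSG. is_valid_record is a hypothetical name:

```bash
#!/bin/bash
# Sketch: return success if one output record follows the static field rules.
is_valid_record() {
    local re='^[0-9]{2}\.[0-9]{2}\.[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2};'  # timestamp
    re+='[0-9]{15};'     # ID: always 15 digits
    re+='[0-9]{10};'     # NUM: always 10 digits
    re+='[0-9]{2,3};'    # visitedCID: 2 or 3 digits
    re+='[0-9]{2,3};'    # visitedNID: 2 or 3 digits
    re+='[0-9]{5,15};'   # vlr: 5 to 15 digits
    re+='.{1,250}$'      # MSG: 1 to 250 characters
    # unquoted $re is interpreted by bash as a POSIX extended regex
    [[ $1 =~ $re ]]
}
```

Running it over the generated file, for example with while IFS= read -r r; do is_valid_record "$r" || echo "bad: $r"; done < dispatchclean.csv, flags records whose fields have the wrong length.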
Inputfile Example:
Thu Jul 24|11:54:58.414|I|DataDispatcher0|Got Event : [ID=240012345678901, eventId = 240012345678901115458, num=4741234567, inbound=false, homeCID=240, homeNID=01, visitedCID=522, visitedNID=01, timestamp=Thu Jul 24 11:54:58 CEST 2014,hno=null,vlr=6012345678, msc=6012345678 eventtype=I, currentCID=null, currentNID=null teleSvcInfo=null camelPhases=null serviceKey=null gprsenabled= false APNlist: null SGSN: null]|com.uws.wmsg2.DataDispatcher|processBlock|393
Thu Jul 24|11:55:06.035|I|DataDispatcher0|Got Event : [ID=240012345678901, eventId = 24001234567890111556, num=null, inbound=false, homeCID=242, homeNID=05, visitedCID=525, visitedNID=05, timestamp=Thu Jul 24 11:55:06 CEST 2014,hno=null,vlr=6012345678, msc=null eventtype=D, currentCID=null, currentNID=null teleSvcInfo=null camelPhases=null serviceKey=null gprsenabled= false APNlist: null SGSN: null]|com.uws.wmsg2.DataDispatcher|processBlock|393
Thu Jul 24|11:55:06.035|W|39|Locking [240012345678901]. No of entries [0]|com.uws.wmsg2.Lock|LockID|58
Thu Jul 24|11:55:06.036|I|24|Event to process : [ID=240012345678901, eventId = 240012345678901115458, num=4741234567, inbound=false, homeCID=242, homeNID=05, visitedCID=525, visitedNID=05, timestamp=Thu Jul 24 11:54:58 CEST 2014,hno=null,vlr=6012345678, msc=6012345678 eventtype=I, currentCID=null, currentNID=null teleSvcInfo=null camelPhases=null serviceKey=null gprsenabled= false APNlist: null SGSN: null]|com.uws.wmsg2.EventProcessor|processEvent|139
Thu Jul 24|11:55:06.041|I|35| Event for dispatching messages 240012345678901115458|com.uws.wmsg2.MSGMessageDispatcher|dispatchMessage|55
Thu Jul 24|11:55:06.072|I|35|Dispatched message to ABCD : ID : 240012345678901, MSG : Welcome to oblivion. There will be no quarrel held. Please enjoy your stay. : ABCD3750251406.195706.62820|com.uws.wmsg2.MSGMessageDispatcher|dispatchMessage|108
Thu Jul 24|11:55:06.074|W|35|Unlocking [240012345678901]. No of entries [1]|com.uws.wmsg2.Lock|UnLockID|64
Desired Output:
24.07.2014 11:55:06;240012345678901;4741234567;525;05;6012345678;Welcome to oblivion. There will be no quarrel held. Please enjoy your stay.
I hope I explained it well enough. Can anyone think of a way to do this? Thanks!
Let's simplify the input a bit to isolate the problem. Given an input file like this:
$ cat file
a b c d
e f g h
i j k l
m n o p
q r s t
u v w x
y z A B
Here is how to keep a buffer of the last 5 lines read and, when the letter "r" is seen, print the second field of the 3rd line before it:
$ awk '/r/{split(buf[(NR-3)%5],arr); print arr[2]} {buf[NR%5]=$0}' file
f
In your case you would just change the 5 to 10, change the 3 to whichever preceding line you want, and access arr[] with whichever indices you like.
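Applied to the log from the question, the same buffer idea might look like the sketch below. Because the details line is not at a fixed offset, it scans all 10 buffered lines for the nearest "Event to process" line with the same ID instead of indexing one slot. It assumes the field layout of the sample lines, and takes the year from the timestamp= field of the matching event line because the dispatch line has none. dispatch_to_csv is a hypothetical bash function name wrapping the awk program:

```bash
#!/bin/bash
# Sketch: 10-line ring buffer in awk; reads the files given (or stdin) and
# prints one Timestamp;ID;NUM;CID;NID;VLR;MSG record per dispatch line that has
# a matching "Event to process" line within the 10 lines above it.
# Assumption: field layout exactly as in the sample log lines.
dispatch_to_csv() {
    awk '
    # value of "name=..." up to the next comma in string s ("" if absent)
    function field(s, name) {
        if (!match(s, name "=[^,]*")) return ""
        return substr(s, RSTART + length(name) + 1, RLENGTH - length(name) - 1)
    }
    /Dispatched message to ABCD/ {
        split($0, f, "|")              # f[1] = "Thu Jul 24", f[2] = "11:55:06.072"
        split(f[1], d, " ")            # d[2] = "Jul", d[3] = "24"
        mon = (index("JanFebMarAprMayJunJulAugSepOctNovDec", d[2]) + 2) / 3
        id = msg = $0
        sub(/.*ID : /, "", id);   sub(/, MSG : .*/, "", id)
        sub(/.*MSG : /, "", msg); sub(/ : [^:]*$/, "", msg)
        # walk back through the previous (up to) 10 lines, newest first
        for (i = 1; i <= 10 && i < NR; i++) {
            ev = buf[(NR - i) % 10]
            if (index(ev, "Event to process : [ID=" id ",")) {
                match(ev, /timestamp=[^,]*/)
                ts = substr(ev, RSTART, RLENGTH)   # ends with the 4-digit year
                printf "%02d.%02d.%s %s;%s;%s;%s;%s;%s;%s\n", d[3], mon,
                    substr(ts, length(ts) - 3), substr(f[2], 1, 8), id,
                    field(ev, "num"), field(ev, "visitedCID"),
                    field(ev, "visitedNID"), field(ev, "vlr"), msg
                break
            }
        }
    }
    # store the current line only after the check, so buf holds the lines above
    { buf[NR % 10] = $0 }
    ' "$@"
}
```

Usage: dispatch_to_csv inputfile >> dispatchclean.csv. For the example input this prints the desired output line; a dispatch line without a matching event line in its window produces no record.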
Now, what else do you need to do?
Try this script:
#!/bin/bash
logfile=$1
newfile="dispatchclean.csv"
current=0
if [[ -n $logfile ]]
then
    # IFS= and -r keep each line exactly as written (no trimming, no backslash escapes)
    while IFS= read -r line; do
        ((current++))
        #get a line of interest
        LastLineofInterest=$(echo "$line" | grep "Dispatched message to ABCD")
        #if found then get other info
        if [[ -n $LastLineofInterest ]]
        then
            #get the date ("Thu Jul 24") and the time portion hh:mm:ss
            date=$(echo "$LastLineofInterest" | awk -F'|' '{print $1}')
            time=$(echo "$LastLineofInterest" | awk -F'|' '{print substr($2,1,8)}')
            #get the string between "ID : " and ", MSG"
            ID=$(echo "$LastLineofInterest" | sed -E 's/(.*ID : )(.*)(, MSG.*)/\2/')
            #get the string between "MSG : " and the last " :"
            MSG=$(echo "$LastLineofInterest" | sed -E 's/(.*MSG : )(.*)( :.*)/\2/')
            #look for the "Event to process" line with the same ID in the 10 lines before the actual message
            start=$(( current > 10 ? current - 10 : 1 ))
            IDinLast10l=$(sed -n "${start},${current}p" "$logfile" | grep "Event to process" | grep "ID=$ID" | tail -n 1)
            #get relevant info from above string
            num=$(echo "$IDinLast10l" | sed -E 's/(.*num=)(.*)(, inbound.*)/\2/')
            CID=$(echo "$IDinLast10l" | sed -E 's/(.*visitedCID=)(.*)(, visitedNID.*)/\2/')
            NID=$(echo "$IDinLast10l" | sed -E 's/(.*visitedNID=)(.*)(, timestamp.*)/\2/')
            vlr=$(echo "$IDinLast10l" | sed -E 's/(.*vlr=)(.*)(, msc.*)/\2/')
            #the dispatch line has no year, so take it from the event's "timestamp=... 2014," field
            year=$(echo "$IDinLast10l" | sed -E 's/.*timestamp=[^,]* ([0-9]{4}),.*/\1/')
            #drop the weekday ("Jul 24 2014") and convert the format to dd.mm.yyyy (GNU date)
            datedmY=$(date -d "${date#* } $year" +%d.%m.%Y)
            #append to the newfile
            echo "$datedmY $time;$ID;$num;$CID;$NID;$vlr;$MSG" >> "$newfile"
        fi
    done < "$logfile"
else
    echo "please supply the logfile as parameter"
fi