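AWK: search a file for terms listed in another file and format the output

I imagine this has been answered before, but I haven't been able to find a question that addresses this specific need.

I'm having trouble using AWK to search a 150 MB text file of Ethernet packets for strings listed in a CSV (or text) file. Part of the problem seems to be the leading whitespace in the data file and the information that follows the colon in the data file.

I want to search the data file for the term "Epoch" in $1, which appears in every frame. Then I want to search for any terms from the list file that occur before the next "Epoch" in $1 and, if any are found, print $3 from the "Epoch" line followed by each match from the list file, separated by commas (,).

The list file looks like this: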

bill_more_data:
and_more_data:
pay_hour:
age_years:
favorite_adventure_type:

The data file looks like this:

No.     Time           Source                Destination
1 0.000000       xxx.xx.xxx.x           xxx.xx.xxx.x
Frame 1: 52 bytes on wire (100 bits), 22 bytes captured (757 bits)
Arrival Time: Jun 28, 2021 04:17:23.747890000 Pacific Daylight Time
[Time shift for this packet: 0.000000000 seconds]
Epoch Time: 1624879043.747890000 seconds
[Time delta from previous captured frame: 0.000000000 seconds]
Frame Number: 1
Ethernet II, Src: xxxxxxxxxxxxxx (xxx:xxx:xx:xx:xx:xx:x), Dst: xxx:xxx:xx:xx:xx:xx:x (xxx:xxx:xx:xx:xx:xx:x)
Destination: xxx:xxx:xx:xx:xx:xx:x (xxx:xxx:xx:xx:xx:xx:x)
Address: xxx:xxx:xx:xx:xx:xx:x (xxx:xxx:xx:xx:xx:xx:x)
.... .... .... .... .... .... = LG bit: Locally administered address (this is NOT the factory default)
.... .... .... .... .... .... = IG bit: Individual address (unicast)
Internet Protocol Version 4, Src: xxx.xx.xxx.x, Dst: 000.00.00.00
Flags: 0x0, Don't fragment
0... .... = Reserved bit: Not set
Info Header
message_id: 000x00
message_length: 2
bill_some_data
bill_that_data: 0
bill_more_data: 1
and_more_data: 0
0000  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
0010  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
No.     Time           Source                Destination        
2 0.000275       xxx.xx.xxx.x          xxx.xx.xxx.x          
Frame 2: 60 bytes on wire (454 bits), 55 bytes captured (454 bits)
Arrival Time: Jun 28, 2021 04:17:23.748165000 Pacific Daylight Time
[Time shift for this packet: 0.000000000 seconds]
Epoch Time: 1624879043.748165000 seconds
[Time delta from previous captured frame: 0.000275000 seconds]
Frame Number: 2
Ethernet II, Src: xxx:xxx:xx:xx:xx:xx:x (xxx:xxx:xx:xx:xx:xx:x), Dst: xxx:xxx:xx:xx:xx:xx:x (xxx:xxx:xx:xx:xx:xx:x)
Destination: xxx:xxx:xx:xx:xx:xx:x (xxx:xxx:xx:xx:xx:xx:x)
Address: xxx:xxx:xx:xx:xx:xx:x (xxx:xxx:xx:xx:xx:xx:x)
.... ..1. .... .... .... .... = LG bit: Locally administered address (this is NOT the factory default)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
Internet Protocol Version 4, Src: 172.28.1.72, Dst: 172.28.1.30
0100 .... = Version: 4
.... 0101 = Header Length: 20 bytes (5)
Flags: 0x00
0... .... = Reserved bit: Not set
.0.. .... = Don't fragment: Not set
..0. .... = More fragments: Not set
Fragment Offset: 0
[Stream index: 1]
Info Header
message_id: 0x00000
message_length: 5
TED_name
pay_hour: 3.25
vacation_days: 0
age_years: 22
time_in_role: 0.1
favorite_adventure_type: excellent

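I tried the following, but got no output. If I change the list file to include the information after the colon, it at least prints those lines, but that doesn't do me much good.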

awk 'NR==FNR{a[$1]++;next}a[$1]' list.txt data.txt

Desired output:

1624879043.747890000,bill_more_data,1,bill_more_data,1
1624879043.748165000,pay_hour,3.25,age_years,22,favorite_adventure_type,excellent

Or, better, with the epoch time formatted to work in Excel:

44375.4704137487,bill_more_data,1,bill_more_data,1
44375.4704137518,pay_hour,3.25,age_years,22,favorite_adventure_type,excellent

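I have tried several other approaches with little success. I'm looking for an elegant way to do this that can also cope with future changes to the format in which these values are recorded. Another approach I tried was to run several AWK commands: first reverse the file, then search for the list items and print them, then search for the next occurrence of an item or "Epoch" and print $3, then reverse the file again, and finally use another AWK command to print anything that isn't "Epoch". This seems to work if I type the list items directly into the command, but I can't get it to work when I try to read them from another file.

Thanks in advance for any help with this. This is my first time using AWK, so please forgive me if I've asked a silly question.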

$ cat tst.awk
BEGIN { OFS="," }
NR==FNR {
    tags[$0]
    next
}
$1 == "Epoch" {
    prt()
    time = $3
}
$1 in tags {
    tag2val[$1] = $2
}
END {
    prt()
}
function prt(   out) {
    if ( time != "" ) {
        out = sprintf("%0.10f",(time / 86400) + 25569)
        for (tag in tag2val) {
            val = tag2val[tag]
            sub(/:/,"",tag)
            out = out OFS tag OFS val
        }
        print out
    }
    delete tag2val
}

$ awk -f tst.awk list file
44375.4704137487,and_more_data,0,bill_more_data,1
44375.4704137519,favorite_adventure_type,excellent,age_years,22,pay_hour,3.25
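
For anyone wondering about the arithmetic in prt(): 86400 is the number of seconds in a day, and 25569 is the Excel serial number for 1970-01-01 in the 1900 date system, so dividing the epoch seconds by 86400 and adding 25569 gives an Excel date serial. Here is a minimal sketch that converts just the first Epoch value from the sample data; the variable name serial is only for illustration.

```shell
# Convert a Unix epoch time (in seconds) to an Excel serial date.
# 86400 = seconds per day; 25569 = Excel serial number of 1970-01-01.
serial=$(awk 'BEGIN { printf "%0.10f\n", (1624879043.747890000 / 86400) + 25569 }')
echo "$serial"
```

This should give the same timestamp as the first line of output above. The value is in UTC; Excel will only display it as a date once the cell is given a date/time format.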

awk 'NR==FNR{a[$1]++;next}a[$1]' list.txt data.txt

This part is correct:

NR==FNR{a[$1]++;next}

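This means that by the end of list.txt, the array a holds every term from the list. For the rest, though, you have to loop over that array to see what matches. Something like: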

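/Epoch Time:/ {
    if( epoch ) { printf "\n" }    # end the previous frame's line
    epoch = 1
    printf "%s", $3                # $3 is the timestamp on the "Epoch Time:" line
}
epoch != 1 { next }                # ignore everything before the first Epoch
{
    for( term in a ) {
        if( $0 ~ term ) { printf ... }
    }
}
END { if( epoch ) { printf "\n" } }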

This prints a newline after each epoch's data: one before every epoch except the first, and one at the end if any epoch was seen.
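
To make that outline concrete, here is a minimal runnable sketch of the same idea. Several things in it are assumptions rather than part of the answer: the two tiny sample files are made up, the comparison uses an exact match on $1 instead of the regex match on $0, and the elided printf is filled in with one plausible format.

```shell
# Hypothetical sample files, created only for this demonstration.
tmp=$(mktemp -d)
printf 'pay_hour:\nage_years:\n' > "$tmp/list"
printf 'Epoch Time: 100.5 seconds\npay_hour: 3.25\nvacation_days: 0\nEpoch Time: 200.5 seconds\nage_years: 22\n' > "$tmp/data"

lines=$(awk '
NR == FNR { a[$1]++; next }          # first file: remember each list term
/Epoch Time:/ {
    if (epoch) { printf "\n" }       # close the line of the previous frame
    epoch = 1
    printf "%s", $3                  # raw epoch seconds
    next
}
epoch != 1 { next }                  # skip lines before the first Epoch
{
    for (term in a) {
        if ($1 == term) {            # assumption: exact match on the first field
            name = term
            sub(/:$/, "", name)      # drop the trailing colon for output
            printf ",%s,%s", name, $2
        }
    }
}
END { if (epoch) { printf "\n" } }
' "$tmp/list" "$tmp/data")

echo "$lines"
rm -rf "$tmp"
```

Because each data line is compared against the list as it is read, matches come out in the order they appear in the data file rather than in for-in order.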

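Sorry for the late reply @Ed Morton, and thank you for your help with this. The script works great!

It only needed a slight modification: an if statement to filter out timestamp lines that have no matches, and { sub(/\r$/,"") } to deal with the carriage-return problem.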

{ sub(/\r$/,"") }    # strip a trailing carriage return from every line
BEGIN { OFS="," }
NR==FNR {
    tags[$0]
    next
}
$1 == "Epoch" {
    prt()
    time = $3
}
$1 in tags {
    tag2val[$1] = $2
}
END {
    prt()
}
function prt(   out) {
    if ( time != "" ) {
        out = sprintf("%0.10f",(time / 86400) + 25569)
        for (tag in tag2val) {
            val = tag2val[tag]
            sub(/:/,"",tag)
            out = out OFS tag OFS val
            i = i + 1        # running count of tags printed so far
        }
        if ( i != x ) {      # print only if this frame matched at least one tag
            print out
        }
        x = i
    }
    delete tag2val
}
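
For anyone who hits the same "no output" symptom, here is a small sketch of why the carriage return matters. It assumes the files were saved with Windows (CRLF) line endings, which is my guess at what happened here; the sample files are hypothetical.

```shell
# Hypothetical sample files with CRLF line endings (an assumption).
tmp=$(mktemp -d)
printf 'pay_hour:\r\nage_years:\r\n' > "$tmp/list"
printf 'pay_hour: 3.25\r\nvacation_days: 0\r\nage_years: 22\r\n' > "$tmp/data"

# Without stripping \r, each list key ends in a carriage return,
# so it never equals $1 of a data line and nothing is printed.
raw=$(awk 'NR==FNR { tags[$0]; next } $1 in tags' "$tmp/list" "$tmp/data")

# Stripping the trailing \r from every line first lets the keys match.
clean=$(awk '{ sub(/\r$/, "") } NR==FNR { tags[$0]; next } $1 in tags' "$tmp/list" "$tmp/data")

printf 'raw=[%s]\nclean=[%s]\n' "$raw" "$clean"
rm -rf "$tmp"
```

That would also explain why adding text after the colon in the list file made the original one-liner print something: the carriage return then ended up in a later field instead of in $1.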
