AWK - How to find unique values and print the first and last occurrence



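I have a large text file that contains multi-line data about items. One item can have up to 15 different lines, but all lines belonging to an item are linked by a field called "itemId", e.g. itemId=<12560317>. Every line starts with a timestamp, e.g. 170209 035711 0792.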

170209 035711 0638 DE(N) ItemHandler.ItemLog event=<DESTINATION_REPLY>, *********************, itemId=<12560284>, globalId=<12560284>, cmcIndex=<728>, sorter=<0: MS01>, state=<CSC: WaitForDestination>, ccReason=<SCANNER_DATA_ADDED>, 
170209 035711 0638 DE(N) ItemHandler.ItemLog event=<REDIRECT_ITEM>, *************************, itemId=<12560284>, globalId=<12560284>, cmcIndex=<728>, sorter=<0: MS01>, state=<CSC: WaitForDestination>, PendingchuteGroup=<[3000]: Parked0>, Pendingstrategy=<notSpecified>, CscdestinationId=<-1: UnDef>, CmcdestinationId=<4099: All Scanners>, position=<sorter#0.scanner#4000: SCAN01>, itemRevisionNumber=<7> ##[
170209 035711 0715 DE(N) ItemHandler.ItemLog event=<SCANNER_RESULT>, ************************, itemId=<12560311>, globalId=<12560311>, cmcIndex=<755>, sorter=<0: MS01>, state=<CSC: WaitForData>, barcodeCount=<4>
170209 035711 0715 DE(N) ItemHandler.ItemLog event=<DESTINATION_REQUEST>, *******************, itemId=<12560311>, globalId=<12560311>, cmcIndex=<755>, sorter=<0: MS01>, state=<CSC: WaitForData>, barcodes=<[ProxyWrapperBarcode(barcode=<JJD014600001372909310>,
170209 035711 0717 DE(N) ItemHandler.ItemLog event=<DISCHARGE_ATTEMPTED>, *******************, itemId=<12560209>, globalId=<12560209>, cmcIndex=<653>, sorter=<0: MS01>, state=<CSC: ProjectHeadingForChute>, CscdestinationId=<19: CHU208>, chuteGroup=<[17, 19, 21]: [CHU207, CHU208, CHU209]>, CmcdestinationId=<19: CHU208>, position=<sorter#0: MS01>, itemRevisionNumber=<16> ##[
170209 035711 0719 DE(N) ItemHandler.ItemLog event=<DESTINATION_REPLY>, *********************, itemId=<12560311>, globalId=<12560311>, cmcIndex=<755>, sorter=<0: MS01>, state=<CSC: WaitForDestination>, ccReason=<SCANNER_DATA_ADDED>, PendingccResult=<OK>, Pendingstrategy=<notSpecified>,
170209 035711 0719 DE(N) ItemHandler.ItemLog event=<REDIRECT_ITEM>, *************************, itemId=<12560311>, globalId=<12560311>, cmcIndex=<755>, sorter=<0: MS01>, state=<CSC: WaitForDestination>, PendingchuteGroup=<[3000]: Parked0>, Pendingstrategy=<notSpecified>, CscdestinationId=<-1: UnDef>, CmcdestinationId=<-1: UnDef>, position=<sorter#0.scanner#4001: IU04-SCAN02>, itemRevisionNumber=<4> ##[
170209 035711 0792 DE(N) ItemHandler.ItemLog event=<ITEM_AT_INDUCTION>, *********************, itemId=<12560317>, globalId=<12560317>, cmcIndex=<761>, sorter=<0: MS01>, state=<CSC: ProjectIdle>, inductionId=<3: IU04>, position=<sorter#0.induction#3: IU04>, itemRevisionNumber=<0> ##[
170209 035711 0792 DE(N) ItemHandler.ItemLog event=<SET_ITEM_ID>, ***************************, itemId=<12560317>, globalId=<12560317>, cmcIndex=<761>, sorter=<0: MS01>, state=<CSC: ProjectIdle>, itemRevisionNumber=<0> ##[
170209 035711 0794 DE(N) ItemHandler.ItemLog event=<REDIRECT_ITEM_REPLY>, *******************, itemId=<12560284>, globalId=<12560284>, cmcIndex=<728>, sorter=<0: MS01>, state=<CSC: WaitForDestination>, CscdestinationId=<3000: Parked0>, chuteGroup=<[3000]: Parked0>, CmcdestinationId=<3000: Parked0>, position=<sorter#0.scanner#4000: SCAN01>, chuteListStartPoint=<-1>, itemRevisionNumber=<9> ##[
170209 035711 0795 DE(N) ItemHandler.ItemLog event=<RECONVERT>, *****************************, itemId=<12560284>, globalId=<12560284>, cmcIndex=<728>, sorter=<0: MS01>, state=<CSC: WaitForData>, CscdestinationId=<3000: Parked0>, chuteGroup=<[3000]: Parked0>, CmcdestinationId=<3000: Parked0>, position=<sorter#0.scanner#4000: SCAN01>, chuteListStartPoint=<-1>, itemRevisionNumber=<10> ##[
170209 035711 0795 DE(N) ItemHandler.ItemLog event=<DESTINATION_REQUEST>, *******************, itemId=<12560284>, globalId=<12560284>, cmcIndex=<728>, sorter=<0: MS01>, state=<CSC: WaitForData>, barcodes=<[ProxyWrapperBarcode(barcode=<JJD014600004019604475>, type=<C0>, result=<OK>, ccType=<>), 
170209 035711 0797 DE(N) ItemHandler.ItemLog event=<REDIRECT_ITEM_REPLY>, *******************, itemId=<12560311>, globalId=<12560311>, cmcIndex=<755>, sorter=<0: MS01>, state=<CSC: WaitForDestination>, CscdestinationId=<3000: Parked0>, chuteGroup=<[3000]: Parked0>, CmcdestinationId=<3000: Parked0>,
170209 035711 0798 DE(N) ItemHandler.ItemLog event=<ITEM_INDUCTED>, *************************, itemId=<12560311>, globalId=<12560311>, cmcIndex=<755>, sorter=<0: MS01>, state=<CSC: WaitForData>, inductionId=<3: IU04>, inductionMode=<SCANNER>, inductStatus=<NORMAL_ITEM>, carrierId=<469>, carrierCount=<1>, CmcdestinationId=<3000: Parked0>, position=<sorter#0: MS01>, itemRevisionNumber=<7> ##[

Goal:

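Using gawk on Windows, I want to find the first occurrence of each itemId and take its date and time, then find its last occurrence and take that date and time, and put both on one line, for example: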

ITEMID  170209 035711   170209 035932

Is there a way to do this with grep or awk, or a combination of both?

Thanks

I would write:

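awk '
!($8 in first) { first[$8] = $0 }   # first line seen for each itemId (field 8)
{ last[$8] = $0 }                   # overwritten every time, so it ends as the last line
END { for (id in first) { print first[id]; print last[id] } }
' file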
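The program above prints whole lines. A minimal sketch that instead produces the format asked for in the question (itemId, then the date and time of the first and of the last occurrence), assuming a POSIX awk with no gawk extensions and that date and time are the first two fields. Rather than relying on the itemId being field 8, it locates `itemId=<...>` anywhere in the line with `match()`. The function name `first_last` is hypothetical.

```shell
# Hypothetical helper: first_last FILE...  (reads stdin when no file is given)
first_last() {
  awk '
  match($0, /itemId=<[0-9]+>/) {
    # RSTART/RLENGTH cover "itemId=<digits>": skip the 8-char "itemId=<" and the final ">"
    id = substr($0, RSTART + 8, RLENGTH - 9)
    if (!(id in first))
      first[id] = $1 " " $2    # date and time of the first line for this itemId
    last[id] = $1 " " $2       # overwritten on every match, so it ends as the last line
  }
  END {
    for (id in first)          # note: for-in order is unspecified
      print id, first[id], last[id]
  }' "$@"
}
```

Usage: `first_last Item.log` prints one line per itemId, such as `12560311 170209 035711 170209 035711` for the sample above.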

Do you need the output sorted by date, by id, or ...? Do you only want to look up one ID at a time?
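On the ordering question: `for (id in first)` visits keys in an unspecified order, so the output order is arbitrary. A minimal sketch, assuming a POSIX awk, that keeps the order in which each itemId first appears and can optionally restrict the output to a single ID. The function name `first_last_ordered` and the awk variable `want` are hypothetical.

```shell
# Hypothetical helper: first_last_ordered [want=ITEMID] FILE...
# "want=..." is an awk command-line assignment; use "-" as FILE to read stdin.
first_last_ordered() {
  awk '
  match($0, /itemId=<[0-9]+>/) {
    id = substr($0, RSTART + 8, RLENGTH - 9)   # digits inside itemId=<...>
    if (want != "" && id != want) next         # optional: keep only one itemId
    if (!(id in first)) {
      order[++n] = id                          # remember order of first appearance
      first[id] = $1 " " $2
    }
    last[id] = $1 " " $2
  }
  END {
    for (i = 1; i <= n; i++)                   # numeric loop gives a stable order
      print order[i], first[order[i]], last[order[i]]
  }' "$@"
}
```

Usage: `first_last_ordered Item.log` for all IDs, or `first_last_ordered want=12560311 Item.log` for one. If you prefer ascending itemId order instead, piping unordered output through `sort -n` is enough.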

As a one-liner:

gawk '/itemId=</ { a = gensub(/([0-9]{6} [0-9]{6} [0-9]{4}).*itemId=<([0-9]+)>.*/, "\\2 \\1", "g", $0); b = split(a, c, " "); if (c[1] in result) result[c[1]] = gensub(/(.+),(.+)/, "\\1," c[2] " " c[3] " " c[4], "g", result[c[1]]); else result[c[1]] = c[2] " " c[3] " " c[4] "," c[2] " " c[3] " " c[4]} END { for (i in result) print i ": " result[i]}' test.txt

Let me break it down:

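  • gensub rewrites the line so that the variable a holds the itemId followed by the date, e.g. "12560317 170209 035711 0792"
  • We split a on spaces: c[1] holds the itemId, and c[2], c[3], c[4] hold the parts of the date
  • If the itemId does not yet exist in the array "result", we store the date twice (!) under the index itemId, separated by a comma
  • If the itemId already exists, we replace the second date with the newly found one.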

This leaves us with an associative array that has the itemId as key and, as value, the first and last dates separated by a comma.

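gawk '
/itemId=</ {                     # skip lines that carry no itemId
  # reorder "<date> <time> <ms> ... itemId=<N> ..." into "N date time ms"
  a = gensub(/([0-9]{6} [0-9]{6} [0-9]{4}).*itemId=<([0-9]+)>.*/, "\\2 \\1", "g", $0)
  b = split(a, c, " ")           # c[1] = itemId, c[2..4] = date, time, ms
  if (c[1] in result)            # seen before: replace the part after the comma (last date)
    result[c[1]] = gensub(/(.+),(.+)/, "\\1" "," c[2] " " c[3] " " c[4], "g", result[c[1]])
  else                           # first time: store the date twice, as "first,last"
    result[c[1]] = c[2] " " c[3] " " c[4] "," c[2] " " c[3] " " c[4]
}
END { for (i in result) print i ": " result[i] }' test.txt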

The result is:

12560311: 170209 035711 0715,170209 035711 0798
12560209: 170209 035711 0717,170209 035711 0717
12560284: 170209 035711 0638,170209 035711 0795
12560317: 170209 035711 0792,170209 035711 0792

Edit: running this on Windows did not work properly, so I simplified the answer to:

awk "
!first[$8] {first[$8] = $1 FS $2 FS $3} 
{last[$8] = $1 FS $2 FS $3 } 
END {
for (id in first) {
print gensub(/itemId=<([^>]+)>,/, \"\\1\", \"g\", id) FS first[id] FS last[id]}
}" Item.log

Thanks to @glennjackman for the inspiration. Note the escaped quotes needed to run this on Windows.
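Another way to sidestep cmd.exe quoting entirely is to save the program in a file and pass it to gawk with the `-f` option. A sketch, assuming the file is called `firstlast.awk` (a hypothetical name) and, like the answer above, that the itemId is field 8. It uses `gsub()` instead of `gensub()`, so it also runs on awks other than gawk. The here-document only creates the file from a Unix shell; on Windows you would save the lines between the EOF markers with an editor.

```shell
# Write the awk program to a file (hypothetical name) so no shell quoting is involved.
cat > firstlast.awk <<'EOF'
# For each itemId, print its number and the timestamps of its first and last line.
# Assumes field 8 looks like "itemId=<12560317>," as in the sample log.
$8 !~ /^itemId=</ { next }                         # ignore lines without an itemId in field 8
!($8 in first)    { first[$8] = $1 " " $2 " " $3 } # first timestamp for this itemId
                  { last[$8]  = $1 " " $2 " " $3 } # overwritten until the last line
END {
  for (id in first) {
    n = id
    gsub(/^itemId=<|>,$/, "", n)   # strip the wrapper, keep only the digits
    print n, first[id], last[id]
  }
}
EOF
```

Usage, identical in cmd.exe and a Unix shell: `gawk -f firstlast.awk Item.log`.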
