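I am parsing a file that contains nginx GET request bodies as strings. Sometimes there is a line break between two parts of the same request, so I cannot parse such requests with awk.
I split on two delimiters with awk -F'delimiter1: |delimiter2'. Is there a way to tell awk that a line break may occur between these delimiters, so that it treats the two lines as one record?
Thanks in advance.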
Sample input (the Java error is a random example):
[2017-12-04 20:53:07] [ERROR] [ID-XX] Get: sr=342x487&c64=(not set)&c1=Phones, MP3s, GPS&v=1&c33=427&d28=
Like
&je=0&s4d=4-b&c32=(not set)&ua=Opera/9.80 (Android; Opera Mini/32.0.2254/77.161; U; uk) Presto/2.12.423 Version/12.16&time=04/Dec/2017:20:52:02 +0200&qtype=get
com.test.app. java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at sun.font.CompositeStrike.getStrikeForSlot(CompositeStrike.java:75)
at sun.font.CompositeStrike.getFontMetrics(CompositeStrike.java:93)
at sun.font.FontDesignMetrics.initMatrixAndMetrics(FontDesignMetrics.java:359)
at sun.font.FontDesignMetrics.<init>(FontDesignMetrics.java:350)
at sun.font.FontDesignMetrics.getMetrics(FontDesignMetrics.java:302)
at sun.swing.SwingUtilities2.getFontMetrics(SwingUtilities2.java:1113)
at javax.swing.JComponent.getFontMetrics(JComponent.java:1626)
at javax.swing.text.WrappedPlainView.updateMetrics(WrappedPlainView.java:318)
at javax.swing.text.WrappedPlainView.updateChildren(WrappedPlainView.java:297)
at javax.swing.text.WrappedPlainView.insertUpdate(WrappedPlainView.java:463)
at javax.swing.plaf.basic.BasicTextUI$RootView.insertUpdate(BasicTextUI.java:1610)
at javax.swing.plaf.basic.BasicTextUI$UpdateHandler.insertUpdate(BasicTextUI.java:1869)
at javax.swing.text.AbstractDocument.fireInsertUpdate(AbstractDocument.java:201)
at javax.swing.text.AbstractDocument.handleInsertString(AbstractDocument.java:748)
at javax.swing.text.AbstractDocument.insertString(AbstractDocument.java:707)
at javax.swing.text.PlainDocument.insertString(PlainDocument.java:130)
at javax.swing.text.DefaultEditorKit.read(DefaultEditorKit.java:273)
at javax.swing.JEditorPane.setText(JEditorPane.java:1416)
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1967) ~[?:1.8.0_151]
... 12 more
[2017-12-04 21:03:07] [ERROR] [ID-YY] Get: sr=342x487&c64=(not set)&c1=Phones, MP3s, GPS&v=1&em=Exception: Error: [$sc:ind] Aborting!&ua=Opera/9.80 (Android; Opera Mini/30.0.2254/77.161; U; ru) Presto/2.12.423 Version/12.16&time=04/Dec/2017:21:03:07 +0200&qtype=get
com.test.app. java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at sun.font.CompositeStrike.getStrikeForSlot(CompositeStrike.java:75)
at sun.font.CompositeStrike.getFontMetrics(CompositeStrike.java:93)
at sun.font.FontDesignMetrics.initMatrixAndMetrics(FontDesignMetrics.java:359)
at sun.font.FontDesignMetrics.<init>(FontDesignMetrics.java:350)
at sun.font.FontDesignMetrics.getMetrics(FontDesignMetrics.java:302)
at sun.swing.SwingUtilities2.getFontMetrics(SwingUtilities2.java:1113)
at javax.swing.JComponent.getFontMetrics(JComponent.java:1626)
at javax.swing.text.WrappedPlainView.updateMetrics(WrappedPlainView.java:318)
at javax.swing.text.WrappedPlainView.updateChildren(WrappedPlainView.java:297)
at javax.swing.text.WrappedPlainView.insertUpdate(WrappedPlainView.java:463)
at javax.swing.plaf.basic.BasicTextUI$RootView.insertUpdate(BasicTextUI.java:1610)
at javax.swing.plaf.basic.BasicTextUI$UpdateHandler.insertUpdate(BasicTextUI.java:1869)
at javax.swing.text.AbstractDocument.fireInsertUpdate(AbstractDocument.java:201)
at javax.swing.text.AbstractDocument.handleInsertString(AbstractDocument.java:748)
at javax.swing.text.AbstractDocument.insertString(AbstractDocument.java:707)
at javax.swing.text.PlainDocument.insertString(PlainDocument.java:130)
at javax.swing.text.DefaultEditorKit.read(DefaultEditorKit.java:273)
at javax.swing.JEditorPane.setText(JEditorPane.java:1416)
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1967) ~[?:1.8.0_151]
... 12 more
[2017-12-04 19:40:02] [ERROR] [ID-ZZ] Get: el=search&dl=https://market.com/?dt=Market – Electronics Store | Web Store (Market.com)&id=104777577&a=770227875&t=pageview&ua=Mozilla/5.0 (Linux; Android 7.0; RNE-L21 Build/HUAWEIRNE-L21) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.84 Mobile Safari/537.36&time=04/Dec/2017:19:39:04 +0200&qtype=get
com.test.app. java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at sun.font.CompositeStrike.getStrikeForSlot(CompositeStrike.java:75)
at sun.font.CompositeStrike.getFontMetrics(CompositeStrike.java:93)
at sun.font.FontDesignMetrics.initMatrixAndMetrics(FontDesignMetrics.java:359)
at sun.font.FontDesignMetrics.<init>(FontDesignMetrics.java:350)
at sun.font.FontDesignMetrics.getMetrics(FontDesignMetrics.java:302)
at sun.swing.SwingUtilities2.getFontMetrics(SwingUtilities2.java:1113)
at javax.swing.JComponent.getFontMetrics(JComponent.java:1626)
at javax.swing.text.WrappedPlainView.updateMetrics(WrappedPlainView.java:318)
at javax.swing.text.WrappedPlainView.updateChildren(WrappedPlainView.java:297)
at javax.swing.text.WrappedPlainView.insertUpdate(WrappedPlainView.java:463)
at javax.swing.plaf.basic.BasicTextUI$RootView.insertUpdate(BasicTextUI.java:1610)
at javax.swing.plaf.basic.BasicTextUI$UpdateHandler.insertUpdate(BasicTextUI.java:1869)
at javax.swing.text.AbstractDocument.fireInsertUpdate(AbstractDocument.java:201)
at javax.swing.text.AbstractDocument.handleInsertString(AbstractDocument.java:748)
at javax.swing.text.AbstractDocument.insertString(AbstractDocument.java:707)
at javax.swing.text.PlainDocument.insertString(PlainDocument.java:130)
at javax.swing.text.DefaultEditorKit.read(DefaultEditorKit.java:273)
at javax.swing.JEditorPane.setText(JEditorPane.java:1416)
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1967) ~[?:1.8.0_151]
... 12 more
Desired output (print the ID and the body, in double quotes, on one line, and replace & with _&_):
ID-XX "sr=342x487_&_c64=(not set)_&_c1=Phones, MP3s, GPS_&_v=1_&_c33=427_&_d28=Like_&_je=0_&_s4d=4-b_&_c32=(not set)_&_ua=Opera/9.80 (Android; Opera Mini/32.0.2254/77.161; U; uk) Presto/2.12.423 Version/12.16_&_time=04/Dec/2017:20:52:02 +0200_&_qtype=get"
ID-YY "sr=342x487_&_c64=(not set)_&_c1=Phones, MP3s, GPS_&_v=1_&_em=Exception: Error: [$sc:ind] Aborting!_&_ua=Opera/9.80 (Android; Opera Mini/30.0.2254/77.161; U; ru) Presto/2.12.423 Version/12.16_&_time=04/Dec/2017:21:03:07 +0200_&_qtype=get"
ID-ZZ "el=search_&_dl=https://market.com/?dt=Market – Electronics Store | Web Store (Market.com)_&_id=104777577_&_a=770227875_&_t=pageview_&_ua=Mozilla/5.0 (Linux; Android 7.0; RNE-L21 Build/HUAWEIRNE-L21) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.84 Mobile Safari/537.36_&_time=04/Dec/2017:19:39:04 +0200_&_qtype=get"
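There are not many of these broken request body strings; most of them are on a single line, as expected. Also, the file contains only failed requests, so the search pattern does not have to include Get (it is not needed).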
Awk solution:
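awk 'f{ if (/^\[/) { printf "\042\n"; f=0 } else printf("%s", $0) }   # inside a request: append lines until the next "[" entry
/ Get:/{
f=1; gsub(/[\[\]]/, "", $4); id=$4; sub(/^.* Get: /, "");   # strip the brackets from the ID, drop everything up to "Get: "
gsub("&", "_&_"); printf "%s \042%s",id,$0   # \042 is the octal escape for a double quote
}
END{ if (f) printf "\042\n" }' file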
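- / Get:/ - on encountering a "Get request" line
- f=1 - f is a flag indicating that the following (continuation) lines are being processed
- id=$4 - captures the ID field (e.g. ID-XX)
Output: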
ID-XX "sr=342x487_&_c64=(not set)_&_c1=Phones, MP3s, GPS_&_v=1_&_c33=427_&_d28=Like&je=0&s4d=4-b&c32=(not set)&ua=Opera/9.80 (Android; Opera Mini/32.0.2254/77.161; U; uk) Presto/2.12.423 Version/12.16&time=04/Dec/2017:20:52:02 +0200&qtype=get"
ID-YY "sr=342x487_&_c64=(not set)_&_c1=Phones, MP3s, GPS_&_v=1_&_em=Exception: Error: [$sc:ind] Aborting!_&_ua=Opera/9.80 (Android; Opera Mini/30.0.2254/77.161; U; ru) Presto/2.12.423 Version/12.16_&_time=04/Dec/2017:21:03:07 +0200_&_qtype=get"
ID-ZZ "el=search_&_dl=https://market.com/?dt=Market – Electronics Store | Web Store (Market.com)_&_id=104777577_&_a=770227875_&_t=pageview_&_ua=Mozilla/5.0 (Linux; Android 7.0; RNE-L21 Build/HUAWEIRNE-L21) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.84 Mobile Safari/537.36_&_time=04/Dec/2017:19:39:04 +0200_&_qtype=get"
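In the ID-XX line of the output above, the & characters that arrived on continuation lines are still unreplaced. The following is a minimal sketch of one way to handle that, not a definitive fix. It assumes that every request body ends with =get (as in the sample), that the ID is always the fourth whitespace-separated field, and that lines outside a body, such as the stack trace, can be ignored. The function names join_requests and emit are hypothetical.

```shell
# join_requests: hypothetical helper; prints one line per request as: ID "body"
join_requests() {
    awk '
    # Print the collected request, with every & replaced by _&_.
    function emit() {
        gsub(/&/, "_&_", body)   # in the replacement, & stands for the matched text
        printf "%s \"%s\"\n", id, body
        collecting = 0
    }
    # A new log entry: remember the ID and the first piece of the body.
    /^\[/ && index($0, " Get: ") {
        if (collecting) emit()
        id = substr($4, 2, length($4) - 2)           # [ID-XX] -> ID-XX
        body = substr($0, index($0, " Get: ") + 6)   # text after " Get: "
        collecting = 1
        if (body ~ /=get$/) emit()                   # single-line request
        next
    }
    # A continuation line: append it until the body ends with =get.
    collecting {
        body = body $0
        if (body ~ /=get$/) emit()
    }
    END { if (collecting) emit() }
    ' "$@"
}
```

It would be called as join_requests file. Because the substitution runs once on the fully joined body, pieces from continuation lines get the same _&_ treatment as the first line. If a broken body never ends with =get, the sketch keeps appending lines until the next entry starting with [, so the end-of-body test would need adjusting for other log formats.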
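I understand that you want to keep each ERROR entry on one line and format it.
We don't know what the delimiters are.
It is odd that you keep the space at the end of the line.
You can try this sed: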
sed '
/.*ERROR\] \[/!d # keep only the lines with ERROR
s/// # delete everything from the start up to the ID
:A
/=get$/!{N;bA} # if the line does not end with =get, append the next line
s/\([^]]*\)[^:]*: \(.*\)/\1 "\2"/ # remove Get: and add "
s/\n//g # remove \n
s/&/_&_/g # replace & by _&_
' infile