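I have started using Crawler4j and played around with the BasicCrawler example for a while. I removed all output from the BasicCrawler.visit() method, then added some URL processing that I already had. Now when I start the program, it suddenly prints a huge amount of internal processing information that I don't really need. See the following example: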
Auth cache not set in the context
Target auth state: UNCHALLENGED
Proxy auth state: UNCHALLENGED
Attempt 1 to execute request
Sending request: GET /section.aspx?cat=7 HTTP/1.1
"GET /section.aspx?cat=7 HTTP/1.1[r][n]"
>> "Accept-Encoding: gzip[r][n]"
>> "Host: www.dailytech.com[r][n]"
>> "Connection: Keep-Alive[r][n]"
>> "User-Agent: crawler4j (http://code.google.com/p/crawler4j/)[r][n]"
>> "Cookie: DTLASTVISITED=11/20/2013 6:16:52 AM; DTLASTVISITEDSYS=11/20/2013 6:16:48 AM; MF2=vaxc1b832fex; dtusession=dcef3fc0-dc04-4f13-8028-186aea942c3f[r][n]"
>> "[r][n]"
>> GET /section.aspx?cat=7 HTTP/1.1
>> Accept-Encoding: gzip
>> Host: www.dailytech.com
>> Connection: Keep-Alive
>> User-Agent: crawler4j (http://code.google.com/p/crawler4j/)
>> Cookie: DTLASTVISITED=11/20/2013 6:16:52 AM; DTLASTVISITEDSYS=11/20/2013 6:16:48 AM; MF2=vaxc1b832fex; dtusession=dcef3fc0-dc04-4f13-8028-186aea942c3f
<< "HTTP/1.1 200 OK[r][n]"
<< "Cache-Control: private[r][n]"
<< "Content-Type: text/html; charset=utf-8[r][n]"
<< "Content-Encoding: gzip[r][n]"
<< "Vary: Accept-Encoding[r][n]"
<< "Server: Microsoft-IIS/7.5[r][n]"
<< "X-AspNet-Version: 4.0.30319[r][n]"
<< "Set-Cookie: DTLASTVISITED=11/20/2013 6:16:54 AM; domain=dailytech.com; expires=Tue, 20-Nov-2018 11:16:54 GMT; path=/[r][n]"
<< "Set-Cookie: DTLASTVISITEDSYS=11/20/2013 6:16:48 AM; domain=dailytech.com; path=/[r][n]"
<< "X-UA-Compatible: IE=EmulateIE7[r][n]"
<< "Date: Wed, 20 Nov 2013 11:16:54 GMT[r][n]"
<< "Content-Length: 8235[r][n]"
<< "[r][n]"
Receiving response: HTTP/1.1 200 OK
<< HTTP/1.1 200 OK
<< Cache-Control: private
<< Content-Type: text/html; charset=utf-8
<< Content-Encoding: gzip
<< Vary: Accept-Encoding
<< Server: Microsoft-IIS/7.5
<< X-AspNet-Version: 4.0.30319
<< Set-Cookie: DTLASTVISITED=11/20/2013 6:16:54 AM; domain=dailytech.com; expires=Tue, 20-Nov-2018 11:16:54 GMT; path=/
<< Set-Cookie: DTLASTVISITEDSYS=11/20/2013 6:16:48 AM; domain=dailytech.com; path=/
<< X-UA-Compatible: IE=EmulateIE7
<< Date: Wed, 20 Nov 2013 11:16:54 GMT
<< Content-Length: 8235
Cookie accepted: "[version: 0][name: DTLASTVISITED][value: 11/20/2013 6:16:5
AM][domain:dailytech.com][path: /][expiry: Tue Nov 20 12:16:54 CET 2018]".
Cookie accepted: "[version: 0][name: DTLASTVISITEDSYS][value: 11/20/2013 6:16:48
AM][domain: dailytech.com][path: /][expiry: null]".
Connection can be kept alive indefinitely
<< "[0x1f]"
<< "[0x8b]"
<< "[0x8]"
<< "[0x0]"
<< "[0x0][0x0][0x0][0x0][0x4][0x0]"
<< "[0xed][0xbd][0x7]`[0x1c]I[0x96]%&/m[0xca]{J[0xf5]J[0xd7][0xe0]t[0xa1]
[0x8][0x80]`[0x13]$[0xd8][0x90]@[0x10][0xec][0xc1][0x88][0xcd][0xe6][0x92][0xec]
[0x1d]iG#)[0xab]*[0x81][0xca]eVe]f[0x16]@[0xcc][0xed][0x9d][0xbc][0xf7][0xde]{[0xef]
[0xbd][0xf7][0xde]{[0xef][0xbd][0xf7][0xba];[0x9d]N'[0xf7][0xdf][0xff]?fd[0x1]l[0xf6]
[0xce]J[0xda][0xc9][0x9e]![0x80][0xaa][0xc8][0x1f]?~|[0x1f]?"~[0xe3][0xe4]7N[0x1e]
[0xff][0xae]O[0xbf]<y[0xf3][0xfb][0xbc]<M[0xe7][0xed][0xa2]L_~[0xf5][0xe4][0xf9]
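Is there a way to disable all of this output? Or does anyone know what is causing it? Could this be a bug that I should report to the community as an issue?
Thanks for your time.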
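I found the answer to my problem. I had renamed the method from main(String[] args) to crawl(), and after that crawler4j started printing the debug output. When I changed logger4j.properties, the messages disappeared.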
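For anyone hitting the same thing, here is a minimal sketch of the kind of change that silences this output. It assumes the properties file mentioned above is a log4j 1.x configuration on the classpath, and that the request/response dump comes from the Apache HttpClient loggers under org.apache.http (the "wire" and "headers" loggers). The appender name stdout and the pattern are only illustrative, not taken from the original setup.

```properties
# Root logger: only warnings and errors, written to the console.
log4j.rootLogger=WARN, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{HH:mm:ss} %-5p [%c{1}] %m%n

# Assumption: the ">>" / "<<" lines are HttpClient wire and header logging.
# Raising these loggers above DEBUG should hide them.
log4j.logger.org.apache.http=WARN
log4j.logger.org.apache.http.wire=WARN
log4j.logger.org.apache.http.headers=WARN
```

If the root logger had been left at DEBUG, every library that logs through log4j would print its internals, which would match the output shown in the question.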