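I am downloading a large number of PDFs from parliaments. I have scraped the PDF URLs and am now trying to download them.
For this I set up a Debian instance on a university cloud.
This works fine for most of them, but for 4 parliaments I get an error page saying that I have to accept cookies. The result is an HTML page saved with a .pdf file name, consisting mainly of the question whether I accept cookies.
This error does not happen on Ubuntu or Windows 10.
This is the output of curl on Debian: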
curl -Iv4 http://dokumentation.landtag-mv.de/parldok/dokument/44970/eu_ratspraesidentschaft.pdf
* Trying 52.57.90.21...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x558926ee0f90)
* Connected to dokumentation.landtag-mv.de (52.57.90.21) port 80 (#0)
> HEAD /parldok/dokument/44970/eu_ratspraesidentschaft.pdf HTTP/1.1
> Host: dokumentation.landtag-mv.de
> User-Agent: curl/7.64.0
> Accept: */*
>
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Cache-Control: private, no-cache
Cache-Control: private, no-cache
< Content-Length: 14447
Content-Length: 14447
< Content-Type: text/html; charset=utf-8
Content-Type: text/html; charset=utf-8
< X-Content-Type-Options: nosniff
X-Content-Type-Options: nosniff
< X-Frame-Options: sameorigin
X-Frame-Options: sameorigin
< X-Powered-By: ASP.NET
X-Powered-By: ASP.NET
< Date: Thu, 14 Jan 2021 16:17:41 GMT
Date: Thu, 14 Jan 2021 16:17:41 GMT
Compare that with Ubuntu, where I do get the PDF:
$ curl -Iv4 http://dokumentation.landtag-mv.de/parldok/dokument/44970/eu_ratspraesidentschaft.pdf
* Trying 52.57.90.21:80...
* TCP_NODELAY set
* Connected to dokumentation.landtag-mv.de (52.57.90.21) port 80 (#0)
> HEAD /parldok/dokument/44970/eu_ratspraesidentschaft.pdf HTTP/1.1
> Host: dokumentation.landtag-mv.de
> User-Agent: curl/7.68.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Cache-Control: private, no-cache
Cache-Control: private, no-cache
< Content-Length: 120419
Content-Length: 120419
< Content-Type: application/pdf
Content-Type: application/pdf
< X-Content-Type-Options: nosniff
X-Content-Type-Options: nosniff
< X-Frame-Options: sameorigin
X-Frame-Options: sameorigin
< X-Powered-By: ASP.NET
X-Powered-By: ASP.NET
< Date: Thu, 14 Jan 2021 16:01:14 GMT
Date: Thu, 14 Jan 2021 16:01:14 GMT
I would be glad if someone could tell me what I am doing wrong.
Use the --output option:
curl --output eu_ratspraesidentschaft.pdf http://dokumentation.landtag-mv.de/parldok/dokument/44970/eu_ratspraesidentschaft.pdf
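Since the failing downloads are HTML pages stored under a .pdf name, it may help to check the saved files after a bulk download. The following is only a sketch, not a fix for the cookie page itself: it assumes a real PDF starts with the bytes `%PDF-`, and the function names `is_pdf` and `report_non_pdfs` as well as the directory name `downloads` are made up for illustration.

```shell
#!/bin/sh
# is_pdf FILE: succeed only if FILE starts with the PDF magic bytes "%PDF-".
# (Hypothetical helper; assumes a head that supports -c, as on Debian/Ubuntu.)
is_pdf() {
    [ "$(head -c 5 -- "$1" 2>/dev/null)" = "%PDF-" ]
}

# report_non_pdfs DIR: print every *.pdf in DIR that is not a real PDF,
# e.g. an HTML cookie page that was saved under a .pdf name.
report_non_pdfs() {
    for f in "$1"/*.pdf; do
        [ -e "$f" ] || continue    # glob matched nothing
        is_pdf "$f" || printf '%s\n' "$f"
    done
}
```

After sourcing this, `report_non_pdfs downloads` would list the files that need to be fetched again.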