How to fetch the google.com web page using C sockets



I wrote code that is supposed to request the google.com web page and display its contents, but it does not work as expected.

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>  /* inet_addr */
#include <unistd.h>     /* close */
#include <string.h>
#include <stdlib.h>
#include <stdio.h>

#define bufsize 1000

int main(void)
{
    int sockfd;
    struct sockaddr_in destAddr;

    if ((sockfd = socket(PF_INET, SOCK_STREAM, 0)) == -1) {
        /* No descriptor was created, so there is nothing to close here. */
        fprintf(stderr, "Error opening client socket\n");
        return 1;
    }

    /* Zero the whole structure, then fill in the destination. */
    memset(&destAddr, 0, sizeof(destAddr));
    destAddr.sin_family = AF_INET;
    destAddr.sin_port = htons(80);
    destAddr.sin_addr.s_addr = inet_addr("64.233.164.94");

    if (connect(sockfd, (struct sockaddr *)&destAddr, sizeof(destAddr)) == -1) {
        fprintf(stderr, "Error with client connecting to server\n");
        close(sockfd);
        return 1;
    }

    /* Three request variants; the outputs further down refer to them by name. */
    const char *httprequest1 = "GET / HTTP/1.1\r\n"
        "Host: google.com\r\n"
        "\r\n";
    const char *httprequest2 = "GET / HTTP/1.1\r\n"
        "Host: http://www.google.com/\r\n"
        "\r\n";
    const char *httprequest3 = "GET / HTTP/1.1\r\n"
        "Host: http://www.google.com/\r\n"
        "Upgrade-Insecure-Requests: 1\r\n"
        "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9\r\n"
        "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36\r\n"
        "\r\n";
    const char *httprequest = httprequest2;

    printf("start send\n");
    int send_result = (int)send(sockfd, httprequest, strlen(httprequest), 0);
    printf("send_result: %d\n", send_result);

    char buf[bufsize + 1] = {0};
    printf("start recv\n");
    /* A single recv() returns at most bufsize bytes of the response. */
    int bytes_readed = (int)recv(sockfd, buf, bufsize, 0);
    printf("end recv: readed %d bytes\n", bytes_readed);
    buf[bufsize] = '\0';
    printf("-- buf:\n");
    puts(buf);
    printf("--\n");

    close(sockfd);
    return 0;
}

If I send httprequest1, I get the following output:

gcc -w -o get-google get-google.c
./get-google
start send
send_result: 36
start recv
end recv: readed 528 bytes
-- buf:
HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/
Content-Type: text/html; charset=UTF-8
Date: Fri, 09 Sep 2022 11:52:16 GMT
Expires: Sun, 09 Oct 2022 11:52:16 GMT
Cache-Control: public, max-age=2592000
Server: gws
Content-Length: 219
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
--

In httprequest2 I specified the Host: parameter and got the following output:

gcc -w -o get-google get-google.c
./get-google
start send
send_result: 48
start recv
end recv: readed 198 bytes
-- buf:
HTTP/1.1 400 Bad Request
Content-Length: 54
Content-Type: text/html; charset=UTF-8
Date: Fri, 09 Sep 2022 11:53:19 GMT
Connection: close
<html><title>Error 400 (Bad Request)!!1</title></html>
--

Then I tried copying the headers from the browser; with httprequest3 I got the same result as with httprequest2.

How do I get the whole page?

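It should be Host: www.google.com, not Host: http://www.google.com/. The Host header carries only the host name (optionally followed by a port), not a full URL.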
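
To illustrate the Host fix, here is a minimal sketch that derives the header value from a URL and formats the request. The helper names host_from_url and build_get_request are hypothetical (they are not part of the question's code), and the sketch assumes a plain request for "/" with no other headers.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the host part (plus port, if any) of a URL
 * such as "http://www.google.com/" into host.
 * Returns 0 on success, -1 if the host is empty or does not fit. */
static int host_from_url(const char *url, char *host, size_t hostsz)
{
    const char *start = strstr(url, "://");
    start = start ? start + 3 : url;      /* skip the scheme if present */
    size_t len = strcspn(start, "/?#");   /* stop at path, query or fragment */
    if (len == 0 || len >= hostsz)
        return -1;
    memcpy(host, start, len);
    host[len] = '\0';
    return 0;
}

/* Hypothetical helper: format a minimal HTTP/1.1 GET request for "/".
 * Returns the request length, or -1 if it does not fit in out. */
static int build_get_request(const char *host, char *out, size_t outsz)
{
    int n = snprintf(out, outsz,
                     "GET / HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "\r\n", host);
    return (n < 0 || (size_t)n >= outsz) ? -1 : n;
}
```

Calling host_from_url("http://www.google.com/", host, sizeof host) leaves "www.google.com" in host, and the buffer filled by build_get_request can then be passed to send() in place of httprequest2.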

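However, that may still not give you the home page. Google wants you to use HTTPS, so it will probably redirect you to https://www.google.com/, and you will not be able to implement HTTPS entirely by yourself (you will have to use a library such as OpenSSL).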
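
One way to notice such a redirect in the received buffer is sketched below. The function names are hypothetical, and the sketch assumes the status line and headers all arrived in one buffer and that the header is spelled exactly "Location: " (real header names are case-insensitive, so this is a simplification).

```c
#include <stdio.h>
#include <string.h>

/* Returns the numeric status code from the status line, or -1. */
static int http_status(const char *resp)
{
    int code;
    return sscanf(resp, "HTTP/%*d.%*d %d", &code) == 1 ? code : -1;
}

/* Copies the value of the Location header into out.
 * Returns 0 on success, -1 if the header is absent or too long. */
static int http_location(const char *resp, char *out, size_t outsz)
{
    const char *p = strstr(resp, "\r\nLocation: ");
    if (p == NULL)
        return -1;
    p += strlen("\r\nLocation: ");
    size_t len = strcspn(p, "\r\n");      /* value ends at the line break */
    if (len >= outsz)
        return -1;
    memcpy(out, p, len);
    out[len] = '\0';
    return 0;
}

/* Returns 1 if the response is a 3xx redirect to an https:// URL. */
static int redirects_to_https(const char *resp)
{
    char loc[256];
    int code = http_status(resp);
    if (code < 300 || code > 399)
        return 0;
    if (http_location(resp, loc, sizeof loc) != 0)
        return 0;
    return strncmp(loc, "https://", 8) == 0;
}
```

If redirects_to_https(buf) returns 1, a plain socket cannot follow the redirect, and a TLS library is needed from that point on.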
