Urllib Bad Request problem



I tried every 'User-Agent' from here and I still get urllib.error.HTTPError: HTTP Error 400: Bad Request. I also tried this one, but then I get urllib.error.URLError: File Not Found. I don't know what to do; my current code is:

from bs4 import BeautifulSoup
import urllib.request,json,ast
with open ("urller.json") as f:
    cc = json.load(f) # the file I get the links from; you can try this link instead:
    #cc = ../games/index.php?g_id=23521&game=0RBITALIS 
for x in ast.literal_eval(cc): #cc is a str(list) so I have to convert
    if x.startswith("../"):
        r = urllib.request.Request("http://www.game-debate.com{}".format(x[2::]),headers={'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'})
        # x[2::] because I removed the '../' part from the urls
        rr = urllib.request.urlopen(r).read()
        soup = BeautifulSoup(rr)
        for y in soup.find_all("ul",attrs={'class':['devDefSysReqList']}):
            print (y.text)

Edit: if you try only one link it may not show any error, because I always get the error on the 6th link.
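To pin down which link actually triggers the 400, one option is to catch the error and print the URL that was requested. This is only a debugging sketch, assuming the same urller.json file and link format as above:

import ast, json
import urllib.error, urllib.request

headers = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'}

with open("urller.json") as f:
    cc = json.load(f)

for x in ast.literal_eval(cc):
    if x.startswith("../"):
        full_url = "http://www.game-debate.com" + x[2:]
        try:
            urllib.request.urlopen(urllib.request.Request(full_url, headers=headers))
        except urllib.error.HTTPError as err:
            # repr() makes the offending character (here, a space) easy to spot
            print("failed:", repr(full_url), "->", err)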

A quick fix would be to replace the spaces with +:

url = "http://www.game-debate.com"
r = urllib.request.Request(url + x[2:].replace(" ", "+"), headers={'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'})

A better option may be to let urllib quote the parameters:

from bs4 import BeautifulSoup
import urllib.request,json,ast
from urllib.parse import quote, urljoin
with open ("urller.json") as f:
    cc = json.load(f) # the file the links come from
    url = "http://www.game-debate.com"

    for x in ast.literal_eval(cc):  # cc is a str(list) so I have to convert
        if x.startswith("../"):
            # quote() escapes the space as %20; safe="/?&=" keeps the query delimiters intact
            r = urllib.request.Request(urljoin(url, quote(x.lstrip("."), safe="/?&=")), headers={
                'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'})
            rr = urllib.request.urlopen(r).read()
            soup = BeautifulSoup(rr, "html.parser")
            print(rr.decode("utf-8"))
            for y in soup.find_all("ul", attrs={'class':['devDefSysReqList']}):
                print (y.text)
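As a quick sanity check, printing the URL that urljoin and quote build makes it easy to confirm that only the space gets escaped (the game name below is made up for illustration):

from urllib.parse import quote, urljoin

url = "http://www.game-debate.com"
x = "../games/index.php?g_id=23521&game=Some Game"  # hypothetical link containing a space
print(urljoin(url, quote(x.lstrip("."), safe="/?&=")))
# http://www.game-debate.com/games/index.php?g_id=23521&game=Some%20Game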

Spaces are not valid in a url; they have to be percent-encoded as %20 or replaced with +.
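Both variants are available in the standard library, for example:

from urllib.parse import quote, quote_plus

name = "0RBITALIS demo"        # hypothetical value containing a space
print(quote(name))             # 0RBITALIS%20demo  (percent-encoded)
print(quote_plus(name))        # 0RBITALIS+demo    (plus-encoded, query-string style)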
