# import libraries

import urllib2
from bs4 import BeautifulSoup

New libraries:

import csv
import requests 
import string

Define variables:

i = 1
str_i = str(i)
seqPrefix = 'seq_'
seq_1 = str('https://anyaddress.com/')
quote_page = seqPrefix + str_i

# Then, use Python urllib2 to fetch the HTML page at the declared URL.

# query the website and return the html to the variable 'page'
page = urllib2.urlopen(quote_page)  

# Finally, parse the page into BeautifulSoup format so we can use BeautifulSoup to work on it.
# parse the html using beautiful soup and store in variable `soup`
soup = BeautifulSoup(page, 'html.parser')

Everything seemed fine... except:

Error message:

    page = urllib2.urlopen(quote_page)
  File "C:\Python27\lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 423, in open
    protocol = req.get_type()
  File "C:\Python27\lib\urllib2.py", line 285, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: seq_1
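For reference, here is a minimal sketch that reproduces the same kind of ValueError without any network access. It assumes Python 3, where urllib2's functionality lives in urllib.request; simply constructing a Request object from the string is enough to trigger the check.

```python
from urllib.request import Request

# Build quote_page the same way as in the question: the result is the
# literal string 'seq_1', not the URL stored in the variable seq_1.
quote_page = 'seq_' + str(1)

error_message = None
try:
    Request(quote_page)  # only builds a request object; nothing is sent
except ValueError as exc:
    error_message = str(exc)

print(error_message)  # unknown url type: 'seq_1'
```

Because 'seq_1' has no scheme such as http:// or https://, urllib cannot tell which protocol handler to use and rejects it.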

Why?

Thanks.

You can use the dictionary of local variables returned by vars():

page = urllib2.urlopen(vars()[quote_page])

The way you have it, urlopen tries to open the string "seq_1" itself as the URL, instead of the value of the seq_1 variable, which is a valid URL.
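A small sketch of the difference, using Python 3 print syntax: quote_page holds a variable *name*, and vars() maps that name to the variable's value. The explicit `pages` dictionary at the end is an alternative not taken from this answer; it is shown only because it avoids looking variables up by name.

```python
seq_1 = 'https://anyaddress.com/'
quote_page = 'seq_' + str(1)

print(repr(quote_page))   # 'seq_1' -> a variable name, not a URL
url = vars()[quote_page]  # look the name up among the current variables
print(url)                # https://anyaddress.com/

# Alternative (hypothetical `pages` dict, not from the answer):
# keep the URLs in a dictionary keyed by name instead of separate variables.
pages = {'seq_1': 'https://anyaddress.com/'}
url_from_dict = pages[quote_page]
```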

It looks like you need to concatenate seq_1 and str_i.

Example:

seq_1 = str('https://anyaddress.com/')
quote_page = seq_1 + str_i

Output:

https://anyaddress.com/1
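If several numbered pages are needed, the same concatenation can be wrapped in a loop. This is a sketch that assumes the pages really are numbered 1, 2, 3, ... after the base URL; `page_url` and the range of three pages are hypothetical choices for illustration.

```python
BASE_URL = 'https://anyaddress.com/'  # placeholder base URL from the answer

def page_url(base, i):
    """Hypothetical helper: base URL + page number, as in seq_1 + str_i."""
    return base + str(i)

# Build the URLs for pages 1 to 3 (no network access happens here).
urls = [page_url(BASE_URL, i) for i in range(1, 4)]
print(urls)
```

Each URL in the list could then be passed to urllib2.urlopen (or urllib.request.urlopen in Python 3) and the result handed to BeautifulSoup, as in the question.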

Python 2.7: unknown url type: urllib2 - BeautifulSoup