Here is a function in which I try to use the Python BeautifulSoup library to get the article text from the <li>
tags, encode it, and replace "?" with " ".
def getDoxyDonkeyText(testUrl):
    request = urllib.request.urlopen(testUrl)
    soup = BeautifulSoup(request)
    mydivs = soup.findAll("div", {"class":'post-body'})
    posts =[]
    for div in mydivs:
        posts+=map(lambda p:p.text.encode('ascii', errors='replace').replace("?"," "), div.findAll("li"))
    return posts
______________________
articleURL = "http://doxydonkey.blogspot.in"
doxyDonkeyPosts = []
doxyDonkeyPosts=getDoxyDonkeyText(articleURL)
_______________________
This is the error I get:
_________________________
TypeError Traceback (most recent call last)
<ipython-input-35-cafa01352f7e> in <module>()
1 doxyDonkeyPosts = []
2 for link in links:
----> 3 doxyDonkeyPosts+=getDoxyDonkeyText(link)
<ipython-input-34-d5693b21e538> in getDoxyDonkeyText(testUrl)
6 posts =[]
7 for div in mydivs:
----> 8 posts+=map(lambda p:p.text.encode('ascii', errors='replace').replace("?"," "), div.findAll("li"))
9 return posts
<ipython-input-34-d5693b21e538> in <lambda>(p)
6 posts =[]
7 for div in mydivs:
----> 8 posts+=map(lambda p:p.text.encode('ascii', errors='replace').replace("?"," "), div.findAll("li"))
9 return posts
TypeError: a bytes-like object is required, not 'str'
_____________
An explanation of the cause of the error and how to fix it would be appreciated. Thanks in advance.
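str.encode() returns a bytes object, so calling replace() on it with str arguments gives you this error. You need to pass bytes to the replacement instead, like replace(b"?", b" ").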
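The difference can be reproduced without any network access. A minimal sketch, assuming a made-up sample string stands in for li.text:

```python
# Hypothetical sample text standing in for li.text (contains one non-ASCII character)
text = "Caf\u00e9 news"

# encode() returns bytes; the non-ASCII character becomes a literal b"?"
encoded = text.encode("ascii", errors="replace")
print(type(encoded))  # <class 'bytes'>

# Passing str arguments to bytes.replace() reproduces the error from the question
try:
    encoded.replace("?", " ")
except TypeError as exc:
    print("TypeError:", exc)

# Passing bytes arguments works
print(encoded.replace(b"?", b" "))  # b'Caf  news'
```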
Here is a simplified version:
import urllib.request
from bs4 import BeautifulSoup

def getDoxyDonkeyText(testUrl):
    request = urllib.request.urlopen(testUrl)
    soup = BeautifulSoup(request, 'html.parser')
    mydivs = soup.find_all("div", {"class": 'post-body'})
    posts = []
    for div in mydivs:
        for li in div.find_all("li"):
            # bytes version: encode() returns bytes, so replace() needs bytes arguments
            posts.append(
                li.text.encode('ascii', errors='replace').replace(b"?", b" ")
            )
            # if you want str instead, decode before replacing (use this instead of the above):
            # posts.append(
            #     li.text.encode('ascii', errors='replace').decode().replace("?", " ")
            # )
    return posts

articleURL = "http://doxydonkey.blogspot.in"
doxyDonkeyPosts = getDoxyDonkeyText(articleURL)
print(doxyDonkeyPosts)
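One side effect of the encode/replace approach is that it also blanks out genuine question marks in the post text, because replace("?", " ") cannot tell them apart from the placeholders that errors='replace' inserted. If that matters, a different technique (not the one used above) is to replace non-ASCII characters directly. A minimal sketch, using a hypothetical helper named ascii_only:

```python
def ascii_only(text, placeholder=" "):
    """Replace every non-ASCII character with `placeholder`, leaving real '?' intact.

    Hypothetical helper for illustration; it returns str, not bytes.
    """
    return "".join(ch if ord(ch) < 128 else placeholder for ch in text)

sample = "Is it caf\u00e9?"

# The encode/decode/replace chain removes both the placeholder and the real "?"
old_way = sample.encode("ascii", errors="replace").decode().replace("?", " ")
# The helper only touches the non-ASCII character
new_way = ascii_only(sample)

print(repr(old_way))  # 'Is it caf  '
print(repr(new_way))  # 'Is it caf ?'
```

Inside the loop it could take the place of the encode/decode chain, e.g. posts.append(ascii_only(li.text)).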