Python TypeError when encoding text and replacing "?" with a space



This is a function in which I use the BeautifulSoup Python library to get the article text from <li> tags, encode it to ASCII, and replace "?" with " " (a space).

import urllib.request
from bs4 import BeautifulSoup

def getDoxyDonkeyText(testUrl):
    request = urllib.request.urlopen(testUrl)
    soup = BeautifulSoup(request)
    mydivs = soup.findAll("div", {"class":'post-body'})
    posts =[]
    for div in mydivs:
        posts+=map(lambda p:p.text.encode('ascii', errors='replace').replace("?"," "), div.findAll("li"))
    return posts
articleURL = "http://doxydonkey.blogspot.in"
doxyDonkeyPosts = []
doxyDonkeyPosts=getDoxyDonkeyText(articleURL)

This is the error I get:

TypeError                                 Traceback (most recent call last)
<ipython-input-35-cafa01352f7e> in <module>()
      1 doxyDonkeyPosts = []
      2 for link in links:
----> 3     doxyDonkeyPosts+=getDoxyDonkeyText(link)
<ipython-input-34-d5693b21e538> in getDoxyDonkeyText(testUrl)
      6     posts =[]
      7     for div in mydivs:
----> 8         posts+=map(lambda p:p.text.encode('ascii', errors='replace').replace("?"," "), div.findAll("li"))
      9     return posts
<ipython-input-34-d5693b21e538> in <lambda>(p)
      6     posts =[]
      7     for div in mydivs:
----> 8         posts+=map(lambda p:p.text.encode('ascii', errors='replace').replace("?"," "), div.findAll("li"))
      9     return posts
TypeError: a bytes-like object is required, not 'str'

Any explanation of the cause and how to fix it would be much appreciated. Thanks in advance.

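str.encode() returns a bytes object, and calling .replace() on bytes with str arguments raises this TypeError. You need to pass bytes for both arguments, like replace(b"?", b" "), or decode the result back to str before replacing.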
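A minimal reproduction that needs neither BeautifulSoup nor network access. It assumes a made-up sample string containing one non-ASCII character and shows the failing call next to the two fixes:

```python
# Hypothetical sample text with one non-ASCII character (é)
text = "caf\u00e9 prices?"

# encode() returns bytes; the non-ASCII character becomes b"?"
encoded = text.encode("ascii", errors="replace")
print(encoded)  # b'caf? prices?'

# Mixing bytes and str arguments raises the same TypeError as in the question
try:
    encoded.replace("?", " ")
except TypeError as exc:
    print("TypeError:", exc)

# Fix 1: use bytes arguments; the result stays bytes
print(encoded.replace(b"?", b" "))  # b'caf  prices '

# Fix 2: decode back to str first, then use str arguments
print(repr(encoded.decode("ascii").replace("?", " ")))  # 'caf  prices '
```

Note that the genuine "?" at the end of the text is replaced as well, because after encoding it is indistinguishable from the substituted character.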

Here is a simplified version:

import urllib.request  # "import urllib" alone does not reliably load urllib.request
from bs4 import BeautifulSoup

def getDoxyDonkeyText(testUrl):
    request = urllib.request.urlopen(testUrl)
    soup = BeautifulSoup(request, 'html.parser')
    mydivs = soup.find_all("div", {"class": 'post-body'})
    posts = []
    for div in mydivs:
        for li in div.find_all("li"):
            # bytes result: the replace() arguments must also be bytes
            posts.append(
                li.text.encode('ascii', errors='replace').replace(b"?", b" ")
            )
            # if you want str instead, decode before replacing
            # (use this in place of the append above, not in addition to it):
            # posts.append(
            #     li.text.encode('ascii', errors='replace').decode().replace("?", " ")
            # )
    return posts

articleURL = "http://doxydonkey.blogspot.in"
doxyDonkeyPosts=getDoxyDonkeyText(articleURL)
print(doxyDonkeyPosts)
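Both variants above also turn any real "?" in the post text into a space. If that matters, one possible alternative (a sketch, not part of the original answer) is to substitute non-ASCII characters directly at the str level, so bytes never appear at all. The helper name ascii_only and its fill parameter are hypothetical:

```python
def ascii_only(text, fill=" "):
    """Return text with every non-ASCII character replaced by `fill`.

    Works on str throughout, so no bytes/str mixing can occur, and
    genuine question marks in the text are left untouched.
    """
    # str.isascii() requires Python 3.7+
    return "".join(ch if ch.isascii() else fill for ch in text)


print(repr(ascii_only("caf\u00e9 prices?")))  # 'caf  prices?'
```

Inside the loop this would be used as posts.append(ascii_only(li.text)).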
