mechanize: selecting the first form raises "ImportError: No module named html5lib"



After reading this tutorial, I came up with this code:

    import requests
    from bs4 import BeautifulSoup
    import re
    import mechanize
    import cookielib  # Python 2 only (the Python 3 name is http.cookiejar)

    # Browser
    br = mechanize.Browser()

    # Cookie Jar
    cj = cookielib.LWPCookieJar()
    br.set_cookiejar(cj)

    # Browser options
    br.set_handle_equiv(True)
    br.set_handle_gzip(True)
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    br.set_handle_robots(False)

    # Follows refresh 0 but doesn't hang on refresh > 0
    br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

    # User-Agent (this is cheating, ok?)
    br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

    # The site we will navigate into, handling its session
    br.open('http://www.cleanmetrics.net/foodcarbonscope')

    br.select_form(nr=0)
    br.form['ctl00$ContentPlaceHolder1$userName'] = "XXXXX"
    br.form['ctl00$ContentPlaceHolder1$passWord'] = "XXXXXX"

    # Login
    br.submit()

I keep getting this error:

      File "scrapeRecipe.py", line 30, in <module>
        br.select_form(nr=0)
      File "build/bdist.macosx-10.11-intel/egg/mechanize/_mechanize.py", line 619, in select_form
      File "build/bdist.macosx-10.11-intel/egg/mechanize/_html.py", line 260, in global_form
      File "build/bdist.macosx-10.11-intel/egg/mechanize/_html.py", line 267, in forms
      File "build/bdist.macosx-10.11-intel/egg/mechanize/_html.py", line 282, in _get_forms
      File "build/bdist.macosx-10.11-intel/egg/mechanize/_html.py", line 247, in root
      File "build/bdist.macosx-10.11-intel/egg/mechanize/_html.py", line 145, in content_parser
    ImportError: No module named html5lib

However, I know I have installed html5lib successfully, because when I run pip3 freeze I see:

    html5lib==0.999999999
    six==1.10.0
    webencodings==0.5.1
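A likely explanation, offered as an assumption rather than a confirmed diagnosis: pip3 freeze only reports packages for the Python 3 interpreter that pip3 belongs to, while the script above imports cookielib (a Python 2 only module) and the site-packages listing below contains py2.7 eggs, so the script is probably running under Python 2.7, which would not see packages installed by pip3. The sketch below, run with the same command used to start scrapeRecipe.py, prints which interpreter is active and whether that interpreter can import html5lib and its dependencies. import_report is a hypothetical helper name, not part of mechanize or pip.

```python
import importlib
import sys


def import_report(module_name):
    """Return (importable, location) for module_name under THIS interpreter.

    Hypothetical diagnostic helper; it is not part of mechanize or pip.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers both "module not found" and a failing import inside it.
        return False, None
    # Built-in modules have no __file__, so fall back to None.
    return True, getattr(module, "__file__", None)


if __name__ == "__main__":
    # Run this with the same command used for scrapeRecipe.py.
    print("interpreter: %s" % sys.executable)
    print("version: %d.%d" % sys.version_info[:2])
    for name in ("html5lib", "six", "webencodings", "mechanize"):
        ok, where = import_report(name)
        print("%-12s %s" % (name, where if ok else "NOT importable"))
```

If html5lib shows as NOT importable, running pip through the interpreter itself (for example python -m pip install html5lib, with the same python command that runs the script) installs into that interpreter rather than into whichever Python pip3 points to.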

UPDATE: I think my problem may be related to my easy-install.pth file. In my site-packages directory I don't actually see html5lib. This is all I have:

    BeautifulSoup-3.2.1-py2.7.egg
    appdirs-1.4.3.dist-info
    appdirs.py
    appdirs.pyc
    beautifulsoup4-4.5.3.dist-info
    bs4
    easy-install.pth
    html2text-2016.9.19-py2.7.egg
    mechanize-0.3.1-py2.7.egg
    packaging
    packaging-16.8.dist-info
    pip-9.0.1-py2.7.egg
    requests-2.13.0-py2.7.egg

When I run easy_install html5lib, I get "Adding html5lib 0.999999999 to easy-install.pth file". However, after it reports that it has finished processing dependencies for html5lib, I open the easy-install.pth file and html5lib isn't mentioned anywhere?

    import sys; sys.__plen = len(sys.path)
    ./BeautifulSoup-3.2.1-py2.7.egg
    ./html2text-2016.9.19-py2.7.egg
    ./mechanize-0.3.1-py2.7.egg
    ./requests-2.13.0-py2.7.egg
    ./pip-9.0.1-py2.7.egg
    import sys; new=sys.path[sys.__plen:]; del sys.path[sys.__plen:]; p=getattr(sys,'__egginsert',0); sys.path[p:p]=new; sys.__egginsert = p+len(new)
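For reference, a .pth file does not list packages. According to the documentation of Python's site module, each remaining line names an entry (a directory or a zipped egg) that is joined to the directory holding the .pth file and appended to sys.path; blank lines and lines starting with # are skipped, and lines starting with import are executed rather than treated as paths. That is what the first and last lines of the file above are doing. The sketch below applies those rules to text like the file above, so it is easy to see exactly which entries Python 2.7 will add. pth_path_entries is a hypothetical helper, and unlike site itself it does not check that the entries exist. Plain packages such as bs4, which sit directly in site-packages, need no .pth entry at all.

```python
import os


def pth_path_entries(pth_text, site_dir):
    """Return the sys.path entries a .pth file would contribute.

    Hypothetical helper mirroring the documented site-module rules:
    blank lines and '#' comments are skipped, lines starting with
    'import' are executed by Python (so they are not paths), and every
    other line is joined to the directory that holds the .pth file.
    Unlike site, it does not check that the entries exist.
    """
    entries = []
    for raw in pth_text.splitlines():
        line = raw.rstrip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(("import ", "import\t")):
            continue
        entries.append(os.path.normpath(os.path.join(site_dir, line)))
    return entries


if __name__ == "__main__":
    sample = "\n".join([
        "import sys; sys.__plen = len(sys.path)",
        "./mechanize-0.3.1-py2.7.egg",
        "./requests-2.13.0-py2.7.egg",
        "import sys; new=sys.path[sys.__plen:]",
    ])
    # "/path/to/site-packages" is a placeholder, not a real location.
    for entry in pth_path_entries(sample, "/path/to/site-packages"):
        print(entry)
```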

Unless html5lib is inside one of the packages above? I'm wondering whether I need to import html5lib in my Python code and specify its root path?
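Normally there should be no need to import html5lib by hand or set a root path: the traceback shows the ImportError being raised inside mechanize's own content_parser, so mechanize does the import itself, and html5lib only has to be findable on the running interpreter's sys.path. To check whether html5lib is hidden inside one of the entries, the sketch below walks sys.path and reports which entries contain an html5lib package or module. find_package_dirs is a hypothetical helper; it only inspects plain directories (including unzipped .egg directories) and skips zip files such as zipped eggs.

```python
import os
import sys


def find_package_dirs(name, search_path=None):
    """List the search_path entries that contain a package or module `name`.

    Hypothetical helper: only plain directories (including unzipped .egg
    directories) are inspected; zip files such as zipped eggs are skipped.
    """
    if search_path is None:
        search_path = sys.path
    hits = []
    for entry in search_path:
        base = entry or os.getcwd()  # '' in sys.path means the current dir
        if not os.path.isdir(base):
            continue
        package_init = os.path.join(base, name, "__init__.py")
        module_file = os.path.join(base, name + ".py")
        if os.path.isfile(package_init) or os.path.isfile(module_file):
            hits.append(base)
    return hits


if __name__ == "__main__":
    # Run with the same interpreter that runs scrapeRecipe.py.
    print(find_package_dirs("html5lib") or "html5lib not found on sys.path")
```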

Not sure why this is getting downvoted? :/

I'm now running into a different problem, but here is the solution for the html5lib one:

    pip install --ignore-installed six --user
    sudo -H pip install html5lib --ignore-installed

For more information, this is a great thread: https://github.com/pypa/pip/issues/3165

LATEST UPDATE