Unable to scrape a web page with a class-based Selenium implementation

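I am using Selenium to scrape a web page that is generated dynamically by JavaScript. When I run the calls directly from the Python interpreter in cmd, it works fine. But when I implement the same logic in a class, it does not return the rendered page.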

My class implementation is:

from selenium import webdriver


class web_scraper():
    def __init__(self):
        # start chrome driver
        self.driver = webdriver.Chrome(executable_path="./config/chromedriver.exe")

    # scrape web page from specified url
    def scrape_page(self, url):
        html = None
        try:
            # scrape page
            self.driver.get(url)

            # read html
            html = self.driver.execute_script("return document.documentElement.innerHTML;")
        except Exception as e:
            print('[Error:] Scraping failed.')
            print(f'[Exception:] {e}')

        return html


if __name__ == '__main__':
    url = "https://wipp.edmundsassoc.com/Wipp/?wippid=1205#taxPage9"
    scraper = web_scraper()
    content = scraper.scrape_page(url)

The code I use in the terminal is:

driver = webdriver.Chrome(executable_path='E:/Projects/Python_Projects/WebScraping/config/chromedriver.exe')
driver.get("https://wipp.edmundsassoc.com/Wipp/?wippid=1205#taxPage30")
content = driver.execute_script("return document.documentElement.innerHTML;")

The output of the class implementation is:

<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<link type="text/css" rel="stylesheet" href="Wipp.css">
<title>WIPP</title>
<link rel="stylesheet" href="https://wipp.edmundsassoc.com/Wipp/wipp/gwt/standard/standard.css"><script src="https://wipp.edmundsassoc.com/Wipp/wipp/0D3421F8F9508D2F958C63CE2A48BAD8.cache.js"></script></head>
<body>
<script type="text/javascript" language="javascript" src="wipp/wipp.nocache.js"></script>
<iframe src="javascript:''" id="__gwt_historyFrame" tabindex="-1" style="position:absolute;width:0;height:0;border:0"></iframe>

</body>

With the commands typed into the Python terminal, on the other hand, the output contains the fully rendered page.

Any help with this would be appreciated. Thanks!

I am using Windows, and my Python version is 3.6.

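Add time.sleep() after getting the URL. The page content is most likely built by JavaScript after driver.get() returns, so reading the HTML immediately captures only the empty shell. In the terminal, the time it takes to type the next command gives the page a chance to render. This needs import time at the top of the script.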

self.driver.get(url)
time.sleep(10)
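
A fixed 10-second sleep works, but it always costs the full delay and can still be too short on a slow connection. As an alternative, here is a minimal sketch of a polling wait that returns as soon as the page has content. The helper name wait_for_rendered_html and its parameters are hypothetical, and it assumes that "rendered" can be detected by the body containing some visible text, which may need adjusting for this particular site.

```python
import time


def wait_for_rendered_html(driver, timeout=10.0, poll_interval=0.5):
    """Poll until the page body has visible text, then return the HTML.

    `driver` is any object exposing Selenium's `execute_script` method.
    Returns None if the body is still empty when `timeout` expires.
    """
    # Assumption: the page counts as rendered once the body has visible text.
    has_text_js = "return !!document.body && document.body.innerText.trim().length > 0;"
    html_js = "return document.documentElement.innerHTML;"

    deadline = time.monotonic() + timeout
    while True:
        # Ask the browser whether the JavaScript has populated the page yet.
        if driver.execute_script(has_text_js):
            return driver.execute_script(html_js)
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)
```

Inside scrape_page, this would replace the sleep: call self.driver.get(url) and then html = wait_for_rendered_html(self.driver). Selenium also ships WebDriverWait in selenium.webdriver.support.ui, which provides the same kind of polling with a timeout and is probably the better choice once you know a specific element to wait for.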
