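I am using Selenium to scrape a web page that is generated dynamically by JavaScript. When I run the calls directly from the cmd (Python) terminal, it works fine. But when I implement the same thing in a class, it does not work properly.
My class implementation is: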
from selenium import webdriver


class web_scraper():
    def __init__(self):
        # start chrome driver
        self.driver = webdriver.Chrome(executable_path="./config/chromedriver.exe")

    # scrape web page from specified url
    def scrape_page(self, url):
        html = None
        try:
            # scrape page
            self.driver.get(url)
            # read html
            html = self.driver.execute_script("return document.documentElement.innerHTML;")
        except Exception as e:
            print('[Error:] Scraping failed.')
            print(f'[Exception:] {e}')
        return html


if __name__ == '__main__':
    url = "https://wipp.edmundsassoc.com/Wipp/?wippid=1205#taxPage9"
    scraper = web_scraper()
    content = scraper.scrape_page(url)
The code I use in the terminal is:
driver = webdriver.Chrome(executable_path='E:/Projects/Python_Projects/WebScraping/config/chromedriver.exe')
driver.get("https://wipp.edmundsassoc.com/Wipp/?wippid=1205#taxPage30")
content = driver.execute_script("return document.documentElement.innerHTML;")
The output of the class implementation is:
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<link type="text/css" rel="stylesheet" href="Wipp.css">
<title>WIPP</title>
<link rel="stylesheet" href="https://wipp.edmundsassoc.com/Wipp/wipp/gwt/standard/standard.css"><script src="https://wipp.edmundsassoc.com/Wipp/wipp/0D3421F8F9508D2F958C63CE2A48BAD8.cache.js"></script></head>
<body>
<script type="text/javascript" language="javascript" src="wipp/wipp.nocache.js"></script>
<iframe src="javascript:''" id="__gwt_historyFrame" tabindex="-1" style="position:absolute;width:0;height:0;border:0"></iframe>
</body>
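With the commands typed into the Python terminal, on the other hand, the output is fine.
Any help with this would be appreciated. Thanks!
I am using Windows, and the Python version is 3.6.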
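Add a time.sleep() after fetching the URL. The page content is built by JavaScript after driver.get() returns, so reading the HTML immediately captures only the bootstrap markup. In the terminal, the pause between typing commands most likely gives the scripts time to finish: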
self.driver.get(url)
time.sleep(10)
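A fixed 10-second sleep always costs the full delay and can still be too short on a slow connection. Selenium ships WebDriverWait for this kind of condition-based waiting; the block below is only a minimal pure-Python sketch of the same polling idea. The helper name wait_for_html and the "<table" readiness check in the comment are hypothetical and not from the original post, so pick a check that matches the content you actually expect on the page.

```python
import time


def wait_for_html(fetch_html, is_ready, timeout=10.0, poll_interval=0.5):
    """Call fetch_html() repeatedly until is_ready(html) is true.

    Returns the first HTML string that passes the check, or raises
    TimeoutError once `timeout` seconds have elapsed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = fetch_html()
        if is_ready(html):
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"page not ready after {timeout} seconds")
        time.sleep(poll_interval)


# Possible use inside scrape_page (assumption: the rendered page contains
# a <table>; replace the check with whatever marks "fully loaded" for you):
#
#   self.driver.get(url)
#   html = wait_for_html(
#       lambda: self.driver.execute_script(
#           "return document.documentElement.innerHTML;"),
#       lambda page: "<table" in page,
#   )
```

The function returns as soon as the check passes, so a fast page costs one poll instead of the full 10 seconds, and a page that never renders surfaces as a TimeoutError rather than silently returning the bootstrap HTML.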