How can I scrape the nodes of a JavaScript pie chart with Python?
https://www.dice.com/skills/javascript
[Pie chart image]
Note: the text I want to scrape is the chart's graph nodes, not ordinary page text.
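The page is rendered by JavaScript, so you can either render it with Selenium, or use requests and bs4: the data you want is embedded in a script tag, so it can be captured with a regex (or plain string slicing).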
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

# Run Firefox without opening a window
options = Options()
options.add_argument('--headless')
driver = webdriver.Firefox(options=options)
try:
    # Load the page and let the browser execute its JavaScript
    driver.get('https://www.dice.com/skills/javascript')
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    # Each chart node is rendered as a <div class="node">
    for item in soup.find_all('div', {'class': 'node'}):
        print(item.text)
finally:
    # Always close the browser, even if scraping fails
    driver.quit()
Output:
JavaScript
CSS
HTML5
jQuery
jQuery UI
AngularJS
Bootstrap
jQuery
jQuery UI
CSS
HTML5
AngularJS
Ajax
jQuery UI
jQuery
Aptana
Zend Studio
Ajax
CSS
AngularJS
jQuery
HTML5
HTML
Bootstrap
CSS
Node.js
React.js
MongoDB
AngularJS
Express.js
NoSQL
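The order of the output above matches a pre-order (depth-first) walk of the chart data: the root skill, then each related skill followed by its own children. A minimal sketch of that walk, assuming the data has already been parsed into nested dicts where each node has a "name" key and an optional "children" list; the function name walk and the small sample tree are hypothetical.

```python
def walk(node):
    """Yield node names in pre-order: a node first, then each child subtree."""
    yield node["name"]
    for child in node.get("children", []):
        yield from walk(child)


# Hypothetical sample tree in the {"name": ..., "children": [...]} shape
tree = {
    "name": "JavaScript",
    "children": [
        {"name": "CSS", "children": [{"name": "HTML5"}, {"name": "jQuery"}]},
        {"name": "Node.js", "children": [{"name": "React.js"}]},
    ],
}

for name in walk(tree):
    print(name)
```

Walking the parsed data this way gives the same flat list without needing a browser to render the chart.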
Update: the same data can be fetched without Selenium, using requests and bs4:
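import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.dice.com/skills/javascript")
soup = BeautifulSoup(r.text, 'html.parser')
# The chart data sits in the 9th <script> tag (index 8); this index depends on
# the page layout at the time of writing and may break if the page changes
script = soup.find_all("script")[8].get_text(strip=True)
# Slice from the first "{" to the first ";" to isolate the JSON object literal
start = script.find("{")
end = script.find(";")
print(script[start:end])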
Output:
{"name":"JavaScript","children":[{"name":"CSS","children":[{"name":"HTML5"},{"name":"jQuery"},{"name":"jQuery UI"},{"name":"AngularJS"},{"name":"Bootstrap"}]},{"name":"jQuery","children":[{"name":"jQuery UI"},{"name":"CSS"},{"name":"HTML5"},{"name":"AngularJS"},{"name":"Ajax"}]},{"name":"jQuery UI","children":[{"name":"jQuery"},{"name":"Aptana"},{"name":"Zend Studio"},{"name":"Ajax"},{"name":"CSS"}]},{"name":"AngularJS","children":[{"name":"jQuery"},{"name":"HTML5"},{"name":"HTML"},{"name":"Bootstrap"},{"name":"CSS"}]},{"name":"Node.js","children":[{"name":"React.js"},{"name":"MongoDB"},{"name":"AngularJS"},{"name":"Express.js"},{"name":"NoSQL"}]}]}
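The regex route mentioned at the start can replace the find/slice step and hand the result straight to json.loads. A minimal sketch, assuming the script assigns the object literal to a variable and ends the statement with "};"; the sample script text and the variable name chartData are hypothetical, not taken from the real page.

```python
import json
import re

# Hypothetical script text: the real page's variable name and surrounding code may differ
script = 'var chartData = {"name":"JavaScript","children":[{"name":"CSS"}]};\nrender(chartData);'

# Non-greedy match from the first "{" up to the first "};", keeping the closing brace
match = re.search(r"(\{.*?\});", script, re.DOTALL)
data = json.loads(match.group(1))

print(data["name"])
print([child["name"] for child in data["children"]])
```

Parsing into Python objects rather than printing the raw string makes it easy to pull out just the node names.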