How to scrape node text from a JavaScript pie chart with Python



How can I use Python to scrape the nodes from a JavaScript pie chart?

https://www.dice.com/skills/javascript

[Image: pie chart]

Hint: the text I want to scrape from the chart belongs to the graph nodes; it is not ordinary page text.

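The page is rendered by JavaScript, so one option is to drive a real browser with selenium. Alternatively, requests plus bs4 is enough, because the data we want is embedded in a script tag and can be captured with a regex or a simple string search.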

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

# Run Firefox without opening a window.
options = Options()
options.add_argument('--headless')
driver = webdriver.Firefox(options=options)

# Let the browser execute the page's JavaScript, then parse the rendered DOM.
driver.get('https://www.dice.com/skills/javascript')
soup = BeautifulSoup(driver.page_source, 'html.parser')

# Each chart node is rendered as a <div class="node"> element.
for item in soup.find_all("div", {'class': 'node'}):
    print(item.text)

driver.quit()

Output:

JavaScript
CSS
HTML5
jQuery
jQuery UI
AngularJS
Bootstrap
jQuery
jQuery UI
CSS
HTML5
AngularJS
Ajax
jQuery UI
jQuery
Aptana
Zend Studio
Ajax
CSS
AngularJS
jQuery
HTML5
HTML
Bootstrap
CSS
Node.js
React.js
MongoDB
AngularJS
Express.js
NoSQL

Update:

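import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.dice.com/skills/javascript")
soup = BeautifulSoup(r.text, 'html.parser')

# The chart data sits in the ninth <script> tag (index 8).
# This index is fragile and breaks if the page layout changes.
script = soup.find_all("script")[8].get_text("t", strip=True)

# Slice from the first "{" up to (not including) the first ";".
start = script.find("{")
end = script.find(";")
print(script[start:end])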

Output:

{"name":"JavaScript","children":[{"name":"CSS","children":[{"name":"HTML5"},{"name":"jQuery"},{"name":"jQuery UI"},{"name":"AngularJS"},{"name":"Bootstrap"}]},{"name":"jQuery","children":[{"name":"jQuery UI"},{"name":"CSS"},{"name":"HTML5"},{"name":"AngularJS"},{"name":"Ajax"}]},{"name":"jQuery UI","children":[{"name":"jQuery"},{"name":"Aptana"},{"name":"Zend Studio"},{"name":"Ajax"},{"name":"CSS"}]},{"name":"AngularJS","children":[{"name":"jQuery"},{"name":"HTML5"},{"name":"HTML"},{"name":"Bootstrap"},{"name":"CSS"}]},{"name":"Node.js","children":[{"name":"React.js"},{"name":"MongoDB"},{"name":"AngularJS"},{"name":"Express.js"},{"name":"NoSQL"}]}]}
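The string printed above is JSON, so it can be parsed instead of read by eye. Below is a minimal sketch of the regex route mentioned earlier, followed by a recursive walk that lists each node with its depth. The sample HTML, the variable name `treeData`, and the helper names `extract_tree` and `walk` are assumptions made for illustration; the real page's script may assign the object differently, and the regex would need adjusting to match.

```python
import json
import re

# Hypothetical stand-in for the page source. The variable name `treeData`
# and the surrounding markup are assumptions, not taken from the real page.
SAMPLE_HTML = """
<script>
var treeData = {"name":"JavaScript","children":[{"name":"CSS","children":[{"name":"HTML5"},{"name":"jQuery"}]},{"name":"Node.js","children":[{"name":"React.js"}]}]};
</script>
"""


def extract_tree(html):
    """Return the first JSON object assigned with `= {...};` as a dict."""
    # Non-greedy match that stops at the first "}" followed by ";".
    match = re.search(r"=\s*(\{.*?\})\s*;", html, re.DOTALL)
    if match is None:
        raise ValueError("no embedded JSON object found")
    return json.loads(match.group(1))


def walk(node, depth=0):
    """Yield (depth, name) for every node, parents before children."""
    yield depth, node["name"]
    for child in node.get("children", []):
        yield from walk(child, depth + 1)


tree = extract_tree(SAMPLE_HTML)
names = [name for _, name in walk(tree)]

# Print the hierarchy, indenting each node by its depth.
for depth, name in walk(tree):
    print("  " * depth + name)
```

Parsing the JSON keeps the parent/child relationships that the flat list of div texts loses, which makes it possible to tell which skills are grouped under which node.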
