I want to use Selenium to scrape all the text values of the class "tore-dots" from this website: https://www.fussballdaten.de/vereine/fc-bayern-muenchen/2019/
To do that, I use the following code:
dots_graph = driver.find_element_by_class_name("tore-dots")
dots_graph.text
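The result is a single concatenated string, for example: "612119891210117968576554353345443434333"
However, these numbers represent different positions and can have up to two digits, so the string cannot be split back reliably. How can I scrape the text with a separator, for example so that each element is a separate item in a list instead of being concatenated into one string?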
You can use driver.execute_script to get the text values:
from bs4 import BeautifulSoup as soup
from selenium import webdriver
d = webdriver.Chrome('/Users/jamespetullo/Downloads/chromedriver')
d.get('https://www.fussballdaten.de/vereine/fc-bayern-muenchen/2019/')
dot_vals = d.execute_script('return Array.from(document.querySelectorAll("g.tore-dots text")).map(x => x.innerHTML)')
Output:
['2', '1', '1', '1', '1', '2', '6', '4', '2', '3', '5', '5', '4', '3', '3', '3', '2', '2', '2', '3', '2', '2', '2', '2', '1', '1', '2', '1', '1', '1', '1', '1', '1', '1']
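If you would rather parse the page source than run JavaScript in the browser, the same idea can be sketched with the standard library's html.parser (swapped in here for BeautifulSoup so the example has no extra dependencies). The markup in sample is a made-up stand-in for the page's SVG, and ToreDotsParser is a hypothetical helper, not part of any library:

```python
from html.parser import HTMLParser


class ToreDotsParser(HTMLParser):
    """Collect the text of each <text> element inside <g class="tore-dots">."""

    def __init__(self):
        super().__init__()
        self.g_depth = 0      # > 0 while inside the tore-dots group (counts nested <g>)
        self.in_text = False  # True while inside a <text> element of that group
        self.values = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "g" and (self.g_depth or "tore-dots" in classes):
            self.g_depth += 1
        elif tag == "text" and self.g_depth:
            self.in_text = True
            self.values.append("")  # start a new, separate value

    def handle_endtag(self, tag):
        if tag == "g" and self.g_depth:
            self.g_depth -= 1
        elif tag == "text":
            self.in_text = False

    def handle_data(self, data):
        if self.in_text:
            self.values[-1] += data.strip()


# Hypothetical markup; the real page structure may differ.
sample = (
    '<svg><g class="tore-dots">'
    '<g><circle r="5"></circle><text>6</text></g>'
    '<g><text>12</text></g>'
    '<g><text>11</text></g>'
    '</g><g class="other"><text>99</text></g></svg>'
)

parser = ToreDotsParser()
parser.feed(sample)
dot_vals = parser.values
print(dot_vals)
```

With Selenium you would feed d.page_source to the parser instead of sample. Each <text> ends up as its own list item, so two-digit values stay intact.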
Once you have dots_graph, use dots_graph.find_elements_... (note the s in elements) to search dots_graph for all <text> nodes as separate elements, then use a for-loop to read .text from each <text>:
dots_graph = driver.find_element_by_class_name("tore-dots")
all_items = dots_graph.find_elements_by_tag_name("text")
for item in all_items:
    print(item.text)
dot_vals = [item.text for item in all_items]
Or you can try to select tore-dots and <text> in a single XPath expression:
# doesn't work with `g` and `text` - maybe because it is inside `<SVG>`
#all_items = driver.find_elements_by_xpath('//g[@class="tore-dots"]//text')
all_items = driver.find_elements_by_xpath('//*[@class="tore-dots"]//*[name()="text"]')
for item in all_items:
    print(item.text)
dot_vals = [item.text for item in all_items]
Or the same with a CSS selector:
all_items = driver.find_elements_by_css_selector('.tore-dots text')
for item in all_items:
    print(item.text)
dot_vals = [item.text for item in all_items]
BTW: .text here is the element's text content. It does not mean the <text> tag, the way it would in beautifulsoup.
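The comment in the XPath example guesses that //g and //text fail because the elements sit inside <svg>. A likely explanation is namespaces: SVG elements live in the SVG namespace, so a plain name test does not match them, while name() or local-name() compares names as strings. The sketch below shows the same effect with the standard library's ElementTree as an analogy only; the browser's XPath engine is a different implementation, and the markup and the local_name helper are made up for illustration:

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical markup standing in for the page's SVG chart.
sample = f"""<svg xmlns="{SVG_NS}">
  <g class="tore-dots">
    <text>6</text><text>12</text><text>11</text>
  </g>
</svg>"""

root = ET.fromstring(sample)


def local_name(element):
    """Strip the '{namespace}' prefix, similar in spirit to XPath local-name()."""
    return element.tag.rsplit("}", 1)[-1]


# A plain name test finds nothing, because the real tag is '{namespace}g'.
plain = root.findall(".//g")

# Comparing local names finds the group and its <text> children.
by_local_name = [
    text_el.text
    for g in root.iter()
    if local_name(g) == "g" and g.get("class") == "tore-dots"
    for text_el in g.iter()
    if local_name(text_el) == "text"
]

print(plain)
print(by_local_name)
```

This is why the //*[name()="text"] form and the CSS selector are the safer choices for elements inside SVG.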
EDIT:
Minimal working code
from selenium import webdriver
#driver = webdriver.Firefox()
driver = webdriver.Chrome()
driver.get('https://www.fussballdaten.de/vereine/fc-bayern-muenchen/2019/')
# close popup window with message
driver.find_element_by_xpath('//button[@aria-label="Einwilligen"]').click()
print('--- FIND ---')
dots_graph = driver.find_element_by_class_name("tore-dots")
all_items = dots_graph.find_elements_by_tag_name("text")
dot_vals = [item.text for item in all_items]
print(dot_vals)
print('--- XPATH (g, text) ---')
# doesn't work with `g` and `text` - maybe because it is inside `<SVG>`
all_items = driver.find_elements_by_xpath('//g[@class="tore-dots"]//text')
dot_vals = [item.text for item in all_items]
print(dot_vals)
print('--- XPATH (*, local-name) ---')
all_items = driver.find_elements_by_xpath('//*[@class="tore-dots"]//*[local-name()="text"]')
dot_vals = [item.text for item in all_items]
print(dot_vals)
print('--- XPATH (*, name) ---')
all_items = driver.find_elements_by_xpath('//*[@class="tore-dots"]//*[name()="text"]')
dot_vals = [item.text for item in all_items]
print(dot_vals)
print('--- CSS ---')
all_items = driver.find_elements_by_css_selector('.tore-dots text')
dot_vals = [item.text for item in all_items]
print(dot_vals)
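Note for newer Selenium versions: the find_element_by_* and find_elements_by_* helpers used above were removed in Selenium 4.3, where you call find_elements(by, value) instead. Below is a minimal sketch of the CSS variant in that style. collect_dot_values and to_ints are hypothetical helper names, and the string "css selector" is the value of Selenium's By.CSS_SELECTOR constant, written out so the sketch does not need Selenium at import time:

```python
def collect_dot_values(driver):
    """Return the text of every <text> element under .tore-dots as a list."""
    # "css selector" equals selenium.webdriver.common.by.By.CSS_SELECTOR;
    # in real code you would normally pass By.CSS_SELECTOR instead.
    items = driver.find_elements("css selector", ".tore-dots text")
    return [item.text for item in items]


def to_ints(values):
    """Convert the scraped strings to integers, skipping empty labels."""
    return [int(v) for v in values if v.strip()]


# Usage with a real browser (assumes Selenium 4 and a Chrome driver are installed):
#
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get('https://www.fussballdaten.de/vereine/fc-bayern-muenchen/2019/')
#   dot_vals = collect_dot_values(driver)
#   print(to_ints(dot_vals))
```

Because every <text> is read as its own element, two-digit values such as 12 stay separate from their neighbours, which is exactly what the concatenated .text of the parent group loses.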