Scraping the text of a class with Selenium, with separators between the different text parts



I want to use Selenium to scrape all text values of the class "tore-dots" from this website: https://www.fussballdaten.de/vereine/fc-bayern-muenchen/2019/

To do this, I use the following code:

dots_graph = driver.find_element_by_class_name("tore-dots")
dots_graph.text

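The result is one concatenated string, such as: "612119891210117968576554353345443434333"

However, these numbers represent different positions and have at most two digits each. How can I scrape the text with a separator, for example so that the individual elements end up separated in a list instead of concatenated into one string?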

You can use driver.execute_script to get the text values:

from selenium import webdriver

# Path to the local chromedriver binary; adjust for your machine
d = webdriver.Chrome('/Users/jamespetullo/Downloads/chromedriver')
d.get('https://www.fussballdaten.de/vereine/fc-bayern-muenchen/2019/')
# Collect the content of each <text> node inside g.tore-dots as a separate list item
dot_vals = d.execute_script('return Array.from(document.querySelectorAll("g.tore-dots text")).map(x => x.innerHTML)')

Output:

['2', '1', '1', '1', '1', '2', '6', '4', '2', '3', '5', '5', '4', '3', '3', '3', '2', '2', '2', '3', '2', '2', '2', '2', '1', '1', '2', '1', '1', '1', '1', '1', '1', '1']
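
The values come back as strings. If numbers are needed, they can be converted afterwards. The following is a minimal sketch that uses the first few values of the output above as sample data; the name dot_numbers is only illustrative, and skipping empty strings is a precaution, since the output above shows none.

```python
# Sample data: the first eight values of the output shown above
dot_vals = ['2', '1', '1', '1', '1', '2', '6', '4']

# Convert each string to an int, skipping blank entries (precaution only)
dot_numbers = [int(value) for value in dot_vals if value.strip()]

print(dot_numbers)
```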

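After getting dots_graph, you should use dots_graph.find_elements_... (with the letter s in the word elements) to find all <text> elements inside dots_graph as separate elements, and then use a for loop to get .text from each <text>.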

dots_graph = driver.find_element_by_class_name("tore-dots")
all_items = dots_graph.find_elements_by_tag_name("text")
for item in all_items:
    print(item.text)
dot_vals = [item.text for item in all_items]

Or you can try to get the <text> elements inside tore-dots with a single XPath expression

# doesn't work with `g` and `text` - maybe because it is inside `<SVG>` 
#all_items = driver.find_elements_by_xpath('//g[@class="tore-dots"]//text')
all_items = driver.find_elements_by_xpath('//*[@class="tore-dots"]//*[name()="text"]')
for item in all_items:
    print(item.text)
dot_vals = [item.text for item in all_items]
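
A likely reason the plain //g expression finds nothing (an assumption, in line with the "maybe" in the comment above) is that SVG elements live in the SVG XML namespace, so an unprefixed name such as g does not match them. The sketch below illustrates the same effect offline with Python's standard xml.etree.ElementTree instead of Selenium. The markup is a hypothetical, simplified stand-in for the real page, and the helper svg() is only illustrative.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical, simplified markup; the real page structure may differ
doc = (
    f'<div><svg xmlns="{SVG_NS}">'
    '<g class="tore-dots"><text>6</text><text>12</text><text>11</text></g>'
    '</svg></div>'
)
root = ET.fromstring(doc)


def svg(tag):
    """Return the namespace-qualified name ElementTree expects for an SVG tag."""
    return f"{{{SVG_NS}}}{tag}"


# An unqualified name does not match the namespaced <g> element
plain_matches = root.findall(".//g")

# The namespace-qualified names do match
texts = [
    node.text
    for node in root.findall(f".//{svg('g')}[@class='tore-dots']/{svg('text')}")
]

print(len(plain_matches))  # 0
print(texts)               # ['6', '12', '11']
```

In browser XPath, matching on name() or local-name() with * sidesteps the namespace in a similar way, which is what the working expression above does.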

Or the same with a CSS selector

all_items = driver.find_elements_by_css_selector('.tore-dots text')
for item in all_items:
    print(item.text)
dot_vals = [item.text for item in all_items]
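
Note that the find_element_by_* and find_elements_by_* helpers were removed in Selenium 4.3; newer versions use find_elements(by, value) instead. The following is a minimal sketch of the CSS variant in that style. The function name collect_dot_values is hypothetical, and the locator is passed as the plain string "css selector", which is the value of By.CSS_SELECTOR, so the snippet has no import of its own.

```python
# "css selector" is the string value of selenium's By.CSS_SELECTOR;
# with selenium imported you would normally write By.CSS_SELECTOR instead.
CSS_SELECTOR = "css selector"


def collect_dot_values(driver, selector=".tore-dots text"):
    """Return the text of every element matching `selector`, one list entry per element."""
    elements = driver.find_elements(CSS_SELECTOR, selector)
    return [element.text for element in elements]
```

With an already opened driver this would be called as dot_vals = collect_dot_values(driver).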

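BTW: .text doesn't select a <text> tag the way attribute access selects a tag in BeautifulSoup (e.g. soup.title); in Selenium it returns the element's text.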


EDIT:

Minimal working code

from selenium import webdriver
#driver = webdriver.Firefox()
driver = webdriver.Chrome()
driver.get('https://www.fussballdaten.de/vereine/fc-bayern-muenchen/2019/')
# close popup window with message
driver.find_element_by_xpath('//button[@aria-label="Einwilligen"]').click()
print('--- FIND ---')
dots_graph = driver.find_element_by_class_name("tore-dots")
all_items = dots_graph.find_elements_by_tag_name("text")
dot_vals = [item.text for item in all_items]
print(dot_vals)
print('--- XPATH (g, text) ---')
# doesn't work with `g` and `text` - maybe because it is inside `<SVG>` 
all_items = driver.find_elements_by_xpath('//g[@class="tore-dots"]//text')  
dot_vals = [item.text for item in all_items]
print(dot_vals)
print('--- XPATH (*, local-name) ---')
all_items = driver.find_elements_by_xpath('//*[@class="tore-dots"]//*[local-name()="text"]')
dot_vals = [item.text for item in all_items]
print(dot_vals)
print('--- XPATH (*, name) ---')
all_items = driver.find_elements_by_xpath('//*[@class="tore-dots"]//*[name()="text"]')
dot_vals = [item.text for item in all_items]
print(dot_vals)
print('--- CSS ---')
all_items = driver.find_elements_by_css_selector('.tore-dots text')
dot_vals = [item.text for item in all_items]
print(dot_vals)
