How to get tweets from Twitter using Selenium in Python



I want to fetch tweets with Selenium in Python, but get_attribute is not working for me. Here is my code. Can you help me fix it?

driver = webdriver.Chrome()
driver.get("http://twitter.com/elonmusk")
time.sleep(3)
SCROLL_PAUSE_TIME = 4
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    tweet = driver.find_elements(By.XPATH,"//div[@id='id__z5kb0qs2bgp']").get_attribute("innerHTML").splitlines()
    time.sleep(SCROLL_PAUSE_TIME)
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
driver.quit()

Something like this should work for grabbing the tweet text:

tweets = driver.find_elements(By.XPATH, '//div[@data-testid="tweetText"]')
for i in tweets:
    print(i.get_attribute('innerText'))
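
The call in the question fails because find_elements returns a list, and a list has no get_attribute method; it has to be called on each element. The following is a minimal sketch, not a definitive implementation, that combines the question's scroll loop with the data-testid selector above. The names visible_texts, collect_tweet_texts, max_scrolls and main are hypothetical, and the sketch assumes the profile page can be viewed without logging in and that Twitter still uses the tweetText attribute.

```python
import time

# Selector taken from the answer above; Twitter may change this attribute.
TWEET_XPATH = '//div[@data-testid="tweetText"]'


def visible_texts(driver, locator):
    """Return the innerText of every element currently matching the locator."""
    # find_elements returns a list, so get_attribute is called on each item.
    return [el.get_attribute("innerText") for el in driver.find_elements(*locator)]


def collect_tweet_texts(driver, locator, max_scrolls=10, pause=4.0):
    """Scroll down step by step and return the texts in first-seen order.

    max_scrolls is an assumed safety limit so the loop always terminates.
    """
    seen = {}  # dict keys keep insertion order and drop duplicates
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_scrolls):
        seen.update(dict.fromkeys(visible_texts(driver, locator)))
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)  # give the page time to load more tweets
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # the page did not grow, so nothing new was loaded
        last_height = new_height
    # Pick up whatever the final scroll brought in.
    seen.update(dict.fromkeys(visible_texts(driver, locator)))
    return [text for text in seen if text]


def main():
    # Imported here so the helpers above can be tried without a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get("http://twitter.com/elonmusk")
        time.sleep(3)  # same initial pause as in the question
        for text in collect_tweet_texts(driver, (By.XPATH, TWEET_XPATH)):
            print(text)
    finally:
        driver.quit()
```

Calling main() opens Chrome and prints each distinct tweet text once. Texts are collected on every pass because the timeline may drop earlier tweets from the page as it scrolls, so reading the elements only after the loop could miss some.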

To extract the tweets, you need to induce WebDriverWait for visibility_of_all_elements_located() and, using a list comprehension, you can use either of the following locator strategies:

  • Using CSS_SELECTOR:

    driver.get('http://twitter.com/elonmusk')
    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[data-testid='tweetText'] span")))])
    
  • Using XPATH:

    driver.get('http://twitter.com/elonmusk')
    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@data-testid='tweetText']//span")))])
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
