I'm trying to fetch tweets with Selenium in Python, but get_attribute isn't working for me. Here is my code. Can you help me fix it?
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("http://twitter.com/elonmusk")
time.sleep(3)
SCROLL_PAUSE_TIME = 4
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    # this is the line that fails
    tweet = driver.find_elements(By.XPATH,"//div[@id='id__z5kb0qs2bgp']").get_attribute("innerHTML").splitlines()
    time.sleep(SCROLL_PAUSE_TIME)
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
driver.quit()
Something like this should work for grabbing the tweet text:
tweets = driver.find_elements(By.XPATH, '//div[@data-testid="tweetText"]')
for tweet in tweets:
    print(tweet.get_attribute('innerText'))
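Since the page loads more tweets as you scroll, the same lookup can be repeated after each scroll and the results merged. Here is a minimal sketch, assuming the page still marks tweet bodies with data-testid="tweetText". The names collect_tweet_texts, max_scrolls and pause are hypothetical, chosen for illustration. The locator strategy is passed as the plain string "xpath", which is the value of By.XPATH, so the function only needs a driver object:

```python
import time

TWEET_XPATH = '//div[@data-testid="tweetText"]'  # assumption: current page markup


def collect_tweet_texts(driver, max_scrolls=5, pause=4.0):
    """Scroll the page and return the unique tweet texts in first-seen order."""
    seen = {}  # a dict keeps insertion order and drops duplicates

    def harvest():
        # find_elements returns a list, so read the attribute per element
        for element in driver.find_elements("xpath", TWEET_XPATH):
            text = element.get_attribute("innerText")
            if text:
                seen.setdefault(text, None)

    last_height = driver.execute_script("return document.body.scrollHeight")
    harvest()
    for _ in range(max_scrolls):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)  # give the page time to load more tweets
        harvest()
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # the page did not grow, so nothing new was loaded
        last_height = new_height
    return list(seen)
```

With a real browser this would be called as collect_tweet_texts(driver) after driver.get(...). Capping the loop with max_scrolls keeps it from running indefinitely on a timeline that never stops growing.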
To extract the tweets, you need to induce WebDriverWait for visibility_of_all_elements_located() and, using a list comprehension, you can use either of the following locator strategies:
- Using CSS_SELECTOR:
  driver.get('http://twitter.com/elonmusk')
  print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[data-testid='tweetText'] span")))])
- Using XPATH:
  driver.get('http://twitter.com/elonmusk')
  print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@data-testid='tweetText']//span")))])
- Note: You have to add the following imports:
  from selenium.webdriver.support.ui import WebDriverWait
  from selenium.webdriver.common.by import By
  from selenium.webdriver.support import expected_conditions as EC
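Putting the wait and the imports together, here is a minimal sketch. fetch_tweet_texts and clean_texts are hypothetical helper names, the 20 second timeout is taken from the snippets above, and the selector assumes the page still marks tweet bodies with data-testid='tweetText'. It selects the div rather than its span children, which should return each tweet as one string instead of one string per fragment. The Selenium imports sit inside the function only so that the text helper can be used on its own:

```python
def clean_texts(raw_texts):
    """Strip whitespace and drop empty entries from an iterable of strings."""
    return [t.strip() for t in raw_texts if t and t.strip()]


def fetch_tweet_texts(driver, timeout=20):
    """Wait until tweet text elements are visible, then return their text."""
    # Imported here so clean_texts stays usable without Selenium installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.CSS_SELECTOR, "div[data-testid='tweetText']")
    try:
        elements = WebDriverWait(driver, timeout).until(
            EC.visibility_of_all_elements_located(locator)
        )
    except TimeoutException:
        return []  # nothing became visible within the timeout
    return clean_texts(element.text for element in elements)
```

After driver.get('http://twitter.com/elonmusk'), calling print(fetch_tweet_texts(driver)) would print whatever tweets are visible at that point. Returning an empty list on timeout is a design choice for this sketch; letting the TimeoutException propagate is equally reasonable.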