Accessing HTML elements inside a Twitter iframe that has no src attribute



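I have been using Python with BeautifulSoup 4 to scrape data from the UN Global Compact website. Some of the companies listed there, such as this one: https://www.unglobalcompact.org/what-is-gc/participants/2968-Orsted-A-S, have Twitter accounts. I would like to get the name of the Twitter account. The problem is that it sits inside an iframe that has no src attribute. I know that an iframe's content is loaded by a separate request from the rest of the page, but is it possible to access that content when no src attribute is shown?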

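You can use Selenium for this, since it drives a real browser that runs the script which builds the iframe. Here is the complete code: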

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = "https://www.unglobalcompact.org/what-is-gc/participants/2968-Orsted-A-S"
driver = webdriver.Chrome()
try:
    driver.get(url)
    # The iframe is created at runtime, so wait up to 10 seconds for it to appear.
    iframe = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, '//*[@id="twitter-widget-0"]'))
    )
    # Switch the driver's context into the iframe to reach its elements.
    driver.switch_to.frame(iframe)
    # find_elements_by_xpath was removed in recent Selenium 4 releases;
    # find_elements(By.XPATH, ...) is the current equivalent.
    names = driver.find_elements(
        By.XPATH,
        '//*[@class="TweetAuthor-name Identity-name customisable-highlight"]',
    )
    names = [name.text for name in names]
    # The timeline also contains retweets, which carry other people's names.
    # The most frequently occurring name is taken to be the account owner.
    name = max(set(names), key=names.count)
    print(name)
except (TimeoutException, ValueError):
    # TimeoutException: the iframe never appeared. ValueError: no names were found.
    print("No Twitter Feed Found!")
finally:
    driver.quit()

Output:

Ørsted
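
The "most frequent name" step can be tried on its own, without a browser. The following is only a minimal sketch: the helper name most_common_name and the sample list are made up for illustration, and collections.Counter is swapped in for the max(set(names), key=names.count) idiom used above. It also skips empty strings, on the assumption that some matched elements may have no visible text.

```python
from collections import Counter


def most_common_name(names):
    """Return the most frequent non-empty name, or None if there is none.

    Hypothetical helper for illustration; not part of Selenium.
    """
    # Drop None and whitespace-only entries and trim the rest.
    cleaned = [n.strip() for n in names if n and n.strip()]
    if not cleaned:
        return None
    # most_common(1) returns a list holding one (name, count) pair.
    return Counter(cleaned).most_common(1)[0][0]


# Made-up sample: three tweets by the account owner and one retweet.
sample = ["Ørsted", "Ørsted", "Some Other Account", "Ørsted"]
print(most_common_name(sample))
```

Returning None for an empty list avoids relying on the ValueError that max() raises on an empty sequence, so the caller can test the result directly.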
