Getting only specific links with Selenium in Python



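I'm trying to get the links to all Apple-related news articles from this page: https://finance.yahoo.com/quote/AAPL/news?p=AAPL. However, the page also contains many ad links and links to other parts of the site. How can I selectively get only the links to news articles? Here is the code I've written so far: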

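import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Raw string so the backslashes in the Windows path are not read as escape sequences
service = Service(r'C:\Users\Home\OneDrive\Desktop\AJ\chromedriver_win32\chromedriver.exe')
driver = webdriver.Chrome(service=service)
driver.get("https://finance.yahoo.com/quote/AAPL/news?p=AAPL")

# Collects every <a> on the page, including ads and site navigation
links = []
for a in driver.find_elements(By.XPATH, './/a'):
    links.append(a.get_attribute('href'))

def get_info(url):
    # Send the request
    response = requests.get(url)
    # Parse the HTML
    soup = BeautifulSoup(response.text, 'html.parser')
    # Extract the information we need
    news = soup.find('div', attrs={'class': 'caas-body'}).text
    headline = soup.find('h1').text
    date = soup.find('time').text
    return news, headline, date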

Can anyone show me how to do this, or point me to resources that could help? Thanks!

Try this XPath to get all the news links from that page:

//li[contains(@class,'js-stream-content')]/div[@data-test-locator='mega']//h3/a
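import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.implicitly_wait(10)
driver.maximize_window()
driver.get("https://finance.yahoo.com/quote/AAPL/news?p=AAPL")
time.sleep(10)  # give the news stream time to render
links = driver.find_elements(By.XPATH, "//li[contains(@class,'js-stream-content')]/div[@data-test-locator='mega']//h3/a")
for link in links:
    print(link.get_attribute("href"))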
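The XPath above depends on Yahoo's class names and data attributes, which can change without notice. A complementary approach is to filter the collected hrefs by URL shape. The sketch below is a minimal version under a stated assumption: that Yahoo Finance article URLs are on finance.yahoo.com, have a path under /news/, and end in .html (this pattern is not confirmed by the page itself). The function name filter_article_links and the sample URLs are hypothetical, for illustration only.

```python
from urllib.parse import urlparse


def filter_article_links(hrefs):
    """Keep only hrefs that look like Yahoo Finance news articles.

    Assumption (hypothetical pattern): article URLs are on
    finance.yahoo.com, with a path under /news/ ending in .html.
    """
    seen = set()
    articles = []
    for href in hrefs:
        # Selenium's get_attribute('href') returns None for anchors without an href
        if not href:
            continue
        parsed = urlparse(href)
        if (parsed.netloc == "finance.yahoo.com"
                and parsed.path.startswith("/news/")
                and parsed.path.endswith(".html")):
            # Drop the query string and fragment so one article is not kept twice
            clean = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
            if clean not in seen:
                seen.add(clean)
                articles.append(clean)
    return articles


# Hypothetical sample hrefs, standing in for what the page might return
sample = [
    "https://finance.yahoo.com/news/apple-example-article-123456789.html",
    "https://finance.yahoo.com/news/apple-example-article-123456789.html?soc_src=x",
    "https://finance.yahoo.com/quote/AAPL/",
    "https://ads.example.com/click?id=1",
    None,
]
print(filter_article_links(sample))
# → ['https://finance.yahoo.com/news/apple-example-article-123456789.html']
```

On the live page you would pass it the href strings, e.g. filter_article_links([a.get_attribute("href") for a in driver.find_elements(By.XPATH, "//a")]), and then call get_info on each result. URL filtering and the XPath can also be combined: select with the XPath, then filter, so a markup change degrades to "fewer matches" rather than "ads slip through".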
