R: rvest and Google News web scraping not working



I'm new to web scraping. The code below returns an empty character vector, and I don't know how to fix it:

library(rvest)

google_url <- "https://news.google.com/topstories?hl=en-GB&gl=GB&ceid=GB:en"
google <- read_html(google_url)
articles <- google %>% html_nodes('.VDXfz') %>% html_text()
articles
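
A quick way to narrow down a result like this is to check how many nodes the selector matches and whether those nodes contain any text. The sketch below runs on hypothetical stand-in markup, not Google's actual HTML. It assumes, purely for illustration, that the `.VDXfz` class sits on links that carry an `href` but no text.

```r
library(rvest)

# Hypothetical stand-in markup (an assumption, not the real page source)
page <- read_html('
<html><body>
  <article><a class="VDXfz" href="./articles/1"></a><h3>First headline</h3></article>
  <article><a class="VDXfz" href="./articles/2"></a><h3>Second headline</h3></article>
</body></html>')

matched <- page %>% html_nodes(".VDXfz")

# How many nodes did the selector match?
length(matched)

# Do the matched nodes contain any text?
link_text <- html_text(matched)
link_text

# The useful data may live in an attribute rather than in the text
link_href <- html_attr(matched, "href")
link_href
```

If the selector matches nodes but their text is empty, the text probably lives in a different element and a different selector is needed.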

The code below gets all the headlines from the page as currently loaded. If you need to scroll and extract more data, you will need RSelenium.

library(rvest)

url <- 'https://news.google.com/topstories?hl=en-GB&gl=GB&ceid=GB:en'
# Select the headline containers, then the headline links inside them
url %>%
  read_html() %>%
  html_nodes('.lBwEZb') %>%
  html_nodes('.DY5T1d') %>%
  html_text()
[1] "Liz Truss to hold Brexit talks with EU over NI protocol"                                                                             
[2] "Lord Frost: I didn't support PM's coercive Covid plan"                                                                               
[3] "David Frost: I never disagreed with Boris Johnson over Brexit policy – only coercive Covid rules"                                    
[4] "Look at the lauding of David Frost and see a government deranged by the poison of Brexit"                                            
[5] "What happened to the amiable, hard-working David Frost I once knew?"                                                                 
[6] "COVID-19: Omicron now dominant variant in US after making up 73% of new cases, says CDC"   
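
For the scrolling case mentioned above, a rough sketch with RSelenium might look like the following. The helper name `scroll_and_get_titles`, its parameters, the choice of Firefox, and the scroll count are all assumptions for illustration. It also assumes RSelenium and a working browser driver are installed. The `.DY5T1d` class is taken from the code above and may have changed since.

```r
library(rvest)

# Hypothetical helper: scroll the page a few times in a real browser,
# then hand the rendered HTML to rvest.
scroll_and_get_titles <- function(url, n_scrolls = 3, pause = 2) {
  if (!requireNamespace("RSelenium", quietly = TRUE)) {
    stop("RSelenium is not installed")
  }

  # Start a Selenium server and a browser session (Firefox is an assumption)
  driver <- RSelenium::rsDriver(browser = "firefox", verbose = FALSE)
  remote <- driver$client

  # Always close the browser and stop the server, even on error
  on.exit({
    remote$close()
    driver$server$stop()
  }, add = TRUE)

  remote$navigate(url)

  # Scroll to the bottom repeatedly so more content can load
  for (i in seq_len(n_scrolls)) {
    remote$executeScript("window.scrollTo(0, document.body.scrollHeight);")
    Sys.sleep(pause)
  }

  # Parse the rendered page source with rvest
  remote$getPageSource()[[1]] %>%
    read_html() %>%
    html_nodes(".DY5T1d") %>%
    html_text()
}
```

Calling `scroll_and_get_titles(url)` with the URL above would then return whatever headlines are present after scrolling, provided the selector still matches the page.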
