R - How to scrape the text of drop-down menus with rvest?



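I want to scrape all of the text in the drop-down menus on this web page. The structure looks simple, and I have gone through the earlier answers on this topic, but my result comes back empty. My code is: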

library(rvest);library(tidyverse)
pg <- read_html("https://www.siviltoplum.gov.tr/illere-ve-faaliyet-alanlarina-gore-dernekler")
pg %>% 
html_nodes("option") %>% 
html_text()

Any help would be much appreciated.

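It looks like you are trying to scrape the wrong page. If you check the page source, you will find an iframe whose content is loaded from a different location, so the option elements are not part of the page you requested: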

<iframe allowtransparency="true" scrolling="no" src="https://derbis.dernekler.gov.tr/IstatistikDerneklerWeb/IlFaaliyetAlaniDernekler" style="width: 940px; height: 916px; color: rgb(255, 255, 255); margin-top: 0px; margin-left: 5px; float: left; background-color: transparent; fontColor: #ffffff; fontSize: 32px; border-size: 0px;" frameborder="0"></iframe>
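
The iframe's address does not have to be read off the source by eye; it can also be pulled out with rvest. The following is a minimal sketch, assuming a small inline HTML string (the hypothetical `outer_html`) as a stand-in for the outer page so that it runs without network access:

```r
library(rvest)

# Hypothetical stand-in for the outer page: a wrapper holding a single iframe.
outer_html <- '<html><body>
  <iframe src="https://derbis.dernekler.gov.tr/IstatistikDerneklerWeb/IlFaaliyetAlaniDernekler"></iframe>
</body></html>'

outer_pg <- read_html(outer_html)

# Collect the src attribute of every iframe found in the document.
iframe_src <- outer_pg %>%
  html_nodes("iframe") %>%
  html_attr("src")

iframe_src
```

On the real page, the same two calls applied to the result of read_html() on the original URL should return the address that the drop-down menus are actually served from.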

Scrape the iframe's URL directly instead:

read.this <- "https://derbis.dernekler.gov.tr/IstatistikDerneklerWeb/IlFaaliyetAlaniDernekler"
library(rvest)
library(tidyverse)
# The page is served as UTF-8; reading it as "latin1" garbles the Turkish letters (İ, Ğ, Ö)
pg <- read_html(read.this, encoding = "UTF-8")
pg %>% 
  html_nodes("option") %>% 
  html_text()

# [1] "ADANA"          "ADIYAMAN"       "AFYONKARAHİSAR" "AĞRI"
# [5] "AKSARAY"        "AMASYA"         "ANKARA"         "ANTALYA"
# [9] "ARDAHAN"        "ARTVİN"         "AYDIN"          "BALIKESİR"
# [13] "BARTIN"         "BATMAN"         "BAYBURT"        "BİLECİK"
# [17] "BİNGÖL"         "BİTLİS"         "BOLU"           "BURDUR"
....
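
Selecting "option" across the whole page returns the entries of every menu in one flat vector. If the entries need to be kept apart per menu, one possible approach is to loop over the select elements. This is only a sketch: the inline `sample_html` and the ids `il` and `faaliyet` are hypothetical, since the real page's markup is not shown here.

```r
library(rvest)

# Hypothetical markup with two drop-down menus; the ids are made up.
sample_html <- '<html><body>
  <select id="il"><option>ADANA</option><option>ADIYAMAN</option></select>
  <select id="faaliyet"><option>Spor</option></select>
</body></html>'

pg_sample <- read_html(sample_html)
selects <- html_nodes(pg_sample, "select")

# For each <select>, extract the text of its own <option> children.
options_by_menu <- lapply(selects, function(s) {
  html_text(html_nodes(s, "option"))
})

# Name each list element after the id attribute of its <select>.
names(options_by_menu) <- html_attr(selects, "id")

options_by_menu
```

If the real select elements carry no id, html_attr(selects, "name") or the plain list position would have to serve as the label instead.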
