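I want to scrape all the existing text from the several dropdown menus on this web page. The structure is simple, and I have gone over previously posted answers, but I get zero results. The code is: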
library(rvest);library(tidyverse)
pg <- read_html("https://www.siviltoplum.gov.tr/illere-ve-faaliyet-alanlarina-gore-dernekler")
pg %>%
html_nodes("option") %>%
html_text()
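Any help is appreciated.
It looks like you are scraping the wrong page. If you inspect the source, you'll find that the dropdowns live inside an iframe whose content is loaded from a different location, so `read_html()` on the outer page never sees the `<option>` elements: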
<iframe allowtransparency="true" scrolling="no" src="https://derbis.dernekler.gov.tr/IstatistikDerneklerWeb/IlFaaliyetAlaniDernekler" style="width: 940px; height: 916px; color: rgb(255, 255, 255); margin-top: 0px; margin-left: 5px; float: left; background-color: transparent; fontColor: #ffffff; fontSize: 32px; border-size: 0px;" frameborder="0"></iframe>
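Rather than copying the iframe URL by hand, you can read it from the `src` attribute of the outer page. A minimal sketch, assuming the outer page contains one relevant `<iframe>`; `outer_html` is a hypothetical stand-in built from inline HTML so the example runs offline (on the real site you would call `read_html()` on the outer URL instead):

```r
library(rvest)

# Hypothetical stand-in for the outer page; in practice:
# outer_html <- read_html("https://www.siviltoplum.gov.tr/illere-ve-faaliyet-alanlarina-gore-dernekler")
outer_html <- read_html(
  '<html><body>
     <iframe src="https://derbis.dernekler.gov.tr/IstatistikDerneklerWeb/IlFaaliyetAlaniDernekler"></iframe>
   </body></html>'
)

# Take the src attribute of the first <iframe> on the page
iframe_src <- outer_html %>%
  html_node("iframe") %>%
  html_attr("src")

print(iframe_src)
```

The resulting string can be passed straight to `read_html()`. If a site used a relative `src`, `xml2::url_absolute(iframe_src, base)` would resolve it against the outer page's URL.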
Reading the iframe's URL directly returns the options:
read.this <- "https://derbis.dernekler.gov.tr/IstatistikDerneklerWeb/IlFaaliyetAlaniDernekler"
library(rvest)
library(tidyverse)
# The page is UTF-8; forcing "latin1" garbles Turkish letters such as İ, Ğ and Ö
pg <- read_html(read.this, encoding = "UTF-8")
pg %>%
  html_nodes("option") %>%
  html_text()
# [1] "ADANA" "ADIYAMAN" "AFYONKARAHİSAR" "AĞRI"
# [5] "AKSARAY" "AMASYA" "ANKARA" "ANTALYA"
# [9] "ARDAHAN" "ARTVİN" "AYDIN" "BALIKESİR"
# [13] "BARTIN" "BATMAN" "BAYBURT" "BİLECİK"
# [17] "BİNGÖL" "BİTLİS" "BOLU" "BURDUR"
# ...
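`html_nodes("option")` flattens every dropdown into one vector. If you want the options kept separate per menu, you can iterate over the `<select>` elements instead. A minimal sketch on hypothetical inline markup: the `name` attributes (`il`, `alan`) and the second menu's labels are illustrative only, not taken from the real page, so check the actual `<select>` attributes before relying on them:

```r
library(rvest)

# Hypothetical markup with two dropdowns; names and labels are illustrative only
pg <- read_html(
  '<form>
     <select name="il"><option value="1">ADANA</option><option value="2">ADIYAMAN</option></select>
     <select name="alan"><option value="10">Alan A</option><option value="11">Alan B</option></select>
   </form>'
)

selects <- pg %>% html_nodes("select")

# One character vector of option labels per <select>
dropdowns <- lapply(selects, function(s) {
  s %>% html_nodes("option") %>% html_text(trim = TRUE)
})

# Label each vector with its <select>'s name attribute
names(dropdowns) <- selects %>% html_attr("name")

str(dropdowns)
```

Swapping `html_text()` for `html_attr("value")` gives the option values instead of the visible labels.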