I am trying to scrape bookmaker odds from this page:
https://www.interwetten.com/en/sportsbook/top-leagues?topLinkId=1
So far I have written the following code:
library(rvest)
interwetten <- read_html("https://www.interwetten.com/en/sportsbook/top-leagues?topLinkId=1")
bundesliga <- html_nodes(interwetten, xpath = '//*[@id="TBL_Content_1019"]')
bundesliga_teams <- html_nodes(bundesliga, "span")
The output I get is:
[1] <span id="ctl00_cphMain_UCOffer_LeagueList_rptLeague_ctl00_ucBettingContainer_lblClose" clas ...
[2] <span itemscope="itemscope" itemprop="location" itemtype="http://schema.org/Place"><meta ite ...
[3] <span itemprop="name">VfB Stuttgart</span>
[4] <span>X</span>
Now I want to extract the team name from each <span itemprop="name"></span>, but I don't know how. I tried working with nodes and attributes, but it did not work.
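You can make the XPath selector more specific so that it matches only the spans with itemprop="name", then use html_text to pull out their text, for example: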
library(rvest)
interwetten <- 'https://www.interwetten.com/en/sportsbook/top-leagues?topLinkId=1' %>%
    read_html()
teams <- interwetten %>%
    html_nodes(xpath = '//*[@id="TBL_Content_1019"]//span[@itemprop="name"]') %>%
    html_text()
teams
#> [1] "VfB Stuttgart" "1. FC Cologne" "Mainz 05"
#> [4] "Hamburger SV" "Hertha BSC" "Schalke 04"
#> [7] "Hannover 96" "Frankfurt" "Hoffenheim"
#> [10] "Augsburg" "Bayern Munich" "Freiburg"
#> [13] "Dortmund" "RB Leipzig" "Leverkusen"
#> [16] "Wolfsburg" "Werder Bremen" "Monchengladbach"
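If you prefer CSS selectors over XPath, roughly the same filter can be written as an id selector plus an attribute selector. The sketch below runs against a small inline HTML fragment rather than the live page; that fragment is a made-up stand-in based on the spans shown in the question, so the real page's markup may differ.

```r
library(rvest)

# Hypothetical fragment mimicking the structure from the question;
# it is not a copy of the real page.
page <- read_html('
<table id="TBL_Content_1019">
  <tr><td>
    <span itemprop="name">VfB Stuttgart</span>
    <span>X</span>
    <span itemprop="name">1. FC Cologne</span>
  </td></tr>
</table>')

# CSS equivalent of the XPath above: any <span itemprop="name">
# nested anywhere inside the element with id TBL_Content_1019.
teams_css <- page %>%
    html_nodes('#TBL_Content_1019 span[itemprop="name"]') %>%
    html_text()

teams_css
```

Either form should select the same nodes here; XPath is mainly worth keeping when you need conditions that CSS cannot express.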