Web scraping a table spread across multiple pages with the same URL in R

I want to scrape the dividend table ("RENDIMENTOS") of a stock from this website:

https://statusinvest.com.br/fundos-imobiliarios/rbrp11

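The table has multiple pages, but the URL does not change between them.

Using the SelectorGadget extension, I found that the node is named "tbody", but when I read this node I only get the first twelve records (the first page). Is it possible to scrape the records from all pages?

I am trying this code: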

library("rvest")
url <- "https://statusinvest.com.br/fundos-imobiliarios/rbrp11"
url %>%
  read_html() %>%
  html_nodes("tbody") %>%
  .[1] %>%
  html_table(fill = TRUE)

Opening the page source, I can see all the records on line 1377, in the following format:

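<input id="results" name="results" type="hidden" value="[{&quot;y&quot;:0,&quot;m&quot;:0,&quot;d&quot;:0,&quot;ad&quot;:null,&quot;ed&quot;:&quot;05/08/2022&quot;,&quot;pd&quot;:&quot;12/08/2022&quot;,&quot;et&quot;:&quot;Rendimento&quot;,&quot;etd&quot;:&quot;Rendimento&quot;,&quot;v&quot;:0.4500000000000000000,&quot;ov&quot;:null,&quot;sv&quot;:&quot;0,45000000&quot;,&quot;sov&quot;:&quot;-&quot;,&quot;adj&quot;:false}, (...)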

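Thanks

It looks like the values in the table are stored as JSON in the "value" attribute of an "input" node. That is why "tbody" only contains the first page: the remaining pages are filled in from this JSON in the browser.

So it is just a matter of locating the correct node, extracting the attribute, and converting it from JSON.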

library("rvest")
# read the page
url <- "https://statusinvest.com.br/fundos-imobiliarios/rbrp11"
page <- read_html(url)
# get the parent 'div' node
node <- page %>%
  html_elements(xpath = ".//div[contains(@class, 'card chart-and-list scroll-y no-scroll-md-y rounded pt-md-3 pb-3 show-empty-callback')]")
# get the value attribute of the input and convert from JSON
answer <- node %>%
  html_element("input") %>%
  html_attr("value") %>%
  jsonlite::fromJSON()
   y m d ad         ed         pd         et        etd         v ov         sv sov   adj
1  0 0 0 NA 05/08/2022 12/08/2022 Rendimento Rendimento 0.4500000 NA 0,45000000   - FALSE
2  0 0 0 NA 07/07/2022 14/07/2022 Rendimento Rendimento 0.4500000 NA 0,45000000   - FALSE
3  0 0 0 NA 07/06/2022 14/06/2022 Rendimento Rendimento 0.4500000 NA 0,45000000   - FALSE
4  0 0 0 NA 06/05/2022 13/05/2022 Rendimento Rendimento 0.4500000 NA 0,45000000   - FALSE
5  0 0 0 NA 07/04/2022 14/04/2022 Rendimento Rendimento 0.5000000 NA 0,50000000   - FALSE
6  0 0 0 NA 08/03/2022 15/03/2022 Rendimento Rendimento 0.4200000 NA 0,42000000   - FALSE
...

I am not sure exactly which information you are looking for, but it should be in here.
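As a follow-up, here is a minimal offline sketch of the same idea: pull the JSON out of a hidden input's "value" attribute, then convert the text dates into proper Date values. The two-record JSON string and the inline HTML are made-up stand-ins for the real page. Selecting the input by the id "results" is an assumption based on the snippet in the question, and reading "ed" and "pd" as dd/mm/yyyy dates is inferred from the sample values only.

```r
library(rvest)
library(jsonlite)

# Made-up stand-in for the JSON stored on the page (two records only).
json <- '[{"ed":"05/08/2022","pd":"12/08/2022","et":"Rendimento","v":0.45},{"ed":"07/07/2022","pd":"14/07/2022","et":"Rendimento","v":0.45}]'

# Embed the JSON in a hidden input, escaping the quotes as an HTML page would.
# The id "results" is an assumption taken from the snippet in the question.
html <- paste0(
  '<div><input id="results" type="hidden" value="',
  gsub('"', "&quot;", json, fixed = TRUE),
  '"></div>'
)

page <- read_html(html)

# Locate the input by id, read its value attribute, and parse the JSON.
records <- page %>%
  html_element("input#results") %>%
  html_attr("value") %>%
  fromJSON()

# The dates arrive as dd/mm/yyyy text; convert them to Date.
records$ed <- as.Date(records$ed, format = "%d/%m/%Y")
records$pd <- as.Date(records$pd, format = "%d/%m/%Y")

print(records)
```

If the id does hold on the live page, the same "input#results" selector could replace the long class-based XPath, which would break as soon as the site changes one of those CSS classes.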
