networkD3 Sankey plot in R: how to calculate the value for each link



I am trying to create a Sankey plot as detailed in the example from the R port of d3Network (described here: https://christophergandrud.github.io/networkD3/). I load the following example "energy" dataset:

# Load energy projection data
URL <- paste0("https://cdn.rawgit.com/christophergandrud/networkD3/",
"master/JSONdata/energy.json")
Energy <- jsonlite::fromJSON(URL)

Importing the "energy" dataset gives two data frames, nodes and links. Looking at the links data reveals the following format:

head(Energy$links)
source target   value
1        0      1 124.729
2        1      2   0.597
3        1      3  26.862
4        1      4 280.322
5        1      5  81.144
6        6      2  35.000

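The source column gives the source node, the target column gives the target node, and the value column gives the value of each individual link.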
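As a small illustration of how these columns relate to the nodes data frame: source and target are zero-based row indices into nodes, so adding 1 recovers the matching row in R. The toy nodes and links below are made up for this sketch and are not taken from the energy dataset.

```r
# Toy data (hypothetical, for illustration only)
nodes <- data.frame(name = c("A", "B", "C"), stringsAsFactors = FALSE)
links <- data.frame(source = c(0, 0, 1),
                    target = c(1, 2, 2),
                    value  = c(5, 3, 2))

# source/target are zero-based, R vectors are one-based, hence the + 1
links$source_name <- nodes$name[links$source + 1]
links$target_name <- nodes$name[links$target + 1]

print(links)
```

The same lookup should work on the real data, for example Energy$nodes$name[Energy$links$source + 1].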

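While this is conceptually fairly simple, I am having great difficulty getting a dataset into the same format as the Energy$links data.frame. I have been able to get my data into the following format, but I am at a loss as to how to transform it further: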

head(sampleSankeyData, n = 10L)
clientID                node1
<int>                <chr>
1     23969 1 Community Services
2     39199      1 Youth Justice
3     23595      1 Mental Health
4     15867 1 Community Services
5     18295            3 Housing
6     18295            2 Housing
7     18295 1 Community Services
8     18295            4 Housing
9     15253            1 Housing
10    27839 1 Community Services 

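What I would like to do is aggregate the number of unique clients for each link. For example, in the subset above, the link from "1 Community Services" to "2 Housing" should have a value of 1 because of client 18295 (and the links from "2 Housing" to "3 Housing" and from "3 Housing" to "4 Housing" should also have a value of 1). I would then have data in the same format as Energy$links in the Sankey plot example.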

Try this:

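library(tidyverse)
library(stringr)

# Sample data from the question
df <- tribble(
  ~number, ~clientID,         ~node1,
  1 ,    23969, '1 Community Services',
  2 ,    39199,      '1 Youth Justice',
  3 ,    23595,      '1 Mental Health',
  4 ,    15867, '1 Community Services',
  5 ,    18295,            '3 Housing',
  6 ,    18295,            '2 Housing',
  7 ,    18295, '1 Community Services',
  8 ,    18295,            '4 Housing',
  9 ,    15253,            '1 Housing',
  10,    27839, '1 Community Services')

# Take the leading digit of node1 as the step (single-digit steps only),
# spread the steps into columns step_1..step_4, then collapse to one row
# per client; sort() drops NAs, so [1] is the first non-missing node or NA
df2 <- mutate(df, step = as.numeric(str_sub(node1, end = 1))) %>%
  spread(step, node1, sep = '_') %>%
  group_by(clientID) %>%
  summarise(step1 = sort(unique(step_1))[1],
            step2 = sort(unique(step_2))[1],
            step3 = sort(unique(step_3))[1],
            step4 = sort(unique(step_4))[1])

# Stack the consecutive step pairs (1->2, 2->3, 3->4) and count the clients
# on each link; df2 has one row per client, so n() counts unique clients
df3 <- bind_rows(select(df2, 1, source = 2, target = 3),
                 select(df2, 1, source = 3, target = 4),
                 select(df2, 1, source = 4, target = 5)) %>%
  group_by(source, target) %>%
  summarise(clients = n())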
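The code above names four steps explicitly. As a rough alternative, here is a sketch in base R that pairs each client's consecutive steps regardless of how many steps there are. It assumes that the number at the start of node1 is the step order and that each client has at most one node per step; link_counts and the intermediate names are made up for this sketch.

```r
# Sample data from the question
visits <- data.frame(
  clientID = c(23969, 39199, 23595, 15867, 18295,
               18295, 18295, 18295, 15253, 27839),
  node1 = c("1 Community Services", "1 Youth Justice", "1 Mental Health",
            "1 Community Services", "3 Housing", "2 Housing",
            "1 Community Services", "4 Housing", "1 Housing",
            "1 Community Services"),
  stringsAsFactors = FALSE
)

# Assumption: the number before the first space is the step order
visits$step <- as.integer(sub(" .*$", "", visits$node1))

# Order each client's visits by step and drop exact duplicate rows
visits <- unique(visits[order(visits$clientID, visits$step), ])

# Pair every row with the next row, keeping pairs within the same client
n <- nrow(visits)
same_client <- visits$clientID[-n] == visits$clientID[-1]
pairs <- data.frame(clientID = visits$clientID[-n][same_client],
                    source   = visits$node1[-n][same_client],
                    target   = visits$node1[-1][same_client],
                    stringsAsFactors = FALSE)

# Count unique clients per source/target pair
link_counts <- aggregate(clientID ~ source + target,
                         data = unique(pairs), FUN = length)
names(link_counts)[3] <- "clients"

print(link_counts)
```

link_counts has the same source, target and clients columns as df3, but without the rows that have a missing source or target.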

And to use it with sankeyNetwork()...

links <- df3 %>%
  dplyr::ungroup() %>% # ungroup just to be safe
  dplyr::filter(!is.na(source) & !is.na(target)) # remove lines without a link

# build the nodes data frame based on nodes in your links data frame
nodeFactors <- factor(sort(unique(c(links$source, links$target))))
nodes <- data.frame(name = nodeFactors)

# convert the source and target values to the index of the matching node in the
# nodes data frame
links$source <- match(links$source, levels(nodeFactors)) - 1
links$target <- match(links$target, levels(nodeFactors)) - 1

# plot
library(networkD3)
sankeyNetwork(Links = links, Nodes = nodes, Source = 'source',
              Target = 'target', Value = 'clients', NodeID = 'name')
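If the plot comes out empty, a likely cause is source or target values that are not valid zero-based indices into nodes. A minimal sanity check could look like the sketch below. check_links is a hypothetical helper written for this example, not a networkD3 function, and the toy nodes and links mirror what the code above should produce for the sample data.

```r
# Hypothetical helper (not part of networkD3): stops with an error unless every
# source/target is a whole number between 0 and nrow(nodes) - 1
check_links <- function(links, nodes) {
  idx <- c(links$source, links$target)
  stopifnot(
    !anyNA(idx),
    all(idx == floor(idx)),
    min(idx) >= 0,
    max(idx) <= nrow(nodes) - 1
  )
  invisible(TRUE)
}

# Toy data shaped like the result for the sample data in the question
nodes <- data.frame(name = c("1 Community Services", "2 Housing",
                             "3 Housing", "4 Housing"),
                    stringsAsFactors = FALSE)
links <- data.frame(source  = c(0, 1, 2),
                    target  = c(1, 2, 3),
                    clients = c(1, 1, 1))

check_links(links, nodes)
```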
