How to arrange nested data (i.e. data with parent relationships) in R



I have a dataset with several levels:

  1. Category (e.g. "Countries")
  2. Country (e.g. "USA")
  3. City (e.g. "New York")
  4. County (e.g. "Manhattan")
  5. Place (e.g. "Times Square")

Every row (except the Level 1 entry) is linked to its parent one level up.

For example: Times Square -> Manhattan -> New York -> USA -> Countries
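To make the Parent links concrete, here is a minimal base-R sketch that follows them from one row up to the Level 1 root. It rebuilds the example data inline, taking Manhattan's ID to be 41 (the value that Times Square's Parent refers to); `ancestor_chain()` is a hypothetical helper, not an existing function.

```r
# Example data from the question (Manhattan's ID assumed to be 41)
df <- data.frame(
  ID     = c(3, 6, 9, 11, 12, 19, 41, 50, 77, 83, 105),
  Parent = c(12, 12, 77, 105, 19, NA, 3, 41, 19, 77, 19),
  Level  = c(3, 3, 3, 3, 2, 1, 4, 5, 2, 3, 2),
  Name   = c("New York", "Boston", "Oxford", "Vancouver", "USA", "Countries",
             "Manhattan", "Times Square", "UK", "London", "Canada")
)

# Hypothetical helper: follow Parent links from one ID until Parent is NA
ancestor_chain <- function(df, id) {
  chain <- character(0)
  while (!is.na(id)) {
    row   <- match(id, df$ID)          # row index of the current node
    chain <- c(chain, df$Name[row])    # record its name
    id    <- df$Parent[row]            # step up to the parent (NA at the root)
  }
  chain
}

paste(ancestor_chain(df, 50), collapse = " -> ")
##> [1] "Times Square -> Manhattan -> New York -> USA -> Countries"
```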

My question: how can I sort this dataset:

```r
df2 <- structure(list(ID = c(3,6,9,11,12,19,41,50,77,83,105),
                      Parent = c(12,12,77,105,19,NA,3,41,19,77,19),
                      Level = c(3,3,3,3,2,1,4,5,2,3,2),
                      Name = c("New York","Boston","Oxford","Vancouver","USA","Countries",
                               "Manhattan","Times Square","UK","London","Canada")),
                 class = "data.frame",
                 row.names = c(NA, -11L))
```

into this:

```r
df2 <- structure(list(ID = c(19,12,3,41,50,6,77,83,9,105,11),
                      Parent = c(NA,19,12,3,41,12,19,77,77,19,105),
                      Level = c(1,2,3,4,5,3,2,3,3,2,3),
                      Name = c("Countries","USA","New York","Manhattan","Times Square",
                               "Boston","UK","London","Oxford","Canada","Vancouver")),
                 class = "data.frame",
                 row.names = c(NA, -11L))
```

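In the target df2, the rows still run from the top level down, but each child appears directly below its parent (together with its own descendants) before the next sibling.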

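I have tried several variants of dplyr::arrange() (e.g. arrange(Level, Parent)), but none of them accounts for the nesting. I suspect the solution combines group_by() with arrange(..., .by_group = TRUE), as was done here (R, dplyr - combination of group_by and arrange does not produce expected result?). Unfortunately, I could not work it out myself.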

Can anyone help? A tidyverse/dplyr solution would be preferred :-)
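The arrange(Level, Parent) attempts fail because a row's correct position depends on its whole chain of ancestors, not just on its own Level and Parent columns. One dplyr-style workaround is to build a sort key from the zero-padded IDs of the full ancestor path (root first), so that a parent's key is a prefix of each child's key, and then arrange() by that key. This is a minimal sketch, not a confirmed canonical method. It assumes IDs are non-negative integers below 10^6, Manhattan's ID is 41, and dplyr >= 1.1.0 (for the `.locale` argument of arrange()); `path_key` is an illustrative column name. Siblings end up in ascending ID order.

```r
library(dplyr)

# Example data from the question (Manhattan's ID assumed to be 41)
df <- data.frame(
  ID     = c(3, 6, 9, 11, 12, 19, 41, 50, 77, 83, 105),
  Parent = c(12, 12, 77, 105, 19, NA, 3, 41, 19, 77, 19),
  Level  = c(3, 3, 3, 3, 2, 1, 4, 5, 2, 3, 2),
  Name   = c("New York", "Boston", "Oxford", "Vancouver", "USA", "Countries",
             "Manhattan", "Times Square", "UK", "London", "Canada")
)

# Start each key with the row's own zero-padded ID, then prepend ancestors
key <- sprintf("%06d", as.integer(df$ID))
up  <- df$Parent
while (any(!is.na(up))) {
  has_parent <- !is.na(up)
  key[has_parent] <- paste(sprintf("%06d", as.integer(up[has_parent])),
                           key[has_parent], sep = "/")
  up <- df$Parent[match(up, df$ID)]   # climb one level (NA once past the root)
}

# Byte-wise (C locale) sorting puts each parent key before its children's keys
sorted <- df |>
  mutate(path_key = key) |>
  arrange(path_key, .locale = "C") |>
  select(-path_key)
```

For this data the resulting key for Times Square is "000019/000012/000003/000041/000050", so every row sorts right after its ancestors.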

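Here is a solution using igraph::dfs(), which treats the data as a tree and runs a depth-first search from the root (ID 19):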

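```r
library(igraph)
library(dplyr)  # for left_join()

# Directed Parent -> ID edge list; na.omit() drops the root row (Parent is NA),
# but the root still becomes a vertex because it appears as a Parent
g <- with(na.omit(df2), graph_from_data_frame(cbind(Parent, ID), directed = TRUE))

# Depth-first search from the root lists every node right after its parent
data.frame(ID = as.integer(names(dfs(g, root = "19")$order))) |>
  left_join(df2)

##> + Joining, by = "ID"
##>     ID Parent Level         Name
##> 1   19     NA     1    Countries
##> 2   12     19     2          USA
##> 3    3     12     3     New York
##> 4   41      3     4    Manhattan
##> 5   50     41     5 Times Square
##> 6    6     12     3       Boston
##> 7   77     19     2           UK
##> 8    9     77     3       Oxford
##> 9   83     77     3       London
##> 10 105     19     2       Canada
##> 11  11    105     3    Vancouver
```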
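The dfs() call works because the target order is a depth-first pre-order: emit a node, then recursively emit each of its children (with their descendants) before moving on to the next sibling. The same traversal can be written without igraph. This is a minimal base-R sketch, assuming a single root (the row whose Parent is NA), Manhattan's ID taken as 41, and siblings visited in ascending ID order (an assumption, since the question's own target does not follow one consistent sibling rule); `preorder_ids()` is a hypothetical helper name. For this data it yields the same row order as the dfs() output above.

```r
# Example data from the question (Manhattan's ID assumed to be 41)
df <- data.frame(
  ID     = c(3, 6, 9, 11, 12, 19, 41, 50, 77, 83, 105),
  Parent = c(12, 12, 77, 105, 19, NA, 3, 41, 19, 77, 19),
  Level  = c(3, 3, 3, 3, 2, 1, 4, 5, 2, 3, 2),
  Name   = c("New York", "Boston", "Oxford", "Vancouver", "USA", "Countries",
             "Manhattan", "Times Square", "UK", "London", "Canada")
)

# Hypothetical helper: IDs in depth-first pre-order, starting at `id`
preorder_ids <- function(df, id) {
  kids <- sort(df$ID[!is.na(df$Parent) & df$Parent == id])  # children, by ID
  c(id, unlist(lapply(kids, function(k) preorder_ids(df, k))))
}

root   <- df$ID[is.na(df$Parent)]
sorted <- df[match(preorder_ids(df, root), df$ID), ]
rownames(sorted) <- NULL
```

Recursion depth equals the tree depth (five levels here), so this is fine for shallow hierarchies like this one.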
