r - Loop over a df and retrieve the unique values linked to the unique values in other columns



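I subcategorized and categorized labels in Excel, but I want to make this reproducible, so I would like to convert it to R code.

I have a df with 631 rows, of which the first 15 look like this: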

IV_label               Subcategory            Category                         
<chr>                  <chr>                  <chr>                            
1 light conditions       time of day            exogenous                        
2 vital status           victim characteristics human involvement 
3 road type              road type              exogenous                        
4 reserve density        workload               police discretion                
5 road type              road type              exogenous                        
6 surface type           road type              exogenous                        
7 surface characteristic road type              exogenous                        
8 light conditions       time of day            exogenous                        
9 light conditions       time of day            exogenous                        
10 weather                weather type           exogenous                        
11 weather                weather type           exogenous                        
12 weather                weather type           exogenous                        
13 day of the week        day of the week        exogenous                        
14 amount of lanes        road type              exogenous                        
15 amount of lanes        road type              exogenous 

I would like to be able to add the following to my R code without having to build the lists by hand:

time of day                 <- list(light conditions, ...)
victim characteristics      <- list(vital status, ...)
road type                   <- list(road type, surface type, surface characteristic, amount of lanes, ...) (# notice road type is included only once!)
workload                    <- list(reserve density, ...)
weather type                <- list(weather, ...)
day of the week             <- list(day of the week, ...)
exogenous                   <- list(time of day, road type, weather type, day of the week)
human involvement           <- list(victim characteristics)
police discretion           <- list(workload)

I know I will need to write this boilerplate part myself:

time of day                 <- list(
victim characteristics      <- list(
road type                   <- list(
workload                    <- list(
weather type                <- list(
day of the week             <- list(
exogenous                   <- list(
human involvement           <- list(
police discretion           <- list(

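But I would like to be able to copy the unique values from the console and paste them into the boilerplate above.

Here I consider any pair of items that appear in the same row, in two consecutive columns. I use an adjacency matrix adj to track the edges, then rebuild the graph as a named list: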

library(purrr)
library(dplyr)  # provides pull(), used below
df <- data.frame(IV_label = c(
"light conditions","vital status","road type",
"reserve density","road type","surface type",
"surface characteristic","light conditions","light conditions",
"weather","weather","weather",
"day of the week","amount of lanes","amount of lanes"),
Subcategory = c(
"time of day","victim characteristics","road type",
"workload","road type","road type",
"road type","time of day","time of day",
"weather type","weather type","weather type",
"day of the week","road type","road type"),
Category = c(
"exogenous","human involvement","exogenous",
"police discretion","exogenous","exogenous",
"exogenous","exogenous","exogenous",
"exogenous","exogenous","exogenous",
"exogenous","exogenous","exogenous"))

names <- c("IV_label", "Subcategory", "Category") |>
purrr::map(~pull(df, .x)) |>
purrr::reduce(union)
## adjacency matrix
adj <- matrix(0,
nrow = length(names),
ncol = length(names),
dimnames = list(names, names))
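# edge from each Subcategory (row) to its IV_label (column)
adj[cbind(df[,2], df[,1])] <- 1
# edge from each Category (row) to its Subcategory (column)
adj[cbind(df[,3], df[,2])] <- 1
# one list element per row of adj, keeping only nodes that have children
setNames(asplit(adj, 1),names) |>
purrr::map(~names[which(.x == 1)]) |>
purrr::keep(~length(.x) > 0)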

Output:

$`road type`
[1] "road type"              "surface type"           "surface characteristic"
[4] "amount of lanes"       
$`day of the week`
[1] "day of the week"
$`time of day`
[1] "light conditions"
$`victim characteristics`
[1] "vital status"
$workload
[1] "reserve density"
$`weather type`
[1] "weather"
$exogenous
[1] "road type"       "day of the week" "time of day"     "weather type"   
$`human involvement`
[1] "victim characteristics"
$`police discretion`
[1] "workload"
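
The two `adj[cbind(...)] <- 1` assignments rely on R's indexing of a matrix by a two-column character matrix, where each row names one (row, column) cell. A minimal sketch on a toy matrix may make this easier to see; the names `nodes`, `m`, `parent`, and `child` are made up for illustration and are not part of the code above:

```r
# Toy 3x3 matrix with named rows and columns (illustrative names only)
nodes <- c("a", "b", "c")
m <- matrix(0, nrow = 3, ncol = 3, dimnames = list(nodes, nodes))

# Each row of the two-column character matrix addresses one cell:
# first column = row name, second column = column name
parent <- c("a", "a", "b")
child  <- c("b", "c", "c")
m[cbind(parent, child)] <- 1

m["a", "b"]  # this cell was set by the assignment above
m["b", "a"]  # the reverse direction is left untouched
```

Because rows stand for the parent and columns for the child, reading one row of the matrix gives all children of that node, which is what the `asplit(adj, 1)` step uses.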

You may want to zero out the diagonal of adj to avoid self-referencing edges:

adj[row(adj) == col(adj)] <- 0
setNames(asplit(adj, 1),names) |>
purrr::map(~names[which(.x == 1)]) |>
purrr::keep(~length(.x) > 0)

Output:

$`road type`
[1] "surface type"           "surface characteristic" "amount of lanes"       
$`time of day`
[1] "light conditions"
$`victim characteristics`
[1] "vital status"
$workload
[1] "reserve density"
$`weather type`
[1] "weather"
$exogenous
[1] "road type"       "day of the week" "time of day"     "weather type"   
$`human involvement`
[1] "victim characteristics"
$`police discretion`
[1] "workload"
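
If the goal is mainly to paste the groups into the boilerplate, a base R alternative without the adjacency matrix might be enough. The sketch below is only one possible approach: it assumes a df with the same three columns (a small stand-in is built here), and the helper name `group_unique` is hypothetical. It splits one column by the next, keeps the unique values, drops self-references, and prints one line per group that can be copied from the console:

```r
# Small stand-in for df (assumption: same three columns as in the question)
df <- data.frame(
  IV_label    = c("light conditions", "road type", "surface type", "weather"),
  Subcategory = c("time of day", "road type", "road type", "weather type"),
  Category    = c("exogenous", "exogenous", "exogenous", "exogenous"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: unique children per parent, without self-references
group_unique <- function(child, parent) {
  groups <- lapply(split(child, parent), unique)
  groups <- Map(function(values, name) setdiff(values, name),
                groups, names(groups))
  Filter(length, groups)  # drop parents whose only child was themselves
}

result <- c(
  group_unique(df$IV_label, df$Subcategory),
  group_unique(df$Subcategory, df$Category)
)

# Print one copy-pasteable assignment per group
for (name in names(result)) {
  values <- paste0('"', result[[name]], '"', collapse = ", ")
  cat("`", name, "` <- list(", values, ")\n", sep = "")
}
```

Note that `split()` orders the groups alphabetically by parent, so the order can differ from the adjacency-matrix output. The sketch also assumes no label is used as both a Subcategory and a Category, since that would produce duplicate names in `result`.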
