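I sub-categorised and categorised a set of labels in Excel, but I want the process to be reproducible, so I would like to turn it into R code.
I have a df with 631 rows, of which the first 15 look like this: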
   IV_label               Subcategory            Category
   <chr>                  <chr>                  <chr>
 1 light conditions       time of day            exogenous
 2 vital status           victim characteristics human involvement
 3 road type              road type              exogenous
 4 reserve density        workload               police discretion
 5 road type              road type              exogenous
 6 surface type           road type              exogenous
 7 surface characteristic road type              exogenous
 8 light conditions       time of day            exogenous
 9 light conditions       time of day            exogenous
10 weather                weather type           exogenous
11 weather                weather type           exogenous
12 weather                weather type           exogenous
13 day of the week        day of the week        exogenous
14 amount of lanes        road type              exogenous
15 amount of lanes        road type              exogenous
I would like to be able to add the following to my R code without having to build the lists myself:
time of day <- list(light conditions, ...)
victim characteristics <- list(vital status, ...)
road type <- list(road type, surface type, surface characteristics, amount of lanes, ...) (# notice road type is included only once!)
workload <- list(reserve density, ...)
weather type <- list(weather, ...)
day of the week <- list(day of the week, ...)
exogenous <- list(time of day, road type, weather type, day of the week)
human involvement <- list(victim characteristics)
police discretion <- list(workload)
I know I will have to write this boilerplate myself:
time of day <- list(
victim characteristics <- list(
road type <- list(
workload <- list(
weather type <- list(
day of the week <- list(
exogenous <- list(
human involvement <- list(
police discretion <- list(
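But I would like to be able to copy the unique values from the console and paste them into the boilerplate above.
Here I treat any pair of items that appear in the same row, in two consecutive columns, as an edge. I use an adjacency matrix adj
to keep track of the edges, then rebuild the graph as a named list: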
library(purrr)
df <- data.frame(IV_label = c(
"light conditions","vital status","road type",
"reserve density","road type","surface type",
"surface characteristic","light conditions","light conditions",
"weather","weather","weather",
"day of the week","amount of lanes","amount of lanes"),
Subcategory = c(
"time of day","victim characteristics","road type",
"workload","road type","road type",
"road type","time of day","time of day",
"weather type","weather type","weather type",
"day of the week","road type","road type"),
Category = c(
"exogenous","human involvement","exogenous",
"police discretion","exogenous","exogenous",
"exogenous","exogenous","exogenous",
"exogenous","exogenous","exogenous",
"exogenous","exogenous","exogenous"))
names <- c("IV_label", "Subcategory", "Category") |>
purrr::map(~dplyr::pull(df, .x)) |>  # pull() comes from dplyr, not purrr
purrr::reduce(union)
## adjacency matrix
adj <- matrix(0,
nrow = length(names),
ncol = length(names),
dimnames = list(names, names))
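# df[[i]] returns a plain character vector for both data.frames and tibbles
adj[cbind(df[[2]], df[[1]])] <- 1  # edges Subcategory -> IV_label
adj[cbind(df[[3]], df[[2]])] <- 1  # edges Category -> Subcategory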
setNames(asplit(adj, 1),names) |>
purrr::map(~names[which(.x == 1)]) |>
purrr::keep(~length(.x) > 0)
Output:
$`road type`
[1] "road type" "surface type" "surface characteristic"
[4] "amount of lanes"
$`day of the week`
[1] "day of the week"
$`time of day`
[1] "light conditions"
$`victim characteristics`
[1] "vital status"
$workload
[1] "reserve density"
$`weather type`
[1] "weather"
$exogenous
[1] "road type" "day of the week" "time of day" "weather type"
$`human involvement`
[1] "victim characteristics"
$`police discretion`
[1] "workload"
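In case the indexing step looks opaque: when a matrix has dimnames, R lets you index it with a two-column character matrix, where each row is one (row name, column name) pair. That is what adj[cbind(...)] <- 1 relies on. A minimal sketch with made-up labels a, b and c (not taken from the data):

```r
# Toy example with hypothetical labels; not the original data.
labels <- c("a", "b", "c")
m <- matrix(0, nrow = 3, ncol = 3, dimnames = list(labels, labels))

# Each row of idx is one (row name, column name) pair to set.
idx <- cbind(c("a", "a", "c"),
             c("b", "c", "c"))
m[idx] <- 1

# Only the cells a/b, a/c and c/c are now 1; c/c is a self-reference.
print(m)
```

Repeated pairs are harmless here, since setting the same cell to 1 twice changes nothing, which is why duplicate rows in df do not need to be removed first.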
You may want to zero out the diagonal of adj
to avoid self-referencing edges:
diag(adj) <- 0
setNames(asplit(adj, 1),names) |>
purrr::map(~names[which(.x == 1)]) |>
purrr::keep(~length(.x) > 0)
Output:
$`road type`
[1] "surface type" "surface characteristic" "amount of lanes"
$`time of day`
[1] "light conditions"
$`victim characteristics`
[1] "vital status"
$workload
[1] "reserve density"
$`weather type`
[1] "weather"
$exogenous
[1] "road type" "day of the week" "time of day" "weather type"
$`human involvement`
[1] "victim characteristics"
$`police discretion`
[1] "workload"
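If the goal is only to get lines you can paste into a script, a base R sketch along these lines may be enough. Note that it swaps the adjacency matrix for split() plus unique(), so it is a different technique from the one above. The helper children(), the objects groups and code_lines, and the four-row stand-in df are my own illustrative assumptions. It also writes c() with quoted strings rather than list(), and it drops self-references, so a subcategory whose only label is itself disappears, as in the second output.

```r
# Small stand-in for the real data (assumption: same three column names).
df <- data.frame(
  IV_label    = c("light conditions", "road type", "surface type", "weather"),
  Subcategory = c("time of day", "road type", "road type", "weather type"),
  Category    = c("exogenous", "exogenous", "exogenous", "exogenous")
)

# Hypothetical helper: unique children per parent,
# skipping rows where the child is identical to its parent.
children <- function(child, parent) {
  keep <- child != parent
  lapply(split(child[keep], parent[keep]), unique)
}

groups <- c(children(df$IV_label, df$Subcategory),
            children(df$Subcategory, df$Category))

# Turn each entry into an assignment; backticks keep names with spaces valid.
code_lines <- vapply(names(groups), function(nm) {
  values <- paste(sprintf('"%s"', groups[[nm]]), collapse = ", ")
  sprintf("`%s` <- c(%s)", nm, values)
}, character(1))

# Print the lines so they can be copied from the console.
cat(code_lines, sep = "\n")
```

Running this on the full 631-row df instead of the stand-in should give one line per subcategory and per category, which can then be pasted into the script as is.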