Separating embedded/nested data frames in R



I have a data frame that has another data frame embedded in it.

class(data)
[1] "dfidx_mlogit" "dfidx"        "data.frame"   "mlogit.data"

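I am trying to separate these two data frames. One contains the relevant data on health and education; the other, called "idx", contains information about the individual ids.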

How can I separate the two data frames completely?

Here is the data:

data <- structure(list(EDUC = c(4L, 4L, 4L, 4L), HEALTH = c(3L, 3L, 3L, 
3L), idx = structure(list(chid = c(1L, 1L, 1L, 1L), unique_id = c(3000175513, 
3000175513, 3000175513, 3000175513), alt = structure(1:4, .Label = c("Bicycle", 
"Car", "Metro", "Walking"), class = "factor")), ids = c(1, 1, 
2), row.names = c(NA, 4L), class = c("idx", "data.frame"))), row.names = c(NA, 
4L), class = c("dfidx_mlogit", "dfidx", "data.frame", "mlogit.data"
), idx = structure(list(chid = c(1L, 1L, 1L, 1L), unique_id = c(3000175513, 
3000175513, 3000175513, 3000175513), alt = structure(1:4, .Label = c("Bicycle", 
"Car", "Metro", "Walking"), class = "factor")), ids = c(1, 1, 
2), row.names = c(NA, 4L), class = c("idx", "data.frame")))
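Before separating anything, it can help to confirm which column actually holds the nested data frame. The following is a minimal sketch on a small toy object rather than the real data; the names `toy` and `toy_idx` are made up for illustration, and the toy only mimics the column layout, not the `dfidx` classes.

```r
# Toy stand-in for the structure in the question (hypothetical names)
toy_idx <- data.frame(chid = c(1L, 1L), alt = factor(c("Bicycle", "Car")))
toy <- data.frame(EDUC = c(4L, 4L), HEALTH = c(3L, 3L))
toy$idx <- toy_idx  # store a whole data frame as a single column

# Flag the columns that are themselves data frames
is_df <- vapply(unclass(toy), is.data.frame, logical(1))
nested_cols <- names(toy)[is_df]
print(nested_cols)
```

Running the same `vapply()` call on `data` should point at the `idx` column, which is the one to split off.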

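To separate the datasets, note that the nested "data.frame" is stored in the "idx" column. We can `pull` that column to create a new object, then drop it (and the matching "idx" attribute) from the original: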

library(dplyr)
data2 <- data %>%
  pull(idx)
data1 <- data %>%
  as_tibble %>%
  select(-idx)
attr(data1, "idx") <- NULL

Check the structure:

str(data1)
#tibble [4 × 2] (S3: tbl_df/tbl/data.frame)
# $ EDUC  : int [1:4] 4 4 4 4
# $ HEALTH: int [1:4] 3 3 3 3
str(data2)
#Classes ‘idx’ and 'data.frame':    4 obs. of  3 variables:
# $ chid     : int  1 1 1 1
# $ unique_id: num  3e+09 3e+09 3e+09 3e+09
# $ alt      : Factor w/ 4 levels "Bicycle","Car",..: 1 2 3 4
# - attr(*, "ids")= num [1:3] 1 1 2

Or do the same in base R:

data2 <- data$idx
class(data2) <- 'data.frame'
data1 <- data[1:2]

Check the structure:

str(data1)
#Classes ‘dfidx_mlogit’, ‘dfidx’, ‘mlogit.data’ and 'data.frame':   4 obs. of  2 variables:
# $ EDUC  : int  4 4 4 4
# $ HEALTH: int  3 3 3 3

str(data2)
#'data.frame':  4 obs. of  3 variables:
# $ chid     : int  1 1 1 1
# $ unique_id: num  3e+09 3e+09 3e+09 3e+09
# $ alt      : Factor w/ 4 levels "Bicycle","Car",..: 1 2 3 4
# - attr(*, "ids")= num [1:3] 1 1 2
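In the base R output above, `data1` still carries the `dfidx_mlogit`, `dfidx` and `mlogit.data` classes. If a plain data frame is wanted, one option is to reset the class before subsetting. This is a sketch on a toy object (the names `toy`, `toy_idx` and `part1` are hypothetical), assuming the `dfidx` package is not needed for the result:

```r
# Toy object mimicking the classes and the "idx" attribute from the question
toy_idx <- data.frame(chid = c(1L, 1L), alt = factor(c("Bicycle", "Car")))
toy <- data.frame(EDUC = c(4L, 4L), HEALTH = c(3L, 3L))
toy$idx <- toy_idx
attr(toy, "idx") <- toy_idx
class(toy) <- c("dfidx_mlogit", "dfidx", "data.frame", "mlogit.data")

part1 <- toy
class(part1) <- "data.frame"           # drop the extra classes first
part1 <- part1[c("EDUC", "HEALTH")]    # keep only the ordinary columns
attr(part1, "idx") <- NULL             # make sure no copy of idx is left behind

str(part1)
```

Resetting the class before indexing means the ordinary `data.frame` subsetting method is used even when a package that defines its own `[` method for these classes is loaded.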

A more general solution is to separate the columns based on their class.

data1 <- Filter(function(x) all(class(x) != "data.frame"), data)
data2 <- data$idx
#Or maybe we can generalise this as well
#data2 <- Filter(function(x) any(class(x) == "data.frame"), data)
str(data1)
#Classes ‘dfidx_mlogit’, ‘dfidx’, ‘mlogit.data’ and 'data.frame': 4 obs. of  2 variables:
# $ EDUC  : int  4 4 4 4
# $ HEALTH: int  3 3 3 3
str(data2)
#Classes ‘idx’ and 'data.frame':    4 obs. of  3 variables:
# $ chid     : int  1 1 1 1
# $ unique_id: num  3e+09 3e+09 3e+09 3e+09
# $ alt      : Factor w/ 4 levels "Bicycle","Car",..: 1 2 3 4
# - attr(*, "ids")= num [1:3] 1 1 2
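The class-based idea can be wrapped in a small helper that returns both parts at once. This is only a sketch: `split_nested()` is a hypothetical function written for this example, not part of any package, and it assumes the nested pieces are stored as data-frame columns as in the question.

```r
# Hypothetical helper: split a data frame into its ordinary columns
# and a named list of its data-frame columns
split_nested <- function(df) {
  is_df <- vapply(unclass(df), is.data.frame, logical(1))
  plain <- df
  class(plain) <- "data.frame"          # use plain data.frame subsetting
  list(
    flat   = plain[!is_df],             # columns that are not data frames
    nested = unclass(plain)[is_df]      # named list of nested data frames
  )
}

# Toy input with one nested data-frame column (made-up values)
toy <- data.frame(EDUC = c(4L, 4L), HEALTH = c(3L, 3L))
toy$idx <- data.frame(chid = c(1L, 1L), alt = factor(c("Bicycle", "Car")))

res <- split_nested(toy)
str(res$flat)
str(res$nested$idx)
```

Returning the nested part as a list keeps the helper usable when more than one column is a data frame; with a single nested column, `res$nested$idx` plays the role of `data2`.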
