I have a data frame that has another data frame embedded in it:
class(data)
[1] "dfidx_mlogit" "dfidx" "data.frame" "mlogit.data"
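I am trying to separate these two data frames. One contains the relevant data on health and education; the other, called "idx", contains the individual id information.
How can I separate the two data frames completely?
Here is the data: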
data <- structure(list(EDUC = c(4L, 4L, 4L, 4L), HEALTH = c(3L, 3L, 3L,
3L), idx = structure(list(chid = c(1L, 1L, 1L, 1L), unique_id = c(3000175513,
3000175513, 3000175513, 3000175513), alt = structure(1:4, .Label = c("Bicycle",
"Car", "Metro", "Walking"), class = "factor")), ids = c(1, 1,
2), row.names = c(NA, 4L), class = c("idx", "data.frame"))), row.names = c(NA,
4L), class = c("dfidx_mlogit", "dfidx", "data.frame", "mlogit.data"
), idx = structure(list(chid = c(1L, 1L, 1L, 1L), unique_id = c(3000175513,
3000175513, 3000175513, 3000175513), alt = structure(1:4, .Label = c("Bicycle",
"Car", "Metro", "Walking"), class = "factor")), ids = c(1, 1,
2), row.names = c(NA, 4L), class = c("idx", "data.frame")))
If we want to separate the dataset, note that "idx" is a column holding a nested "data.frame". We can pull that column to create a new object:
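library(dplyr)
# Extract the nested "idx" data frame as its own object
data2 <- data %>%
  pull(idx)
# Keep only the remaining columns, converted to a tibble
data1 <- data %>%
  as_tibble() %>%
  select(-idx)
# Remove any leftover "idx" attribute from the result
attr(data1, "idx") <- NULL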
Checking the structure:
str(data1)
#tibble [4 × 2] (S3: tbl_df/tbl/data.frame)
# $ EDUC : int [1:4] 4 4 4 4
# $ HEALTH: int [1:4] 3 3 3 3
str(data2)
#Classes ‘idx’ and 'data.frame': 4 obs. of 3 variables:
# $ chid : int 1 1 1 1
# $ unique_id: num 3e+09 3e+09 3e+09 3e+09
# $ alt : Factor w/ 4 levels "Bicycle","Car",..: 1 2 3 4
# - attr(*, "ids")= num [1:3] 1 1 2
Or do this in base R:
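# Extract the nested data frame and reduce its class to a plain data.frame
data2 <- data$idx
class(data2) <- 'data.frame'
# Keep the first two columns (EDUC and HEALTH)
data1 <- data[1:2]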
Checking the structure:
str(data1)
#Classes ‘dfidx_mlogit’, ‘dfidx’, ‘mlogit.data’ and 'data.frame': 4 obs. of 2 variables:
# $ EDUC : int 4 4 4 4
# $ HEALTH: int 3 3 3 3
str(data2)
#'data.frame': 4 obs. of 3 variables:
# $ chid : int 1 1 1 1
# $ unique_id: num 3e+09 3e+09 3e+09 3e+09
# $ alt : Factor w/ 4 levels "Bicycle","Car",..: 1 2 3 4
# - attr(*, "ids")= num [1:3] 1 1 2
A more general solution is to separate the columns based on their class:
data1 <- Filter(function(x) all(class(x) != "data.frame"), data)
data2 <- data$idx
#Or maybe we can generalise this as well
#data2 <- Filter(function(x) any(class(x) == "data.frame"), data)
str(data1)
#Classes ‘dfidx_mlogit’, ‘dfidx’, ‘mlogit.data’ and 'data.frame': 4 obs. of 2 variables:
# $ EDUC : int 4 4 4 4
# $ HEALTH: int 3 3 3 3
str(data2)
#Classes ‘idx’ and 'data.frame': 4 obs. of 3 variables:
# $ chid : int 1 1 1 1
# $ unique_id: num 3e+09 3e+09 3e+09 3e+09
# $ alt : Factor w/ 4 levels "Bicycle","Car",..: 1 2 3 4
# - attr(*, "ids")= num [1:3] 1 1 2
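The class-based idea above can also be wrapped in a small reusable function. The following is only a sketch under assumptions: `split_nested` and the `toy` data are hypothetical names made up for illustration (they are not part of dfidx or mlogit), and it uses `is.data.frame()` rather than comparing class strings. Unlike the outputs shown above, it also reduces both results to plain "data.frame" objects.

```r
# Toy data with a nested data-frame column (hypothetical, not the asker's data)
toy <- data.frame(EDUC = c(4L, 4L), HEALTH = c(3L, 3L))
toy$idx <- data.frame(chid = c(1L, 1L), alt = factor(c("Car", "Metro")))

# Hypothetical helper: split plain columns from nested data-frame columns
split_nested <- function(df) {
  cols <- unclass(df)  # work on the underlying list, ignoring any extra classes
  is_nested <- vapply(cols, is.data.frame, logical(1))
  # Rebuild the non-nested columns as an ordinary data.frame
  plain <- as.data.frame(cols[!is_nested], stringsAsFactors = FALSE)
  # Keep each nested data frame, stripped of extra classes such as "idx"
  nested <- lapply(cols[is_nested], function(x) {
    class(x) <- "data.frame"
    x
  })
  list(plain = plain, nested = nested)
}

res <- split_nested(toy)
str(res$plain)
str(res$nested$idx)
```

With the data from the question, `split_nested(data)$plain` and `split_nested(data)$nested$idx` would play the roles of data1 and data2, assuming "idx" is the only nested column.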