我想知道是否有一个更快,更优雅的数据。
假设我们有两个数据集
set.seed(1)
library(data.table)
DT1 <- data.table(income = runif(10, 0,1),
ID = 1:10)
DT2 <- data.table(height = runif(20,0,1),
weight = runif(20,0,1),
V1 = runif(20,0,1),
V2 = runif(20,0,2),
type = rep(c("Parents", "Children"),10))
DT2 <- DT2[order(type)][, ID := rep(1:10,2)]
第一个数据集是一个"家庭"。level dataset with household identifiers with ID, from 1:10.
有第二个数据集DT2
,对于每个家庭ID,每个父母和孩子都有四个变量。我要做的是合并所有的变量(高度,权值,V1, V2)父结点和子结点的每一行/DT1的观测值。因此,我们将有8个变量要合并,4个用于父节点,4个用于子节点。
DT1[DT2[type == "Parents"], c("height_parents", "weight_parents",
"V1_parents", "V2_parents") := list(i.height, i.weight,
i.V1, i.V2), on = c(ID = "ID")]
DT1[DT2[type == "Children"], c("height_children", "weight_children",
"V1_children", "V2_children") := list(i.height, i.weight,
i.V1, i.V2), on = c(ID = "ID")]
输出如下:
income ID height_parents weight_parents V1_parents V2_parents height_children
1: 0.70647001 1 0.49163534 0.164385214 0.6806198 1.72937701 0.04655907
2: 0.07658058 2 0.06776809 0.234182275 0.4456820 1.76814822 0.45665042
3: 0.49770601 3 0.23255515 0.709256017 0.3514867 1.83387012 0.24395311
4: 0.51944306 4 0.30555999 0.974742471 0.0529102 0.06094086 0.22356168
5: 0.23075737 5 0.51104028 0.007269433 0.4157508 1.00207079 0.98915308
6: 0.86449990 6 0.75420198 0.211342425 0.9837331 0.03520897 0.86080818
weight_children V1_children V2_Children
1: 0.3398447 0.7454582 0.6761706
2: 0.8475106 0.4716267 1.5231691
3: 0.8895790 0.1561395 1.8721056
4: 0.3503219 0.2663775 0.2408758
5: 0.3902352 0.9332958 1.3532260
6: 0.7648748 0.6969372 1.3289579
注意,如果我有许多不同的"类型"(这里只有两个)和/或许多不同的变量要合并(这里只有4个),上面的内容很快就会变得费力和冗长。我使用的数据集有许多不同的变量和类型。因此,我希望以更有效的方式做到这一点。特别地,我希望能够定义一个向量:
merge_variables = c("height", "weight", "V1", "V2")
和在数据中。表的样式,将父节点和子节点的所有变量合并。我希望新的变量有一个下划线在他们的类型名称(例如height_parents
和height_children
)。
我希望我已经把我的要求说清楚了。
谢谢!
我们可以使用mget
来做到这一点,即使用paste
创建对象名称作为字符串,然后在subsetting 'DT2'后加入on
的'ID',并分配列的输出(即mget
返回向量的list
)以在'DT1'中创建新列
merge_variables = c("height", "weight", "V1", "V2")
parent_nm <- paste0(merge_variables, "_parents")
child_nm <- paste0(merge_variables, "_children")
DT1[DT2[type == "Parents"], (parent_nm) := mget(paste0("i.", merge_variables)), on = .(ID)]
DT1[DT2[type == "Children"], (child_nm) := mget(paste0("i.", merge_variables)), on = .(ID)]
与产出
> DT1
income ID height_parents weight_parents V1_parents V2_parents height_children weight_children V1_children V2_children
1: 0.26550866 1 0.20597457 0.4820801 0.47761962 0.6781459 0.1765568 0.5995658 0.86120948 1.6788807
2: 0.37212390 2 0.68702285 0.4935413 0.43809711 0.6933670 0.3841037 0.1862176 0.24479728 0.6675499
3: 0.57285336 3 0.76984142 0.8273733 0.07067905 0.9527025 0.4976992 0.6684667 0.09946616 1.7843967
4: 0.90820779 4 0.71761851 0.7942399 0.31627171 1.7286789 0.9919061 0.1079436 0.51863426 0.7799791
5: 0.20168193 5 0.38003518 0.7237109 0.66200508 1.5546414 0.7774452 0.4112744 0.40683019 1.9212360
6: 0.89838968 6 0.93470523 0.8209463 0.91287592 0.8693190 0.2121425 0.6470602 0.29360337 1.4250294
7: 0.94467527 7 0.65167377 0.7829328 0.45906573 0.7999887 0.1255551 0.5530363 0.33239467 0.6507043
8: 0.66079779 8 0.26722067 0.5297196 0.65087047 1.5141743 0.3861141 0.7893562 0.25801678 0.4053845
9: 0.62911404 9 0.01339033 0.0233312 0.47854525 1.4222424 0.3823880 0.4772301 0.76631067 0.2433838
10: 0.06178627 10 0.86969085 0.7323137 0.08424691 0.4909770 0.3403490 0.6927316 0.87532133 0.2866088
或者另一种选择是dcast
'DT2'到'wide'格式并进行连接(但是,这里我们没有分配(:=
),因此它不会更新原始DT1)
DT1[dcast(DT2, ID ~ type, value.var = merge_variables), on = .(ID)]
income ID height_Children height_Parents weight_Children weight_Parents V1_Children V1_Parents V2_Children V2_Parents
1: 0.26550866 1 0.1765568 0.20597457 0.5995658 0.4820801 0.86120948 0.47761962 1.6788807 0.6781459
2: 0.37212390 2 0.3841037 0.68702285 0.1862176 0.4935413 0.24479728 0.43809711 0.6675499 0.6933670
3: 0.57285336 3 0.4976992 0.76984142 0.6684667 0.8273733 0.09946616 0.07067905 1.7843967 0.9527025
4: 0.90820779 4 0.9919061 0.71761851 0.1079436 0.7942399 0.51863426 0.31627171 0.7799791 1.7286789
5: 0.20168193 5 0.7774452 0.38003518 0.4112744 0.7237109 0.40683019 0.66200508 1.9212360 1.5546414
6: 0.89838968 6 0.2121425 0.93470523 0.6470602 0.8209463 0.29360337 0.91287592 1.4250294 0.8693190
7: 0.94467527 7 0.1255551 0.65167377 0.5530363 0.7829328 0.33239467 0.45906573 0.6507043 0.7999887
8: 0.66079779 8 0.3861141 0.26722067 0.7893562 0.5297196 0.25801678 0.65087047 0.4053845 1.5141743
9: 0.62911404 9 0.3823880 0.01339033 0.4772301 0.0233312 0.76631067 0.47854525 0.2433838 1.4222424
10: 0.06178627 10 0.3403490 0.86969085 0.6927316 0.7323137 0.87532133 0.08424691 0.2866088 0.4909770