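I am trying to compute the values of a new column for a data.table, dt. Part of the calculation comes from a data.frame, df (it could also be a data.table; so far I have not needed it to be one).
How can I use values from two different objects to compute a new column when the factor levels (here: sample) match? I used to merge the two objects and work row by row, but that produces a lot of redundant data.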
This is the data.frame, which has only 10 rows:
df
sample scaling_factor
A1 A1 111956565
A2 A2 89869320
A3 A3 120925219
A4 A4 111757559
A5 A5 77319341
A6 A6 89403194
A7 A7 150214981
B8 B8 133885925
B9 B9 86536587
B10 B10 123574939
df <- structure(list(sample = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L,
9L, 10L, 8L), .Label = c("A1", "A2", "A3", "A4", "A5", "A6",
"A7", "B10", "B8", "B9"), class = "factor"), scaling_factor = c(111956565.427018,
89869319.9348599, 120925219.4453, 111757558.886234, 77319340.5841949,
89403194.1170576, 150214980.784589, 133885925.080984, 86536586.7136393,
123574939.026597)), .Names = c("sample", "scaling_factor"), class = "data.frame", row.names = c("A1",
"A2", "A3", "A4", "A5", "A6", "A7", "B8", "B9", "B10"))
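This is the data.table, which has several hundred thousand rows per sample (dput runs into problems with a < in its output, so I am not providing it here):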
setDT(dt)
sample contig_id product_reads_rpk
1: A1 contig_10 2000.00000
2: A1 contig_100 24.27184
3: A1 contig_1000 1713.90374
4: A1 contig_10000 2900.66225
5: A1 contig_100003 1713.94231
6: A1 contig_100004 8575.23511
7: A1 contig_100004 11059.32203
8: A2 contig_100009 6923.67400
9: A2 contig_100010 1285.30259
10: A2 contig_100015 84.74576
dt[,product_rpm := product_reads_rpk/(df$scaling_factor/1000000), by = sample]
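I am trying to create a new column, product_rpm, in dt based on the corresponding value for each sample in df. How can I do this? I get "longer object length is not a multiple of shorter object length", but the shorter object has length 1, e.g. for A1 in df, right?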
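I don't know of a way to do this without actually joining the two datasets, but if you join them the data.table way (an update join), you can avoid creating redundant columns.
So in your case, it is simply: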
df <- data.table(df)
# Same formula as in the question: divide by the scaling factor expressed in millions
dt[df, product_rpm := product_reads_rpk / (scaling_factor / 1000000), on = "sample"]
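To make the join above concrete, here is a minimal sketch on a few rows copied (and rounded) from the question. The toy objects are stand-ins I made up for illustration, not the real data, and the formula is the one from the question. It also hints at why the original attempt fails: inside by = sample, df$scaling_factor is still the full vector for all samples rather than the single value for the current group, hence the length mismatch.

```r
library(data.table)

# Toy stand-ins for the question's objects (assumption: values rounded from the post)
df <- data.table(sample = c("A1", "A2"),
                 scaling_factor = c(111956565, 89869320))
dt <- data.table(sample = c("A1", "A1", "A2"),
                 contig_id = c("contig_10", "contig_100", "contig_100009"),
                 product_reads_rpk = c(2000, 24.27184, 6923.674))

# Update join: for every row of dt, look up the scaling_factor of the matching
# sample in df and add product_rpm to dt by reference.
dt[df, product_rpm := product_reads_rpk / (scaling_factor / 1e6), on = "sample"]

# dt gains product_rpm only; scaling_factor itself is not copied into dt
print(dt)
```

Because := works by reference, dt is modified in place and no merged intermediate table is created.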
A simple example:
library(data.table)
# Lookup table: one row per id
dt1 <- data.table(id = sample(1000:9999, size = 100),
size = sample(10000:99999, size = 100))
# Large table: each id appears 10 times
dt2 <- data.table(id = rep(dt1$id, 10),
group = rep(LETTERS[1:5], 10),
value = sample(1000:9999, size = 100 * 10, replace = TRUE))
# Update join: metric is added to dt2 by reference; size is not copied over
dt3 <- dt2[dt1, metric := (value / size), on = "id"]
head(dt3)
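If you want to convince yourself that no redundant column is created, a quick check along these lines may help. This is only a sketch: the seed, the helper names cols_before and expected, and the comparison against match() are my additions for illustration.

```r
library(data.table)

set.seed(1)  # assumption: seed added only to make the sketch reproducible
dt1 <- data.table(id = sample(1000:9999, size = 100),
                  size = sample(10000:99999, size = 100))
dt2 <- data.table(id = rep(dt1$id, 10),
                  value = sample(1000:9999, size = 1000, replace = TRUE))

# copy() because names() of a data.table can otherwise change along with it
cols_before <- copy(names(dt2))
dt2[dt1, metric := value / size, on = "id"]

# Only metric was added; size was used in the calculation but not copied into dt2
print(setdiff(names(dt2), cols_before))

# Cross-check against a plain base R lookup with match()
expected <- dt2$value / dt1$size[match(dt2$id, dt1$id)]
print(all.equal(dt2$metric, expected))
```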