R - How to multiply columns of two tibbles based on all possible combinations of columns in both?



I have two tibbles, like this:

library(dplyr)
my_tib1 <- tibble(feature1 = c("A", "A", "B", "B", "C", "C"), feature2 = c("AA", "BB", "AA", "BB", "AA", "BB"), number = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6))
my_tib2 <- tibble(feature3 = c("TT", "TT", "FF", "FF"), feature2 = c("AA", "BB", "AA", "BB"), number = c(0.6, 0.4, 0.3, 0.8))

They look like this:

# A tibble: 6 × 3
  feature1 feature2 number
  <chr>    <chr>     <dbl>
1 A        AA          0.1
2 A        BB          0.2
3 B        AA          0.3
4 B        BB          0.4
5 C        AA          0.5
6 C        BB          0.6
# A tibble: 4 × 3
  feature3 feature2 number
  <chr>    <chr>     <dbl>
1 TT       AA          0.6
2 TT       BB          0.4
3 FF       AA          0.3
4 FF       BB          0.8

Note that feature2 has the same categories in both tibbles. number is unique for each combination of feature1 and feature2 in my_tib1, and for each combination of feature2 and feature3 in my_tib2.

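For context: the number column represents marginal probabilities, and I want to multiply the marginal distributions to get the joint distribution (I am aware of the assumptions this involves).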

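I think this requires getting all possible combinations of feature1, feature2 and feature3, and multiplying their number values into a new column. The resulting tibble should have 3 (feature1) x 2 (feature2) x 2 (feature3) = 12 rows.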
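The row count can be checked with a minimal base R sketch. The level vectors below are copied by hand from the example data, and the name combos is only illustrative:

```r
# Enumerate every combination of the three features (levels taken from the example).
combos <- expand.grid(
  feature1 = c("A", "B", "C"),
  feature2 = c("AA", "BB"),
  feature3 = c("TT", "FF"),
  stringsAsFactors = FALSE
)

# One row per (feature1, feature2, feature3) triple: 3 * 2 * 2 rows.
nrow(combos)
```

This only builds the grid of combinations; the number values still have to be attached to it by matching on the feature columns.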

The final tibble should look like this:

# A tibble: 12 × 6
  feature1 feature2 feature3 number.x number.y number.mult
  <chr>    <chr>    <chr>       <dbl>    <dbl>       <dbl>
1 A        AA       TT            0.1      0.6        0.06
2 A        AA       FF            0.1      0.3        0.03
...

where number.mult holds the product of number.x and number.y.

I tried the following, and I think I am close, but it doesn't quite work:

my_tib1 %>% full_join(my_tib2, by = "feature2") %>% mutate(number.mult = number.x*number.y)

This does give me the 12 x 6 tibble I am looking for, but the numbers in number.mult are not what I expect.

One possible approach uses data.table:

library(data.table)
# convert to data.table format (setDT converts the objects in place, by reference)
setDT(my_tib1); setDT(my_tib2)
# create all unique combinations
DT <- CJ(ft1 = my_tib1$feature1,
         ft2 = my_tib1$feature2,
         ft3 = my_tib2$feature3,
         unique = TRUE)
# join relevant data
DT[my_tib1, `:=`(number.x = i.number), on = .(ft1 = feature1, ft2 = feature2)]
DT[my_tib2, `:=`(number.y = i.number), on = .(ft3 = feature3, ft2 = feature2)]
# final computation
DT[, number.mult := number.x * number.y][]
#    ft1 ft2 ft3 number.x number.y number.mult
# 1:   A  AA  FF      0.1      0.3        0.03
# 2:   A  AA  TT      0.1      0.6        0.06
# 3:   A  BB  FF      0.2      0.8        0.16
# 4:   A  BB  TT      0.2      0.4        0.08
# 5:   B  AA  FF      0.3      0.3        0.09
# 6:   B  AA  TT      0.3      0.6        0.18
# 7:   B  BB  FF      0.4      0.8        0.32
# 8:   B  BB  TT      0.4      0.4        0.16
# 9:   C  AA  FF      0.5      0.3        0.15
#10:   C  AA  TT      0.5      0.6        0.30
#11:   C  BB  FF      0.6      0.8        0.48
#12:   C  BB  TT      0.6      0.4        0.24
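
As a cross-check, here is a minimal dplyr sketch that joins on feature2 alone and sorts the rows the same way as the output above. The name joined is only illustrative, and the explicit column selection and sorting are additions to make the comparison easier. With the data defined in the code at the top, it appears to produce the same number.mult values:

```r
library(dplyr)

my_tib1 <- tibble(
  feature1 = c("A", "A", "B", "B", "C", "C"),
  feature2 = c("AA", "BB", "AA", "BB", "AA", "BB"),
  number   = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
)
my_tib2 <- tibble(
  feature3 = c("TT", "TT", "FF", "FF"),
  feature2 = c("AA", "BB", "AA", "BB"),
  number   = c(0.6, 0.4, 0.3, 0.8)
)

# Each feature2 value matches several rows on both sides, so the join is
# many-to-many. Recent dplyr versions may warn about this; in dplyr >= 1.1.0
# passing relationship = "many-to-many" states the intent explicitly.
joined <- my_tib1 %>%
  inner_join(my_tib2, by = "feature2") %>%
  mutate(number.mult = number.x * number.y) %>%
  select(feature1, feature2, feature3, number.x, number.y, number.mult) %>%
  arrange(feature1, feature2, feature3)

joined
```

Because every feature2 category occurs in both tibbles, inner_join and full_join should return the same rows here. If a category were missing on one side, full_join would add rows with NA in the other tibble's columns.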