How to compare and tabulate the frequencies of common elements between two vectors in R



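I have two vectors that share some elements and contain repeated values. I would like a table that compares the frequencies of the common elements in the two vectors. Here is a subset of the counts: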

plyr::count(V1)
          x freq
1  A*02:01  106
2  A*02:02   88
3  A*03:01   95
4  A*03:02   60
plyr::count(V2)
   x freq
1  A*02:01   11
2  A*02:02   11
3  A*02:04    1
4  A*03:01   20

My desired output is:

         x  freq.V1  freq.V2
1  A*02:01      106       11
2  A*02:02       88       11
3  A*03:01       95       20

I think merge is a good choice here, because by default it keeps only the observations that appear in both data sets. So the following should work:

merge(plyr::count(V1), plyr::count(V2), by="x")

Working example:

plyr::count(mtcars$gear)
#   x freq
# 1 3   15
# 2 4   12
# 3 5    5
plyr::count(mtcars$gear[1:10])
#   x freq
# 1 3    4
# 2 4    6
merge(
  plyr::count(mtcars$gear),
  plyr::count(mtcars$gear[1:10]),
  by = "x")
#   x freq.x freq.y
# 1 3     15      4
# 2 4     12      6
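
The default column names freq.x and freq.y can be changed to the freq.V1 and freq.V2 names asked for in the question through merge's suffixes argument. The following is a minimal sketch rather than the answerer's own code: the two vectors are made up for illustration, and base R's table is used in place of plyr::count so the example runs without extra packages.

```r
# Made-up sample data (an assumption, not the asker's real vectors)
V1 <- c("A*02:01", "A*02:01", "A*02:02", "A*03:01", "A*03:02")
V2 <- c("A*02:01", "A*02:02", "A*02:02", "A*02:04", "A*03:01")

# Frequency tables as data frames with columns x and freq (base R, no plyr)
c1 <- as.data.frame(table(x = V1), responseName = "freq", stringsAsFactors = FALSE)
c2 <- as.data.frame(table(x = V2), responseName = "freq", stringsAsFactors = FALSE)

# Inner join on x; suffixes renames the two clashing freq columns
res <- merge(c1, c2, by = "x", suffixes = c(".V1", ".V2"))
print(res)
```

Elements that occur in only one vector (A*03:02 and A*02:04 here) are dropped by the inner join, which is the behaviour the question asks for.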

Just use table:

tbl1 <- table(V1[V1 %in% (int <- intersect(unique(V1), unique(V2)))])
tbl2 <- table(V2[V2 %in% int])
data.frame(x = names(tbl1), freq.V1 = as.vector(tbl1), freq.V2 = as.vector(tbl2))
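
The data.frame call above relies on both tables listing the common elements in the same order. One way to make that alignment explicit is to tabulate each vector against a shared set of factor levels. This is only a sketch on made-up vectors; the names common and out are introduced here for illustration.

```r
# Made-up vectors for illustration only
V1 <- c("A*02:01", "A*02:01", "A*02:02", "A*03:01", "A*03:02")
V2 <- c("A*02:01", "A*02:02", "A*02:02", "A*02:04", "A*03:01")

# Elements present in both vectors, sorted for a stable row order
common <- sort(intersect(V1, V2))

# factor(..., levels = common) turns all other values into NA,
# and table() drops NA by default, so only common elements are counted
out <- data.frame(
  x       = common,
  freq.V1 = as.vector(table(factor(V1, levels = common))),
  freq.V2 = as.vector(table(factor(V2, levels = common)))
)
print(out)
```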

Or my favourite, data.table:

library(data.table)
DT <- data.table(V1 = V1, V2 = V2)
DT[V1 %in% unique(V2), .(freq.V1 = .N), by = .(x = V1)
   ][DT[V2 %in% unique(V1), .N, by = .(x = V2)],
     freq.V2 := i.N, on = "x", nomatch = 0L]

Of course, if you know in advance that V1 and V2 are made up of the same set of elements, both options become much simpler:

data.frame(x = names(tbl1 <- table(V1)), freq.V1 = as.vector(tbl1),
           freq.V2 = as.vector(table(V2)))

DT[ , .(freq.V1 = .N), by = .(x = V1)
   ][DT[ , .(freq.V2 = .N), by = .(x = V2)], on = "x"]
