R -对多个变量使用聚合并组合唯一的值



假设我有一个数据帧cars,前几行是:

Brand         Type         Year
BMW           Compact      2009
BMW           Sedan        2010
BENZ          Sedan        2010
BENZ          Compact      2012
BMW           Compact      2008
BENZ          Sedan        2011

我想使用聚合首先找到变量"Brand"one_answers"Type"的每个组合,然后找到每个组合的唯一年数。例如,期望的输出如下:

Brand        Type         num_unique_years
BMW          Compact      2(which are 2009, 2008)
BMW          Sedan        1(2010)
Benz         Compact      1(2012)
Benz         Sedan        2(2010,2011)

步骤基本如下:

x <- subset(cars, Brand == "bmw" & Type == "compact")
length(unique(x$Year))
which gives me the output 2

但是,我不知道如何将这些单独的步骤组合在一个函数中。

Thanks for help

只需定义适当的聚合函数并使用aggregate。没有使用包。

len_years <- function(years) {
  u <- unique(sort(years))
  paste0(length(u), "(", toString(u), ")")
}
Ag <- aggregate(Year ~., cars, len_years)
names(Ag)[3] <- "num_unique_years"
给:

> Ag
  Brand    Type num_unique_years
1  BENZ Compact          1(2012)
2   BMW Compact    2(2008, 2009)
3  BENZ   Sedan    2(2010, 2011)
4   BMW   Sedan          1(2010)

1)如果不需要年份本身,则将函数替换为

len_years <- function(years) length(unique(years))

2)或者,将aggregate语句和下一条语句替换为:

Ag <- aggregate(data.frame(num_unique_years = cars[[3]]), cars[-3], len_years)

注:可复制形式的输入cars为:

Lines <- "Brand         Type         Year
BMW           Compact      2009
BMW           Sedan        2010
BENZ          Sedan        2010
BENZ          Compact      2012
BMW           Compact      2008
BENZ          Sedan        2011"
cars <- read.table(text = Lines, header = TRUE)

我可以用data.table:

两步完成
library(data.table)
dt <- data.table(brand = c("BMW", "BMW", "BENZ", "BENZ", "BMW", "BENZ"),
                 type = c("Compact", "Sedan", "Sedan", "Compact", "Compact", "Sedan"),
                 year = c(2009, 2010, 2010, 2012, 2008, 2011))

dt[ , num_unique_years := length(unique(year)), by = .(brand, type)]
unique(dt[, .(type, brand, num_unique_years)])

最终结果:

      type brand num_unique_years
1: Compact   BMW                2
2:   Sedan   BMW                1
3:   Sedan  BENZ                2
4: Compact  BENZ                1

如何使用dplyr:

library(dplyr) count(group_by(count(group_by(cars,Brand,Type, Year)),Brand,Type))

最新更新