How to keep track of duplicate rows in a data frame when reducing it with unique(df)?

This is a follow-up to this question.

Imagine the following data frame:

a <- c(rep("A", 3), rep("B", 3), rep("A",2))
b <- c(1,1,2,4,1,1,2,2)
df <-data.frame(a,b)

This gives:

a b
1 A 1
2 A 1
3 A 2
4 B 4
5 B 1
6 B 1
7 A 2
8 A 2

I reduce it to its unique rows with:

df_unique <- unique(df)
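It may help to check what unique() actually keeps. The following is a minimal base-R sketch reusing the example data: unique() on a data frame keeps the first occurrence of each distinct row together with that row's original row name, and the rows it drops are exactly the ones duplicated() flags.

```r
a <- c(rep("A", 3), rep("B", 3), rep("A", 2))
b <- c(1, 1, 2, 4, 1, 1, 2, 2)
df <- data.frame(a, b)

df_unique <- unique(df)

# unique() keeps the first occurrence of each distinct row,
# together with that row's original row name
kept <- rownames(df_unique)
print(kept)     # "1" "3" "4" "5"

# duplicated() flags every later repeat, i.e. the rows unique() dropped
dropped <- rownames(df)[duplicated(df)]
print(dropped)  # "2" "6" "7" "8"
```

So each kept row name is the first entry of the track that the question asks for.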

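Now I want to keep track of which rows were merged. I'd like to add a new column in which each entry is the list of row names that were merged into that row, like this: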

df_unique_informative =   
a b track
1 A 1 [1,2]
3 A 2 [3,7,8]
4 B 4 [4]
5 B 1 [5,6]

# Group the row positions 1..n by each (a, b) combination; each group's positions form its track
res = aggregate(x = list(track = 1:NROW(df)), by = list(a = df$a, b = df$b), function(x) x)
# OR perhaps you want
#res = aggregate(x = list(track = 1:NROW(df)), by = list(a = df$a, b = df$b), function(x)
#                                                                paste(x, collapse = ", "))
res
#  a b   track
#1 A 1    1, 2
#2 B 1    5, 6
#3 A 2 3, 7, 8
#4 B 4       4
# Shorter code: '[' called with a single argument simply returns x
res = aggregate(list(track = 1:NROW(df)), df[,1:2], '[')
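Note that aggregate() returns the groups sorted by the grouping variables (A 1, B 1, A 2, B 4), not in the order of unique(df). Because track holds row positions, the smallest position in each group is the row that unique() keeps, so ordering by it should reproduce the order and row names of df_unique. This is a sketch, not part of the answer above; it also assumes the `simplify` argument of aggregate() available in recent R versions, passed as FALSE so that track stays a list column even if every group happened to have the same size (otherwise aggregate() would collapse it into a matrix).

```r
a <- c(rep("A", 3), rep("B", 3), rep("A", 2))
b <- c(1, 1, 2, 4, 1, 1, 2, 2)
df <- data.frame(a, b)

res <- aggregate(x = list(track = 1:NROW(df)),
                 by = list(a = df$a, b = df$b),
                 FUN = function(x) x,
                 simplify = FALSE)   # always keep track as a list column

# The smallest position in each group is the row unique(df) keeps
first <- sapply(res$track, min)
res <- res[order(first), ]
# Reuse those first positions as row names, as unique(df) does
rownames(res) <- sort(first)
res
```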

Update

a <- c(rep("A", 3), rep("B", 3), rep("A",2))
b <- c(1,1,2,4,1,1,2,2)
c = letters[1:8]
df <-data.frame(a,b,c, stringsAsFactors = FALSE)
# Same grouping, but return the matching values of column c instead of the row positions
res = aggregate(x = list(track = 1:NROW(df)), by = list(a = df$a, b = df$b), function(x) df$c[x])
res
#  a b   track
#1 A 1    a, b
#2 B 1    e, f
#3 A 2 c, g, h
#4 B 4       d
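Using 1:NROW(df) as the track equals the row names only while the data frame still has its default row names 1..n. The same indexing trick as in the Update (looking up another vector with x) should also cover the general case. In this sketch, df_sub is a hypothetical subset chosen so that positions and row names differ, and simplify = FALSE is again assumed to keep track a list column.

```r
a <- c(rep("A", 3), rep("B", 3), rep("A", 2))
b <- c(1, 1, 2, 4, 1, 1, 2, 2)
df <- data.frame(a, b)

# Hypothetical subset: dropping row 4 leaves row names 1, 2, 3, 5, 6, 7, 8,
# so row positions no longer equal row names
df_sub <- df[df$b != 4, ]

res <- aggregate(x = list(track = seq_len(NROW(df_sub))),
                 by = list(a = df_sub$a, b = df_sub$b),
                 FUN = function(x) rownames(df_sub)[x],  # positions -> row names
                 simplify = FALSE)
res
```

Here the A 2 group tracks the row names "3", "7", "8" even though those rows sit at positions 3, 6 and 7 of df_sub.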

Here is an option with the tidyverse:

library(tidyverse)
# rownames_to_column() (from tibble) turns the row names into a column 'rn'
rownames_to_column(df, 'rn') %>%
  group_by(a, b) %>%
  summarise(track = list(rn))
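For comparison, here is a base-R sketch (an alternative, not taken from the answers above) that attaches the track column directly to df_unique, keeping unique()'s row order and row names instead of the sorted order that aggregate() and group_by() produce. It assumes that "\r" never occurs inside the data, since it is used as the separator when building one key string per row.

```r
a <- c(rep("A", 3), rep("B", 3), rep("A", 2))
b <- c(1, 1, 2, 4, 1, 1, 2, 2)
df <- data.frame(a, b)

# One key string per row; "\r" is assumed never to occur inside the data
key <- do.call(paste, c(df, sep = "\r"))

df_unique <- unique(df)
# Collect row names per key, with groups in order of first appearance,
# which is the same order in which unique() returns the rows
groups <- split(rownames(df), factor(key, levels = unique(key)))
df_unique$track <- unname(groups)   # list column, one entry per kept row
df_unique
```

Assigning a list of the right length with `$<-` creates a list column, so each entry of track holds all row names of that group.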
