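I'd like to index customer transactions in an R data frame so that I can easily identify, say, the third transaction made by a particular customer. For example, suppose I have the following data frame (ordered by customer and transaction date):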
transactions = data.frame(CUST.ID = c(1, 1, 2, 2, 2, 2, 3, 3, 3),
                          DATE = as.Date(c("2009-07-02", "2013-08-15", "2010-01-02", "2004-03-05",
                                           "2006-02-03", "2007-01-01", "2004-03-05", "2006-02-03", "2007-01-01")),
                          AMOUNT = c(5, 9, 21, 34, 76, 1, 100, 23, 10))
> transactions
  CUST.ID       DATE AMOUNT
1       1 2009-07-02      5
2       1 2013-08-15      9
3       2 2010-01-02     21
4       2 2004-03-05     34
5       2 2006-02-03     76
6       2 2007-01-01      1
7       3 2004-03-05    100
8       3 2006-02-03     23
9       3 2007-01-01     10
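I can clearly see that customer 1 made 2 transactions, customer 2 made 4, and so on.
What I want is to index these transactions by customer, creating a new column in my data frame. The following code does what I want: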
transactions$COUNTER = 1
transactions$CUSTOMER.TRANS.NO = unlist(aggregate(COUNTER ~ CUST.ID,
                                                  data = transactions,
                                                  function(x) {rank(x, ties.method = "first")})[, 2])
transactions$COUNTER = NULL
> transactions
  CUST.ID       DATE AMOUNT CUSTOMER.TRANS.NO
1       1 2009-07-02      5                 1
2       1 2013-08-15      9                 2
3       2 2010-01-02     21                 1
4       2 2004-03-05     34                 2
5       2 2006-02-03     76                 3
6       2 2007-01-01      1                 4
7       3 2004-03-05    100                 1
8       3 2006-02-03     23                 2
9       3 2007-01-01     10                 3
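Now each customer's first transaction is labeled 1, the second 2, and so on.
So I have what I want, but it's a terrible piece of code: building a list and then unlisting it is just ugly. Can someone more experienced than me come up with a better solution?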
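Since you went to the trouble of posting sample code for what you tried (which makes your question a better Stack Overflow question than the duplicate I linked), I'll summarize the options here: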
ave
within(transactions, { Trans.No <- ave(CUST.ID, CUST.ID, FUN = seq_along) })
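As a minimal sketch of how the `ave` option serves the original goal (picking out, say, each customer's third transaction), the following rebuilds the sample data from the question and filters on the new column. The column name `Trans.No` follows the snippet above; `third` is just an illustrative name, not something from the question.

```r
# Rebuild the sample data from the question
transactions <- data.frame(
  CUST.ID = c(1, 1, 2, 2, 2, 2, 3, 3, 3),
  DATE = as.Date(c("2009-07-02", "2013-08-15", "2010-01-02", "2004-03-05",
                   "2006-02-03", "2007-01-01", "2004-03-05", "2006-02-03",
                   "2007-01-01")),
  AMOUNT = c(5, 9, 21, 34, 76, 1, 100, 23, 10)
)

# Number each customer's rows 1, 2, 3, ... in the order they appear
transactions$Trans.No <- ave(transactions$CUST.ID, transactions$CUST.ID,
                             FUN = seq_along)

# Keep only the third transaction of every customer that has one
# ("third" is an illustrative name)
third <- transactions[transactions$Trans.No == 3, ]
third
```

With this data, customer 1 has only two transactions and so contributes no row to `third`; customers 2 and 3 contribute one row each.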
getanID
library(splitstackshape)
getanID(transactions, "CUST.ID")
rle
## Depends on your data being sorted
transactions$Trans.No <- sequence(rle(transactions$CUST.ID)$lengths)
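To illustrate why the `rle` approach depends on sorting: `rle` counts runs of consecutive equal values, so if a customer's rows are not adjacent, the counter restarts with every run. The sketch below uses a small made-up data frame (`unsorted` and `sorted` are illustrative names) and sorts with `order()` before numbering. Note also that the sample data in the question is grouped by customer, but customer 2's dates are not in chronological order, so sorting by `DATE` as well would change that customer's numbering.

```r
# Small made-up example in which each customer's rows are not adjacent
unsorted <- data.frame(
  CUST.ID = c(1, 2, 1, 2),
  DATE = as.Date(c("2013-08-15", "2010-01-02", "2009-07-02", "2004-03-05"))
)

# Without sorting, every row is its own run, so each counter restarts at 1
naive <- sequence(rle(unsorted$CUST.ID)$lengths)
naive

# Sort by customer, then by date, before numbering
sorted <- unsorted[order(unsorted$CUST.ID, unsorted$DATE), ]
sorted$Trans.No <- sequence(rle(sorted$CUST.ID)$lengths)
sorted
```

The `ave`-based option does not need the rows of a customer to be adjacent, but it still numbers them in their current row order, so sorting by date first matters there too if the numbering should be chronological.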
data.table
library(data.table)
DT <- data.table(transactions)
DT[, .id := sequence(.N), by = "CUST.ID"]
plyr
library(plyr)
ddply(transactions, .(CUST.ID), transform, CUSTOMER.TRANS.NO = seq(1, length(CUST.ID), 1))
  CUST.ID       DATE AMOUNT CUSTOMER.TRANS.NO
1       1 2009-07-02      5                 1
2       1 2013-08-15      9                 2
3       2 2010-01-02     21                 1
4       2 2004-03-05     34                 2
5       2 2006-02-03     76                 3
6       2 2007-01-01      1                 4
7       3 2004-03-05    100                 1
8       3 2006-02-03     23                 2
9       3 2007-01-01     10                 3