如果你看到我的个人资料,我所有的问题都在数据帧上,这是另一个!
我有一个特定的数据帧,它是借方和贷方交易之间合并的结果
>head(allTxns)
Cust_no CreditDate Credit DebitDate Debit
1 12345 2014-10-01 200 2014-10-03 400
2 12345 2014-10-01 200 2014-10-04 150
3 12345 2014-10-01 200 2014-10-15 800
4 33344 2014-10-03 500 2014-10-04 50
5 33344 2014-10-03 500 2014-10-05 504
6 33344 2014-10-03 500 2014-10-06 332
7 33344 2014-10-03 500 2014-10-08 56
8 66554 2014-10-10 660 2014-10-04 150
9 66554 2014-10-10 660 2014-10-05 800
10 66554 2014-10-10 660 2014-10-11 400
11 66554 2014-10-10 660 2014-10-12 150
12 66554 2014-10-10 660 2014-10-13 800
我的目标是获取这些行,其中 DebitDate 位于 CreditDate 的 5 天之间,因此我尝试对数据进行子集化,我使用 :
运算符放置日期范围
FiveDays <- allTxns$CreditDate+5 #Results in a vector which has date + 5 days
allTxns <- cbind(allTxns[1:2],FiveDays,allTxns[4:6]) #Adding the vector as a column of dataframe
newDf <- allTxns[allTxns$DebitDate %in% allTxns$CreditDate:allTxns$FiveDays]
在上面的代码中,我收到以下逻辑错误,其中仅使用第一个元素
Warning messages:
1: In mer32$DepositDate:mer32$FiveDays2 :
numerical expression has 3994 elements: only the first used
2: In mer32$DepositDate:mer32$FiveDays2 :
numerical expression has 3994 elements: only the first used
因此,我所需的输出仅限于前Cust_no (12345(,而不适用于其他行。如何确保范围条件应用于所有行?
输出不正确
>head(newDf)
row.names Cust_no CreditDate Credit DebitDate Debit
1 12345 2014-10-01 200 2014-10-03 400
2 12345 2014-10-01 200 2014-10-04 150
4 33344 2014-10-03 500 2014-10-04 50
5 33344 2014-10-03 500 2014-10-05 504
6 33344 2014-10-03 500 2014-10-06 332
7 33344 2014-10-03 500 2014-10-08 56
8 66554 2014-10-10 660 2014-10-04 150
9 66554 2014-10-10 660 2014-10-05 800
10 66554 2014-10-10 660 2014-10-11 400
11 66554 2014-10-10 660 2014-10-12 150
12 66554 2014-10-10 660 2014-10-13 800
正确的输出
>head(newDf)
row.names Cust_no CreditDate Credit DebitDate Debit
1 12345 2014-10-01 200 2014-10-03 400
2 12345 2014-10-01 200 2014-10-04 150
4 33344 2014-10-03 500 2014-10-04 50
5 33344 2014-10-03 500 2014-10-05 504
6 33344 2014-10-03 500 2014-10-06 332
7 33344 2014-10-03 500 2014-10-08 56
10 66554 2014-10-10 660 2014-10-11 400
11 66554 2014-10-10 660 2014-10-12 150
12 66554 2014-10-10 660 2014-10-13 800
尝试
allTxns[with(allTxns , CreditDate < DebitDate & DebitDate <=FiveDays),]
# Cust_no CreditDate FiveDays Credit DebitDate Debit
#1 12345 2014-10-01 2014-10-06 200 2014-10-03 400
#2 12345 2014-10-01 2014-10-06 200 2014-10-04 150
#4 33344 2014-10-03 2014-10-08 500 2014-10-04 50
#5 33344 2014-10-03 2014-10-08 500 2014-10-05 504
#6 33344 2014-10-03 2014-10-08 500 2014-10-06 332
#7 33344 2014-10-03 2014-10-08 500 2014-10-08 56
#10 66554 2014-10-10 2014-10-15 660 2014-10-11 400
#11 66554 2014-10-10 2014-10-15 660 2014-10-12 150
#12 66554 2014-10-10 2014-10-15 660 2014-10-13 800
这个老问题已经有了公认的答案。但是,我注意到问题和答案可以简化,因为没有必要创建额外的FiveDays
列:
allTxns[with(allTxns, CreditDate <= DebitDate & DebitDate <= CreditDate + 5L), ]
Cust_no CreditDate Credit DebitDate Debit 1 12345 2014-10-01 200 2014-10-03 400 2 12345 2014-10-01 200 2014-10-04 150 4 33344 2014-10-03 500 2014-10-04 50 5 33344 2014-10-03 500 2014-10-05 504 6 33344 2014-10-03 500 2014-10-06 332 7 33344 2014-10-03 500 2014-10-08 56 10 66554 2014-10-10 660 2014-10-11 400 11 66554 2014-10-10 660 2014-10-12 150 12 66554 2014-10-10 660 2014-10-13 800