r语言 - 如何从每个账户的第一个交易日期开始单独设置特定期限?



这是我的交易数据:

id          from_id        to_id      amount    date_trx
<fctr>      <fctr>         <fctr>     <dbl>     <date>
0           7468           5695       700.0     2005-01-04
1           6213           9379       11832.0   2005-01-08
2           7517           8170       1000.0    2005-01-10
3           6143           9845       4276.0    2005-01-12
4           6254           9640       200.0     2005-01-14
5           6669           5815       200.0     2005-01-20
6           6934           8583       49752.0   2005-01-24
7           9240           8314       19961.0   2005-01-26
8           6374           8865       1000.0    2005-01-30
9           6143           6530       13.4      2005-01-31
...

我想根据时间间隔构建新功能。

让我们看看这个:

id               from_id        to_id      amount       date_trx
<fctr>           <fctr>         <fctr>     <dbl>        <date>
149431           5370           5735       1000.0       2007-03-24
157403           5370           7058       3679.0       2007-04-13
158831           5370           8667       12600.0      2007-04-23
162680           5370           6053       19.2         2007-04-30
167082           5370           8165       3679.0       2007-05-13
168562           5370           5656       2100.0       2007-05-23
172578           5370           5929       79.0         2007-05-31
177507           5370           6725       3679.0       2007-06-01
179167           5370           8433       200.0        2007-06-22
183499           5370           7022       100.6        2007-06-29
...

假设我想计算每个账户的交易金额,例如,以周为单位.
所以,从2007-03-24开始,5370的每周交易金额历史如下:

in the 1st week(2007-03-24 - 2007-03-31):  1000.0          
in the 2nd week(2007-03-31 - 2007-04-07):  0.0          
in the 3rd week(2007-04-07 - 2007-04-14):  3679.0          
in the 4th week(2007-04-14 - 2007-04-21):  0.0          
in the 5th week(2007-04-21 - 2007-04-28):  12600.0          
in the 6th week(2007-04-28 - 2007-05-05):  19.2          
in the 7th week(2007-05-05 - 2007-05-12):  0.0          
in the 8th week(2007-05-12 - 2007-05-19):  3679.0          
in the 9th week(2007-05-19 - 2007-05-26):  2100.0          
in the 10th week(2007-05-26 - 2007-06-02):  79.0 + 3679.0 = 3758.0          
in the 11th week(2007-06-02 - 2007-06-09):  0.0          
in the 12th week(2007-06-09 - 2007-06-16):  0.0          
in the 13th week(2007-06-16 - 2007-06-23):  200.0          
in the 14th week(2007-06-23 - 2007-06-30):  100.6          

在这里,我们看到一周内5370交易的最大金额为12600.0。所以,现在我想把这个度量看作是一个功能,比如max_of_weekly_transacted_amount.
同样,我想计算每个账户在一个月内交易的平均金额,并将其存储为另一个功能,比如mean_of_monthly_transacted_amount

我尝试了润滑功能floor_date

# Max of weekly transaction amount
data <- data %>% group_by(date_trx_week=floor_date(date_trx, "week"),from_id) %>% mutate(weekly_trx = sum(amount)) %>% 
group_by(from_id) %>% mutate(max_of_weekly_transacted_amount=max(weekly_trx))%>% 
select(-c(date_trx_week,weekly_trx))

# Mean of monthly transaction amount
data <- data %>% group_by(date_trx_month=floor_date(date_trx, "month"),from_id) %>% mutate(monthly_trx = sum(amount)) %>% 
group_by(from_id) %>% mutate(mean_of_monthly_transacted_amount=mean(monthly_trx))%>% 
select(-c(date_trx_month,monthly_trx))

我的数据中date_trx的日期变量以2005-01-01开头,以2010-12-31结尾。floor_date2005-01-02-2005-01-09开始一周周期,以2005-01-09-2005-01-16继续,依此类推。它从2005-01-01-2005-02-01开始一个月,以2005-02-01-2005-03-01继续,依此类推。并且此函数对每个帐户使用相同的时间段。

但我想根据每个帐户的第一个交易日期为每个帐户专门制作周期。因此,对于from_id = 5370第一个交易日期是2007-03-24.如果我想为5370做周周期,那就是2007-03-24 - 2007-03-312007-03-31 - 2007-04-07等等。 如果我想为5370制作月周期,它将是2007-03-24 - 2007-04-242007-04-24 - 2007-05-24,依此类推.
对于另一个帐户,周期会有所不同。那么,我该如何实现这一目标呢?如何从每个账户的第一个交易日期开始分别设置特定期限?

以下内容都在 Base-R 中。首先,定制周。这在第一个条目和其余条目之间的差异上使用 mod 7。希望有人可以使用这种方法,并使您成为更有效的data.table解决方案。

df$Week <- unsplit(tapply(df$date_trx,df$from_id, function(x) as.numeric((x-x[1])) %/% 7 ),df$from_id)

然后,如果您想按周按 ID 表示平均值,您可以

aggregate(amount ~ from_id + Week, df, mean)
from_id Week  amount
1     5370    0  1000.0
2     6143    0  4276.0
3     6213    0 11832.0
4     6254    0   200.0
5     6374    0  1000.0
6     6669    0   200.0
7     6934    0 49752.0
8     7468    0   700.0
9     7517    0  1000.0
10    9240    0 19961.0
11    5370    2  3679.0
12    6143    2    13.4
13    5370    4 12600.0
14    5370    5    19.2
15    5370    7  3679.0
16    5370    8  2100.0
17    5370    9  1879.0
18    5370   12   200.0
19    5370   13   100.6

当然,您可以将mean替换为max或任何其他功能。

相关内容

最新更新