Counting interactions with unique accounts in a financial transactions dataset



I have a question about a dataset of financial transactions:

  Account_from Account_to Value
1            1          2  25.0
2            1          3  30.0
3            2          1  28.0
4            2          3  10.0
5            2          3  12.0
6            3          1  40.0
7            3          1  30.0
8            3          1  20.0
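For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the table as a data frame. The object name `df` is an assumption; the question itself does not name the data.

```r
# Rebuild the example transactions (the name `df` is assumed, not from the question)
df <- data.frame(
  Account_from = c(1, 1, 2, 2, 2, 3, 3, 3),
  Account_to   = c(2, 3, 1, 3, 3, 1, 1, 1),
  Value        = c(25, 30, 28, 10, 12, 40, 30, 20)
)

# Quick look at the structure: 8 rows, 3 numeric columns
str(df)
```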

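Each row represents one transaction. I would like to add two extra columns that record, for the account in Account_from, how many unique accounts it sends to and how many unique accounts it receives from. The result would look like this: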

  Account_from Account_to Value Count_interactions_out Count_interactions_in
1            1          2  25.0                      2                     2
2            1          3  30.0                      2                     2
3            2          1  28.0                      2                     1
4            2          3  10.0                      2                     1
5            2          3  12.0                      2                     1
6            3          1  40.0                      1                     2
7            3          1  30.0                      1                     2
8            3          1  20.0                      1                     2

Account 3 only sends to account 1, so its Count_interactions_out is 1. However, it receives transactions from accounts 1 and 2, so its Count_interactions_in is 2.
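The two counts for account 3 can be checked by hand with a short sketch. The data frame name `df` and the helper variables `out_3` and `in_3` are assumptions made for illustration.

```r
# Example data from the question (the name `df` is assumed)
df <- data.frame(
  Account_from = c(1, 1, 2, 2, 2, 3, 3, 3),
  Account_to   = c(2, 3, 1, 3, 3, 1, 1, 1),
  Value        = c(25, 30, 28, 10, 12, 40, 30, 20)
)

# Distinct accounts that account 3 sends to
out_3 <- length(unique(df$Account_to[df$Account_from == 3]))

# Distinct accounts that send to account 3
in_3 <- length(unique(df$Account_from[df$Account_to == 3]))

c(out = out_3, `in` = in_3)  # out = 1, in = 2
```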

How can I apply this to the whole dataset?

Thanks

Here is one approach using dplyr

library(dplyr)
financial.data %>%
  # distinct receivers for each sending account
  group_by(Account_from) %>%
  mutate(Count_interactions_out = nlevels(factor(Account_to))) %>%
  ungroup() %>%
  # distinct senders for the account in each row's Account_to
  group_by(Account_to) %>%
  mutate(Count_interactions_in = nlevels(factor(Account_from))) %>%
  ungroup()
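Note that the second group_by() above attaches the incoming count to the account in each row's Account_to, whereas the table in the question reports it for the account in Account_from. If the question's layout is what you need, one possible variation is to build the incoming counts as a small lookup table and join it back on Account_from. This is only a sketch: `in_counts` and `result` are names made up for illustration, and `financial.data` is assumed to hold the question's table.

```r
library(dplyr)

# Example data from the question (assumed to be what financial.data holds)
financial.data <- data.frame(
  Account_from = c(1, 1, 2, 2, 2, 3, 3, 3),
  Account_to   = c(2, 3, 1, 3, 3, 1, 1, 1),
  Value        = c(25, 30, 28, 10, 12, 40, 30, 20)
)

# One row per receiving account: how many distinct accounts send to it.
# The key is renamed so it can be joined against Account_from.
in_counts <- financial.data %>%
  group_by(Account_to) %>%
  summarise(Count_interactions_in = n_distinct(Account_from)) %>%
  rename(Account_from = Account_to)

result <- financial.data %>%
  group_by(Account_from) %>%
  mutate(Count_interactions_out = n_distinct(Account_to)) %>%
  ungroup() %>%
  # accounts that never receive anything would get NA here
  left_join(in_counts, by = "Account_from")

result
```

n_distinct() is used here in place of nlevels(factor()); both count distinct values, the former just reads more directly.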

Here is a base R solution using ave()

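# ave() repeats each per-group count on every row of that group
df <- cbind(df,
  with(df, list(
    Count_interactions_out = ave(Account_to, Account_from, FUN = function(x) length(unique(x))),
    # match() re-keys the per-receiver counts to each row's Account_from
    Count_interactions_in = ave(Account_from, Account_to, FUN = function(x) length(unique(x)))[match(Account_from, Account_to)])))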

which gives

> df
  Account_from Account_to Value Count_interactions_out Count_interactions_in
1            1          2    25                      2                     2
2            1          3    30                      2                     2
3            2          1    28                      2                     1
4            2          3    10                      2                     1
5            2          3    12                      2                     1
6            3          1    40                      1                     2
7            3          1    30                      1                     2
8            3          1    20                      1                     2
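The indexing with match() is the least obvious part of this answer, so here is a step-by-step sketch of what it does. The intermediate names `in_by_row`, `idx` and `in_for_sender` are made up for illustration.

```r
# Example data from the question
df <- data.frame(
  Account_from = c(1, 1, 2, 2, 2, 3, 3, 3),
  Account_to   = c(2, 3, 1, 3, 3, 1, 1, 1),
  Value        = c(25, 30, 28, 10, 12, 40, 30, 20)
)

# Step 1: distinct senders per receiving account, repeated on every row
# that has that receiver in Account_to
in_by_row <- ave(df$Account_from, df$Account_to,
                 FUN = function(x) length(unique(x)))

# Step 2: for each row, the position of the first row whose Account_to
# equals this row's Account_from
idx <- match(df$Account_from, df$Account_to)

# Step 3: pick the count from that row, so the value now describes
# the account in Account_from rather than the one in Account_to
in_for_sender <- in_by_row[idx]

in_for_sender
```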

Alternatively, with within()

# Note: within() appends the new columns in reverse order of assignment
df <- within(df, list(
  Count_interactions_out <- ave(Account_to, Account_from, FUN = function(x) length(unique(x))),
  Count_interactions_in <- ave(Account_from, Account_to, FUN = function(x) length(unique(x)))[match(Account_from, Account_to)]))

which gives

> df
  Account_from Account_to Value Count_interactions_in Count_interactions_out
1            1          2    25                     2                      2
2            1          3    30                     2                      2
3            2          1    28                     1                      2
4            2          3    10                     1                      2
5            2          3    12                     1                      2
6            3          1    40                     2                      1
7            3          1    30                     2                      1
8            3          1    20                     2                      1
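One caveat worth keeping in mind: the match() lookup returns NA for any account that sends but never receives, which does not happen in the example data. A possible way around this, sketched below on a made-up variant `df2` in which a hypothetical account 4 only sends, is to count the senders directly for each row. It is slower on large data, but an account with no incoming transactions then gets 0.

```r
# Hypothetical variant of the data: account 4 sends but never receives
df2 <- data.frame(
  Account_from = c(1, 1, 2, 4),
  Account_to   = c(2, 3, 1, 1)
)

# The match() lookup has nothing to find for account 4
match(4, df2$Account_to)  # NA

# Count distinct senders for each row's Account_from directly;
# an empty selection has length 0, so no NA is produced
df2$Count_interactions_in <- vapply(
  df2$Account_from,
  function(a) length(unique(df2$Account_from[df2$Account_to == a])),
  integer(1)
)

df2
```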
