R数据帧过滤:使用基于时间因素的唯一或重复函数



我试图过滤贷款数据框架,但是每个月报告都重复贷款,如果他们仍然未偿还,或者如果支付了贷款(不能只使用最新的月报告)。我希望通过贷方筛选贷款的唯一到期日,并删除重复项,并仅按报告日期保留最新的数据。下面是一个数据示例:

df <- data.frame(Reporting.date=c("6/30/2020","6/30/2020","6/30/2020","8/31/2021","8/31/2021"
,"8/31/2021","6/30/2020","7/31/2021","5/31/2020","12/31/2020")
, Lender.name=c("Lender1","Lender1","Lender1","Lender1","Lender1","Lender1"
,"Lender1","Lender1","Lender2","Lender2")
, Date.of.maturity=c("6/20/2025","6/20/2025","6/20/2025","6/20/2025","6/20/2025"
,"6/20/2025","6/30/2022","6/30/2022","5/15/2024","5/15/2024")
, Loan.amount=c(13129474,14643881,44935677,13129474,14643881,44935677
,150000,150000,2750000,2750000))

从示例数据中可以看到,Lender1有2个唯一的到期日。第一个到期日有3笔贷款在2个报告日期重复,第二个到期日有1笔贷款重复。我想删除重复项以保留最新的报告数据。我的希望是得到一个之后看起来像这样的数据帧:

14643881449356771500002750000
贷款人名称日期。贷款金额
13129474

您需要将Reporting.date转换为日期格式,无论是在mutate中(像我一样)还是直接在filter中。

library(tidyverse)
df %>%
mutate(Reporting.date = as.Date(Reporting.date, format = '%m/%d/%Y')) %>%
group_by(Lender.name, Date.of.maturity, Loan.amount) %>%
filter(Reporting.date == max(Reporting.date)) %>%
ungroup()

对于arrange我们也可以这样做

library(dplyr)
library(lubridate)
df %>%
arrange(Lender.name, Date.of.maturity, Loan.amount, 
desc(mdy(Reporting.date))) %>%
group_by(Lender.name, Date.of.maturity, Loan.amount) %>%
slice_head(n = 1) %>%
ungroup

与产出

# A tibble: 5 x 4
Reporting.date Lender.name Date.of.maturity Loan.amount
<chr>          <chr>       <chr>                  <dbl>
1 8/31/2021      Lender1     6/20/2025           13129474
2 8/31/2021      Lender1     6/20/2025           14643881
3 8/31/2021      Lender1     6/20/2025           44935677
4 7/31/2021      Lender1     6/30/2022             150000
5 12/31/2020     Lender2     5/15/2024            2750000

Base R选项使用subset,transformave-

subset(transform(df, Reporting.date = as.Date(Reporting.date, format = '%m/%d/%Y')), 
Reporting.date == ave(Reporting.date, Lender.name, Date.of.maturity, FUN = max))
#   Reporting.date Lender.name Date.of.maturity Loan.amount
#4      2021-08-31     Lender1        6/20/2025    13129474
#5      2021-08-31     Lender1        6/20/2025    14643881
#6      2021-08-31     Lender1        6/20/2025    44935677
#8      2021-07-31     Lender1        6/30/2022      150000
#10     2020-12-31     Lender2        5/15/2024     2750000

最新更新