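My data frame is similar to the one I have created below (for illustration). For accounts with duplicate IDs (in the example below my IDs are names, but they could just as well be numbers), I would like to write some code that removes, among the entries sharing an ID, those rows whose Closed value matches an Opened value. For example, the first three rows below are 3 different accounts belonging to John (duplicate ID "John" in the ID column). The first two of those three were both closed on 09/30/2017 (which matches the Opened value of the third), so they should be removed from the output data frame. The same goes for Mary (one of her two accounts has a Closed date that matches the Opened date of the other, so the closed one should be removed). For Jack and Pete, however, both accounts should be kept in the output data frame, because (in each case) the Closed date does not match the Opened date. All rows without any duplicate ID (e.g. Jill, Jane, Alice) are also kept in the output data frame.
I have the following code, which uses dplyr to filter by duplicate IDs.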
Input_DF_Dupl_ID <- Input_DF %>%
  group_by(ID) %>%
  filter(n() > 1) %>%
  arrange(ID)
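However, it only identifies and sorts the duplicated accounts; it does not go on to remove the accounts that meet the criteria defined above. Also, I do not actually want to drop (filter out) the accounts that have no duplicate ID.
I hope this is clear. Thanks for any help I can get.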
input_df:
Date ID Opened Closed Review Status Type Paid
09/30/2017 John 09/21/2016 09/30/2017 09/30/2019 Closed A 1000
09/30/2017 John 06/19/2015 09/30/2017 06/30/2020 Closed A 2500
09/30/2017 John 09/30/2017 14/31/2022 Open A 0
09/30/2017 Jill 11/10/2014 07/31/2018 Open B 0
09/30/2017 Jane 07/15/2012 09/30/2017 07/31/2017 Closed C 10999
09/30/2017 Alice 06/19/2015 09/30/2017 06/30/2020 Closed A 500
09/30/2017 Mary 11/10/2014 09/30/2017 07/31/2018 Closed B 12000
09/30/2017 Mary 09/30/2017 07/31/2022 Open B 0
09/30/2017 Jack 06/19/2011 09/30/2017 06/30/2020 Closed A 500
09/30/2017 Jack 03/19/2015 06/30/2020 Open A 0
09/30/2017 Pete 07/15/2012 05/31/2015 07/31/2017 Closed B 0
09/30/2017 Pete 12/22/2016 07/31/2017 Open C 0
Desired output_df:
Date ID Opened Closed Review Status Type Paid
09/30/2017 John 09/30/2017 14/31/2022 Open A 0
09/30/2017 Jill 11/10/2014 07/31/2018 Open B 0
09/30/2017 Jane 07/15/2012 09/30/2017 07/31/2017 Closed C 10999
09/30/2017 Alice 06/19/2015 09/30/2017 06/30/2020 Closed A 500
09/30/2017 Mary 09/30/2017 07/31/2022 Open B 0
09/30/2017 Jack 06/19/2011 09/30/2017 06/30/2020 Closed A 500
09/30/2017 Jack 03/19/2015 06/30/2020 Open A 0
09/30/2017 Pete 07/15/2012 05/31/2015 07/31/2017 Closed B 0
09/30/2017 Pete 12/22/2016 07/31/2017 Open C 0
Please use the code below. Edit: the condition now applies only to groups with more than one record.
library(dplyr)
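Input_DF_Dupl_ID <- Input_DF %>%
  group_by(ID) %>%
  # Drop a row only if it is Closed, its Closed date equals an Opened date
  # within the same ID group, and the ID occurs more than once.
  filter(!(Status == "Closed" & Closed %in% Opened & n() > 1)) %>%
  arrange(ID)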
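To check the filter against the sample data, here is a minimal reproducible sketch. It makes a few assumptions that are not in the original post: only the columns the filter needs (ID, Opened, Closed, Status) are reconstructed, dates are kept as character strings, blank Closed cells are treated as NA, and the result name Output_DF is made up for illustration.

```r
library(dplyr)

# Assumed reconstruction of the sample data (subset of columns only).
# Blank Closed cells in the post are represented as NA here.
Input_DF <- data.frame(
  ID     = c("John", "John", "John", "Jill", "Jane", "Alice",
             "Mary", "Mary", "Jack", "Jack", "Pete", "Pete"),
  Opened = c("09/21/2016", "06/19/2015", "09/30/2017", "11/10/2014",
             "07/15/2012", "06/19/2015", "11/10/2014", "09/30/2017",
             "06/19/2011", "03/19/2015", "07/15/2012", "12/22/2016"),
  Closed = c("09/30/2017", "09/30/2017", NA, NA,
             "09/30/2017", "09/30/2017", "09/30/2017", NA,
             "09/30/2017", NA, "05/31/2015", NA),
  Status = c("Closed", "Closed", "Open", "Open", "Closed", "Closed",
             "Closed", "Open", "Closed", "Open", "Closed", "Open"),
  stringsAsFactors = FALSE
)

# Output_DF is a hypothetical name. Within each ID group, drop Closed rows
# whose Closed date appears among that group's Opened dates, but only when
# the ID occurs more than once.
Output_DF <- Input_DF %>%
  group_by(ID) %>%
  filter(!(Status == "Closed" & Closed %in% Opened & n() > 1)) %>%
  ungroup()

print(Output_DF)
```

With this data the result keeps nine rows: one each for John and Mary, both rows for Jack and Pete, and the single rows for Jill, Jane and Alice. The sketch leaves out arrange(ID) so that the rows stay in their original order, as in the desired output; add it back if alphabetical ordering by ID is preferred.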