R - How to remove duplicates under certain conditions



This is an example of the operation I am trying to perform on a different dataset, but it still does not work.

   PORT      STATUS     VESSEL          DWT   IMP/EXP  QTY (Mts)
1  KANDLA    SAILED     CAPTAIN HAMADA  7938  EXP      4500
2  KAKINADA  EXPECTED   CELON BREEZE          IMP      30000
3  KAKINADA  BERTH      CELON BREEZE          IMP      3000
4  KAKINADA  SAILED     CELON BREEZE          IMP      30000
5  KANDLA    ANCHORAGE  CAPTAIN HAMADA        EXP      4500
6  KAKINADA  BERTH      CELON BREEZE          IMP      30000

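I want to compare each row with the others on PORT, VESSEL and IMP/EXP and, where they match, delete the lower-priority rows. For example, if IMP/EXP in a row is "IMP", rows are deleted according to this status priority: sailed > berth > anchorage > expected. The status "sailed" gets the highest priority, so the second row is deleted because its QTY, PORT and VESSEL match those of the fourth row. The same applies wherever the conditions match: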

1) if one row has STATUS = sailed and another has berth, the berth row is deleted
2) if one row has sailed and another has expected, the expected row is deleted
3) if one row has berth and another has anchorage, the anchorage row is deleted
4) if one row has STATUS = expected and another has STATUS = sailed, the expected row is deleted

and so on. Rows are only deleted under these conditions when they match on QTY, PORT and VESSEL.

For rows with EXP in IMP/EXP, the same matching on QTY, PORT and VESSEL applies, with this priority for STATUS:

priority: sailed > anchorage > expected > berth

The output should be

   PORT      STATUS  VESSEL          DWT   IMP/EXP  QTY (Mts)
1  KANDLA    SAILED  CAPTAIN HAMADA  7938  EXP      4500
3  KAKINADA  BERTH   CELON BREEZE          IMP      3000
4  KAKINADA  SAILED  CELON BREEZE          IMP      30000

That is, the desired output has rows 2, 5 and 6 removed.

First, you need to read the data into R as a data.frame. The data frame test should look like this:

>test
#      PORT    STATUS         VESSEL  DWT IMPEXP   QTY
#1   KANDLA    SAILED CAPTAIN HAMADA 7938    EXP  4500
#2 KAKINADA  EXPECTED   CELON BREEZE   NA    IMP 30000
#3 KAKINADA     BERTH   CELON BREEZE   NA    IMP  3000
#4 KAKINADA    SAILED   CELON BREEZE   NA    IMP 30000
#5   KANDLA ANCHORAGE CAPTAIN HAMADA   NA    EXP  4500
#6 KAKINADA     BERTH   CELON BREEZE   NA    IMP 30000
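
In case it helps to reproduce this, here is a minimal sketch of building that data frame by hand. It assumes DWT is NA wherever the question's table leaves it blank, and that the columns IMP/EXP and QTY (Mts) are renamed to IMPEXP and QTY as in the printout above.

```r
# Hand-typed copy of the question's table (an illustration, not the asker's real import code).
# DWT is set to NA where the question leaves the cell empty.
test <- data.frame(
  PORT   = c("KANDLA", "KAKINADA", "KAKINADA", "KAKINADA", "KANDLA", "KAKINADA"),
  STATUS = c("SAILED", "EXPECTED", "BERTH", "SAILED", "ANCHORAGE", "BERTH"),
  VESSEL = c("CAPTAIN HAMADA", "CELON BREEZE", "CELON BREEZE",
             "CELON BREEZE", "CAPTAIN HAMADA", "CELON BREEZE"),
  DWT    = c(7938, NA, NA, NA, NA, NA),
  IMPEXP = c("EXP", "IMP", "IMP", "IMP", "EXP", "IMP"),
  QTY    = c(4500, 30000, 3000, 30000, 4500, 30000),
  stringsAsFactors = FALSE
)
print(test)
```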

Using the ddply function from the plyr package, you should be able to get the desired output with the help of the following function.

library(plyr)  # provides ddply

ddply(test, .variables = c("PORT", "VESSEL", "IMPEXP", "QTY"),
      function(grp) {
        # Status levels from lowest to highest priority; the order depends on IMP/EXP
        lv <- if (grp$IMPEXP[1] == "IMP") {
          c("EXPECTED", "ANCHORAGE", "BERTH", "SAILED")
        } else {
          c("BERTH", "EXPECTED", "ANCHORAGE", "SAILED")
        }
        grp$STATUS <- factor(grp$STATUS, levels = lv, ordered = TRUE)
        # Keep only the row with the highest-priority status in each group
        grp[which.max(as.integer(grp$STATUS)), ]
      })
#      PORT STATUS         VESSEL  DWT IMPEXP   QTY
#1 KAKINADA  BERTH   CELON BREEZE   NA    IMP  3000
#2 KAKINADA SAILED   CELON BREEZE   NA    IMP 30000
#3   KANDLA SAILED CAPTAIN HAMADA 7938    EXP  4500
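
If you would rather not depend on plyr, the same idea can probably be expressed in base R. The following is only a sketch under a few assumptions: the data frame is rebuilt by hand from the question's table, and the names priority, rank and keys are made up for this illustration. It ranks each row's status within its IMP/EXP priority list, sorts so that the highest rank comes first in each group, and then drops the later duplicates.

```r
# Hand-typed copy of the question's table (DWT is NA where the question leaves it blank).
test <- data.frame(
  PORT   = c("KANDLA", "KAKINADA", "KAKINADA", "KAKINADA", "KANDLA", "KAKINADA"),
  STATUS = c("SAILED", "EXPECTED", "BERTH", "SAILED", "ANCHORAGE", "BERTH"),
  VESSEL = c("CAPTAIN HAMADA", "CELON BREEZE", "CELON BREEZE",
             "CELON BREEZE", "CAPTAIN HAMADA", "CELON BREEZE"),
  DWT    = c(7938, NA, NA, NA, NA, NA),
  IMPEXP = c("EXP", "IMP", "IMP", "IMP", "EXP", "IMP"),
  QTY    = c(4500, 30000, 3000, 30000, 4500, 30000),
  stringsAsFactors = FALSE
)

# Status orderings from lowest to highest priority, one per IMP/EXP value
# (same orderings as the factor levels used with ddply).
priority <- list(
  IMP = c("EXPECTED", "ANCHORAGE", "BERTH", "SAILED"),
  EXP = c("BERTH", "EXPECTED", "ANCHORAGE", "SAILED")
)

# Position of each row's status in the ordering that applies to that row.
rank <- mapply(function(s, d) match(s, priority[[d]]),
               test$STATUS, test$IMPEXP, USE.NAMES = FALSE)

# Sort by the grouping columns, highest-ranked status first within each group.
keys   <- c("PORT", "VESSEL", "IMPEXP", "QTY")
sorted <- test[order(test$PORT, test$VESSEL, test$IMPEXP, test$QTY, -rank), ]

# duplicated() flags every row after the first in each group, so negating it
# keeps exactly the highest-priority row per group.
result <- sorted[!duplicated(sorted[keys]), ]
print(result)
```

On this sample the rows kept are the original rows 3, 4 and 1, which matches the output requested in the question. Unlike ddply, this keeps the original row names, which can be handy for checking which rows were removed.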
