R - Reduce/filter data based on occurrence of class and date



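I have a dataset of different vessels in different zones. The data output I receive records the name of each vessel, its type (e.g. fishing/cargo), the time it entered the zone, the time it left, and the duration it spent in the zone. DOS is just the distance band, i.e. the zone I am looking at.

My problem is that fishing vessels often run transects and enter and exit the zone several times a day, so they are recorded multiple times in my report output.

I would like to consolidate the data for fishing vessels so that when a vessel with the same name is recorded more than once on the same day (only for Type: Fishing), all but one record is removed. For simplicity, perhaps just look at the "First seen inside" date, since I think it gets more complicated when a particular duration spans multiple days (I can come back to that idea later).

Dummy data: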

 df <- structure(list(Name = structure(c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 
 3L, 3L, 3L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 8L, 
 8L, 9L), .Label = c("A", "B", "C", "D", "E", "F", "G", "H", "I"
 ), class = "factor"), Type = structure(c(2L, 2L, 2L, 2L, 2L, 
 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 
 2L, 1L, 1L, 2L), .Label = c("Cargo", "Fishing"), class = "factor"), 
 `First seen inside` = structure(c(1556385360, 1556393640, 
 1556002200, 1556260260, 1556518860, 1556136660, 1556278500, 
 1556285820, 1556391480, 1556509620, 1556319480, 1556214120, 
 1556235600, 1556325540, 1556326920, 1556329500, 1556330220, 
 1556330580, 1556330880, 1556330940, 1556332980, 1556339880, 
 1556340900, 1556344140, 1556344500, 1556345220, 1556346420, 
 1556348220, 1556348520, 1556350860, 1556351460, 1556356620, 
 1556360220, 1556365920, 1556366520, 1556367180, 1556076420, 
 1556166900, 1556154840, 1556454900, 1556291220), class = c("POSIXct", 
 "POSIXt"), tzone = ""), `Last seen inside` = structure(c(34L, 
 35L, 1L, 8L, 38L, 3L, 7L, 9L, 36L, 38L, 27L, 4L, 5L, 10L, 
 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 
 23L, 24L, 25L, 26L, 28L, 29L, 30L, 31L, 32L, 33L, 2L, 6L, 
 37L, 38L, 38L), .Label = c("4/23/2019 14:27", "4/24/2019 21:23", 
 "4/25/2019 00:00", "4/25/2019 10:47", "4/25/2019 16:59", 
 "4/25/2019 23:49", "4/26/2019 05:17", "4/26/2019 13:39", 
 "4/26/2019 15:12", "4/26/2019 17:54", "4/26/2019 18:05", 
 "4/26/2019 18:51", "4/26/2019 19:00", "4/26/2019 19:06", 
 "4/26/2019 19:08", "4/26/2019 19:13", "4/26/2019 21:24", 
 "4/26/2019 21:38", "4/26/2019 22:02", "4/26/2019 22:51", 
 "4/26/2019 22:55", "4/26/2019 23:22", "4/26/2019 23:51", 
 "4/27/2019 00:00", "4/27/2019 00:36", "4/27/2019 00:42", 
 "4/27/2019 01:17", "4/27/2019 02:06", "4/27/2019 03:11", 
 "4/27/2019 04:30", "4/27/2019 05:00", "4/27/2019 05:03", 
 "4/27/2019 05:13", "4/27/2019 10:29", "4/27/2019 12:42", 
 "4/27/2019 17:21", "4/28/2019 03:47", "4/29/2019 09:56"), class = 
  "factor"), 
`Time in zone` = structure(c(5L, 31L, 6L, 7L, 2L, 3L, 23L, 
 30L, 26L, 4L, 32L, 27L, 9L, 8L, 22L, 28L, 22L, 22L, 1L, 24L, 
 15L, 1L, 29L, 18L, 1L, 8L, 17L, 22L, 19L, 16L, 14L, 25L, 
 13L, 31L, 16L, 1L, 12L, 10L, 21L, 11L, 20L), .Label = c("", 
 "10h 35m", "10h 49m", "13h 9m", "13m", "14h 37m", "14h 8m", 
 "15m", "19m", "1d 2h 14m", "1d 4h 21m", "1d 56m", "1h 13m", 
 "1h 15m", "1h 41m", "1m", "24m", "2m", "34m", "3d 1h 49m", 
 "3d 9h 33m", "3m", "42m", "4m", "54m", "5h 23m", "5m", "6m", 
 "7m", "8h 35m", "8m", "9h 19m"), class = "factor"), DOS = 
  structure(c(1L, 
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "0-12", class = 
 "factor")), row.names = c(NA, 
 -41L), class = "data.frame")

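So, for example, in my dummy dataset:

  • Vessel "A" is a "Fishing" vessel in DOS 0-12 and it occurs on April 27, so I would like to reduce its entries to a single record. If possible, it would be great if the total "Time in zone" and the final "Last seen inside" were carried over into the mutated data, but don't worry too much if that is too complicated. So vessel A would show only: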

     Name      Type   First seen inside    Last seen inside  Time in zone    DOS
        A   Fishing     4/27/2019 12:16     4/27/2019 12:42           21m   0-12
    

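    But I would be happy with simply reducing it to one row; if it is too much work, there is no need to correct "Last seen inside" and "Time in zone".

  • For vessel C, since it is a cargo vessel, I do not want to treat it the same way as the fishing vessels. I want to keep all recorded data, even when there are multiple records per day.

  • For vessel E, I would expect three data entries...

I hope this makes sense? I am not sure whether this is a filter option in dplyr, or a mutate based on multiple occurrences on the same day? Any suggestions on how to handle this "problem" would be great... otherwise I will need to do some manual work on the dataset :(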

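library(dplyr)

# Fishing vessels only: one row per vessel, DOS and first-seen date,
# keeping the latest "Last seen inside" date in each group
df %>% group_by(Name,DOS,as.Date(`First seen inside`)) %>% 
  filter(Type=="Fishing") %>% 
  summarize(last=max(as.Date(`Last seen inside`, format="%m/%d/%Y")))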

Something like this? Result:

# A tibble: 10 x 4
# Groups:   Name, DOS [6]
   Name  DOS   `as.Date(`First seen inside`)` last      
   <fct> <fct> <date>                           <date>    
 1 A     0-12  2019-04-27                       2019-04-27
 2 B     0-12  2019-04-23                       2019-04-23
 3 B     0-12  2019-04-26                       2019-04-26
 4 B     0-12  2019-04-29                       2019-04-29
 5 D     0-12  2019-04-26                       2019-04-27
 6 E     0-12  2019-04-25                       2019-04-25
 7 E     0-12  2019-04-27                       2019-04-27
 8 G     0-12  2019-04-24                       2019-04-24
 9 G     0-12  2019-04-25                       2019-04-25
10 I     0-12  2019-04-26                       2019-04-29
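
The result above contains fishing vessels only, while the question also asks to keep every cargo record and, if possible, to carry over the total "Time in zone" and the final "Last seen inside". A minimal sketch of one way to do that follows. It runs on a small toy data frame with the same column names as the question (the values are illustrative, not taken from the dput), and `to_minutes()`, `grab()`, `minutes_in_zone`, `day` and the `tz` setting are hypothetical names and assumptions introduced here. `as.Date()` accepts a `tz` argument for POSIXct input, so passing it explicitly lets the day grouping follow the report's time zone.

```r
library(dplyr)  # .groups in summarise() assumes dplyr >= 1.0.0

# Hypothetical helper: pull the number in front of a unit letter, 0 if absent.
grab <- function(x, unit) {
  hit <- grepl(paste0("[0-9]+", unit), x)
  out <- numeric(length(x))
  out[hit] <- as.numeric(
    sub(paste0(".*?([0-9]+)", unit, ".*"), "\\1", x[hit], perl = TRUE))
  out
}

# Hypothetical helper: "1d 2h 14m" -> minutes; empty strings count as 0.
to_minutes <- function(x) {
  x <- as.character(x)
  grab(x, "d") * 1440 + grab(x, "h") * 60 + grab(x, "m")
}

# Toy rows with the question's column names (illustrative values only).
toy <- data.frame(
  Name = c("A", "A", "C", "C", "E", "E"),
  Type = c("Fishing", "Fishing", "Cargo", "Cargo", "Fishing", "Fishing"),
  `First seen inside` = as.POSIXct(
    c("2019-04-27 12:16", "2019-04-27 14:34", "2019-04-26 06:35",
      "2019-04-26 08:37", "2019-04-25 12:42", "2019-04-26 19:39"),
    tz = "UTC"),
  `Last seen inside` = c("4/27/2019 12:29", "4/27/2019 14:42",
                         "4/26/2019 07:17", "4/26/2019 17:12",
                         "4/25/2019 12:47", "4/26/2019 19:54"),
  `Time in zone` = c("13m", "8m", "42m", "8h 35m", "5m", "15m"),
  DOS = "0-12",
  check.names = FALSE, stringsAsFactors = FALSE
)

tz <- "UTC"  # assumption: replace with the time zone the report uses

prepared <- toy %>%
  mutate(
    last_seen = as.POSIXct(as.character(`Last seen inside`),
                           format = "%m/%d/%Y %H:%M", tz = tz),
    minutes   = to_minutes(`Time in zone`),
    day       = as.Date(`First seen inside`, tz = tz)
  )

# Fishing vessels: collapse to one row per vessel, DOS and day.
fishing <- prepared %>%
  filter(Type == "Fishing") %>%
  group_by(Name, Type, DOS, day) %>%
  summarise(
    `First seen inside` = min(`First seen inside`),
    `Last seen inside`  = max(last_seen),
    minutes_in_zone     = sum(minutes),
    .groups = "drop"
  )

# All other types: keep every record, same columns as above.
other <- prepared %>%
  filter(Type != "Fishing") %>%
  transmute(Name, Type, DOS, day, `First seen inside`,
            `Last seen inside` = last_seen, minutes_in_zone = minutes)

result <- bind_rows(fishing, other) %>%
  arrange(Name, `First seen inside`)
```

On the toy rows, vessel A collapses to a single row while both cargo rows for C are kept. Applied to the real `df`, the same pipeline should work after replacing `toy` with `df`, but a stay that spans midnight is still assigned to the day it was first seen.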
