用R中的第一个和最后一个字符列完成日期



我有一个数据集,其中包含传感器超时数据。有一列详细说明了传感器ID日期列,其中包含传感器记录动物的日期,一列包含当天捕获的动物以及捕获的动物总数。我想知道传感器在一段时间内活动的天数。传感器不是每天都记录信息,因此我想添加缺失的几行数据。在物种列中有一个值"0";第一个";其描绘了传感器部署的第一天;最后一个";最后一天传感器正常工作。我只想填写第一天和最后一天之间缺失的天数。

ID  Date        Species  Total
A1  2021-01-01  First    
A1  2021-01-02  Mouse    1
A1  2021-01-03  Mouse    2
A1  2021-01-05  Mouse    1
A1  2021-01-06  Last    
B1  2021-01-05  First
B1  2021-01-07  Mouse    2
B1  2021-01-08  Mouse    1
B1  2021-01-10  Mouse    2
B1  2021-01-12  Last     

随着时间的推移,我有许多传感器和许多传感器部署。我想要的是

ID  Date        Species  Total
A1  2021-01-01  First    
A1  2021-01-02  Mouse    1
A1  2021-01-03  Mouse    2
A1  2021-01-04  Empty    NA
A1  2021-01-05  Mouse    1
A1  2021-01-06  Last    
B1  2021-01-05  First
B1  2021-01-06  Empty    NA
B1  2021-01-07  Mouse    2
B1  2021-01-08  Mouse    1
B1  2021-01-09  Empty    NA
B1  2021-01-10  Mouse    2
B1  2021-01-11  Empty    NA
B1  2021-01-12  Last     

我不知道从哪里开始,我想用dplyr的完整功能来填补空白,但我不确定如何才能让它只填补第一次和最后一次约会之间的日子。

任何帮助都非常感谢,因为我对R世界还很陌生!

感谢

我确信有一种方法可以做到这一点,以保持性能,但通常在处理复杂的工作时,我使用data.table

library(tribble)
df <- tribble(
~ID,  ~Date, ~Species, ~Total,
'A1',  '2021-01-01',  'First','',
'A1',  '2021-01-02',  'Mouse','1',
'A1',  '2021-01-03',  'Mouse','2',
'A1',  '2021-01-05',  'Mouse','1',
'A1',  '2021-01-06',  'Last','',
'B1',  '2021-01-05',  'First','',
'B1',  '2021-01-07',  'Mouse','2',
'B1',  '2021-01-08',  'Mouse','1',
'B1',  '2021-01-10',  'Mouse',    '2',
'B1',  '2021-01-12',  'Last',     ''
) %>% mutate(Date = as.Date(Date))
library(data.table)
dt <- data.table(df)
dt[, {
first = as.Date(.SD[Species == 'First', Date]) # Get first date
last = as.Date(.SD[Species == 'Last', Date]) # Get second date
all_days = seq(first, last, 1) # get all days in between
# construct template result
imputed = data.table(ID = .SD$ID[1], Date = all_days, Species = 'Empty', Total = NA_character_) 
# Left Join .SD onto imptuted to update known values
# .SD is readonly so cant do antoher way around
imputed[.SD, on = 'Date', `:=`(Species = i.Species, Total = i.Total)] 

imputed # Return result

}, by = ID] # Group by ID
## Result
ID       Date Species Total
1: A1 2021-01-01   First      
2: A1 2021-01-02   Mouse     1
3: A1 2021-01-03   Mouse     2
4: A1 2021-01-04   Empty  <NA>
5: A1 2021-01-05   Mouse     1
6: A1 2021-01-06    Last      
7: B1 2021-01-05   First      
8: B1 2021-01-06   Empty  <NA>
9: B1 2021-01-07   Mouse     2
10: B1 2021-01-08   Mouse     1
11: B1 2021-01-09   Empty  <NA>
12: B1 2021-01-10   Mouse     2
13: B1 2021-01-11   Empty  <NA>
14: B1 2021-01-12    Last      

使用tidyverse:

library(tidyverse)
df$Date <- as.Date(df$Date)
df %>%
group_by(ID )%>%
summarise(Date = seq(min(Date), max(Date), by = '1 day'), .groups = 'drop') %>%
left_join(df) %>%
replace_na(list(Species = 'Empty'))
# A tibble: 14 x 4
ID    Date       Species Total
<chr> <date>     <chr>   <int>
1 A1    2021-01-01 First      NA
2 A1    2021-01-02 Mouse       1
3 A1    2021-01-03 Mouse       2
4 A1    2021-01-04 Empty      NA
5 A1    2021-01-05 Mouse       1
6 A1    2021-01-06 Last       NA
7 B1    2021-01-05 First      NA
8 B1    2021-01-06 Empty      NA
9 B1    2021-01-07 Mouse       2
10 B1    2021-01-08 Mouse       1
11 B1    2021-01-09 Empty      NA
12 B1    2021-01-10 Mouse       2
13 B1    2021-01-11 Empty      NA
14 B1    2021-01-12 Last       NA

使用tidyr::complete-

library(dplyr)
library(tidyr)
df %>%
group_by(ID) %>%
complete(Date = seq(min(Date), max(Date), by = 'day'), 
fill = list(Species = 'Empty')) %>%
ungroup
#   ID    Date       Species Total
#   <chr> <date>     <chr>   <chr>
# 1 A1    2021-01-01 First   ""   
# 2 A1    2021-01-02 Mouse   "1"  
# 3 A1    2021-01-03 Mouse   "2"  
# 4 A1    2021-01-04 Empty    NA  
# 5 A1    2021-01-05 Mouse   "1"  
# 6 A1    2021-01-06 Last    ""   
# 7 B1    2021-01-05 First   ""   
# 8 B1    2021-01-06 Empty    NA  
# 9 B1    2021-01-07 Mouse   "2"  
#10 B1    2021-01-08 Mouse   "1"  
#11 B1    2021-01-09 Empty    NA  
#12 B1    2021-01-10 Mouse   "2"  
#13 B1    2021-01-11 Empty    NA  
#14 B1    2021-01-12 Last    ""   

相关内容

  • 没有找到相关文章

最新更新