我有一个数据集,其中包含传感器超时数据。有一列详细说明了传感器ID日期列,其中包含传感器记录动物的日期,一列包含当天捕获的动物以及捕获的动物总数。我想知道传感器在一段时间内活动的天数。传感器不是每天都记录信息,因此我想添加缺失的几行数据。在物种列中有一个值"0";第一个";其描绘了传感器部署的第一天;最后一个";最后一天传感器正常工作。我只想填写第一天和最后一天之间缺失的天数。
ID Date Species Total
A1 2021-01-01 First
A1 2021-01-02 Mouse 1
A1 2021-01-03 Mouse 2
A1 2021-01-05 Mouse 1
A1 2021-01-06 Last
B1 2021-01-05 First
B1 2021-01-07 Mouse 2
B1 2021-01-08 Mouse 1
B1 2021-01-10 Mouse 2
B1 2021-01-12 Last
随着时间的推移,我有许多传感器和许多传感器部署。我想要的是
ID Date Species Total
A1 2021-01-01 First
A1 2021-01-02 Mouse 1
A1 2021-01-03 Mouse 2
A1 2021-01-04 Empty NA
A1 2021-01-05 Mouse 1
A1 2021-01-06 Last
B1 2021-01-05 First
B1 2021-01-06 Empty NA
B1 2021-01-07 Mouse 2
B1 2021-01-08 Mouse 1
B1 2021-01-09 Empty NA
B1 2021-01-10 Mouse 2
B1 2021-01-11 Empty NA
B1 2021-01-12 Last
我不知道从哪里开始,我想用dplyr的完整功能来填补空白,但我不确定如何才能让它只填补第一次和最后一次约会之间的日子。
任何帮助都非常感谢,因为我对R世界还很陌生!
感谢
我确信有一种方法可以做到这一点,以保持性能,但通常在处理复杂的工作时,我使用data.table
。
library(tribble)
df <- tribble(
~ID, ~Date, ~Species, ~Total,
'A1', '2021-01-01', 'First','',
'A1', '2021-01-02', 'Mouse','1',
'A1', '2021-01-03', 'Mouse','2',
'A1', '2021-01-05', 'Mouse','1',
'A1', '2021-01-06', 'Last','',
'B1', '2021-01-05', 'First','',
'B1', '2021-01-07', 'Mouse','2',
'B1', '2021-01-08', 'Mouse','1',
'B1', '2021-01-10', 'Mouse', '2',
'B1', '2021-01-12', 'Last', ''
) %>% mutate(Date = as.Date(Date))
library(data.table)
dt <- data.table(df)
dt[, {
first = as.Date(.SD[Species == 'First', Date]) # Get first date
last = as.Date(.SD[Species == 'Last', Date]) # Get second date
all_days = seq(first, last, 1) # get all days in between
# construct template result
imputed = data.table(ID = .SD$ID[1], Date = all_days, Species = 'Empty', Total = NA_character_)
# Left Join .SD onto imptuted to update known values
# .SD is readonly so cant do antoher way around
imputed[.SD, on = 'Date', `:=`(Species = i.Species, Total = i.Total)]
imputed # Return result
}, by = ID] # Group by ID
## Result
ID Date Species Total
1: A1 2021-01-01 First
2: A1 2021-01-02 Mouse 1
3: A1 2021-01-03 Mouse 2
4: A1 2021-01-04 Empty <NA>
5: A1 2021-01-05 Mouse 1
6: A1 2021-01-06 Last
7: B1 2021-01-05 First
8: B1 2021-01-06 Empty <NA>
9: B1 2021-01-07 Mouse 2
10: B1 2021-01-08 Mouse 1
11: B1 2021-01-09 Empty <NA>
12: B1 2021-01-10 Mouse 2
13: B1 2021-01-11 Empty <NA>
14: B1 2021-01-12 Last
使用tidyverse:
library(tidyverse)
df$Date <- as.Date(df$Date)
df %>%
group_by(ID )%>%
summarise(Date = seq(min(Date), max(Date), by = '1 day'), .groups = 'drop') %>%
left_join(df) %>%
replace_na(list(Species = 'Empty'))
# A tibble: 14 x 4
ID Date Species Total
<chr> <date> <chr> <int>
1 A1 2021-01-01 First NA
2 A1 2021-01-02 Mouse 1
3 A1 2021-01-03 Mouse 2
4 A1 2021-01-04 Empty NA
5 A1 2021-01-05 Mouse 1
6 A1 2021-01-06 Last NA
7 B1 2021-01-05 First NA
8 B1 2021-01-06 Empty NA
9 B1 2021-01-07 Mouse 2
10 B1 2021-01-08 Mouse 1
11 B1 2021-01-09 Empty NA
12 B1 2021-01-10 Mouse 2
13 B1 2021-01-11 Empty NA
14 B1 2021-01-12 Last NA
使用tidyr::complete
-
library(dplyr)
library(tidyr)
df %>%
group_by(ID) %>%
complete(Date = seq(min(Date), max(Date), by = 'day'),
fill = list(Species = 'Empty')) %>%
ungroup
# ID Date Species Total
# <chr> <date> <chr> <chr>
# 1 A1 2021-01-01 First ""
# 2 A1 2021-01-02 Mouse "1"
# 3 A1 2021-01-03 Mouse "2"
# 4 A1 2021-01-04 Empty NA
# 5 A1 2021-01-05 Mouse "1"
# 6 A1 2021-01-06 Last ""
# 7 B1 2021-01-05 First ""
# 8 B1 2021-01-06 Empty NA
# 9 B1 2021-01-07 Mouse "2"
#10 B1 2021-01-08 Mouse "1"
#11 B1 2021-01-09 Empty NA
#12 B1 2021-01-10 Mouse "2"
#13 B1 2021-01-11 Empty NA
#14 B1 2021-01-12 Last ""