I need to create a data frame containing data from a list of sensors between specific dates:
DATE SENSOR1 SENSOR2 SENSOR3 SENSOR4
2020-04-20 00:00:00 1015 19.88 95.80 9.020
2020-04-20 00:10:00 1015 19.84 96.10 8.970
2020-04-20 00:20:00 1015 19.84 96.40 9.010
2020-04-20 00:30:00 1015 19.81 96.60 9.210
2020-04-20 00:40:00 1015 19.79 96.80 9.700
2020-04-20 00:50:00 1015 19.81 97.00 8.870
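First, I create a data frame with a single column (DATE) that holds one row for every timestamp between the specified dates, at 10-minute intervals. It can normally have thousands of rows, but to keep the example reproducible we can keep it simple: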
periods <- data.frame(DATE = c("2020-04-20 00:00:00","2020-04-20 00:10:00","2020-04-20 00:20:00","2020-04-20 00:30:00","2020-04-20 00:40:00","2020-04-20 00:50:00"))
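For a real range with thousands of rows, the same grid can probably be generated rather than typed out. This is a minimal sketch using base R's seq() on POSIXct values; the UTC time zone and the `start`/`end` names are assumptions, not something stated in the question.

```r
# Build the 10-minute grid with seq() instead of listing every timestamp.
# Assumption: timestamps are in UTC; use whatever zone the database stores.
start <- as.POSIXct("2020-04-20 00:00:00", tz = "UTC")
end   <- as.POSIXct("2020-04-20 00:50:00", tz = "UTC")

grid <- seq(start, end, by = "10 min")

# Format explicitly so midnight keeps its "00:00:00" part and DATE stays
# a character column, matching the hand-written version above.
periods <- data.frame(
  DATE = format(grid, "%Y-%m-%d %H:%M:%S", tz = "UTC"),
  stringsAsFactors = FALSE
)
```

Keeping DATE as character mirrors the question's code, which converts the queried dates with as.character() before merging.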
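I have a list mapping sensors to IDs, so in a loop I iterate over all the sensors and query my database, which returns DATE and VALUE for each sensor. The problem is that a sensor can have 2 or more ids, depending on the date the data was stored.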
ID SENSORNAME
1 SENSOR1 <- row that has data from SENSOR1 between 2020-04-20 00:00:00 and 2020-04-20 00:20:00
2 SENSOR2 ...
3 SENSOR3 ...
4 SENSOR4 ...
5 SENSOR1 <- row that has data from SENSOR1 between 2020-04-20 00:30:00 and 2020-04-20 00:50:00
6 SENSOR2 ...
7 SENSOR3 ...
8 SENSOR4 ...
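To make the mapping concrete, the table above can be written as a data frame with the two columns the loop uses (`ID` and `SENSORNAME`). This is only an illustration of the structure; `ids_by_name` is a hypothetical name introduced here.

```r
# The sensor table from above: eight ids, four sensor names.
sensors <- data.frame(
  ID = 1:8,
  SENSORNAME = rep(c("SENSOR1", "SENSOR2", "SENSOR3", "SENSOR4"), times = 2),
  stringsAsFactors = FALSE
)

# Group the ids by sensor name; each name maps to two ids in this example.
ids_by_name <- split(sensors$ID, sensors$SENSORNAME)
ids_by_name$SENSOR1
```

Seen this way, a loop over ids visits every sensor name twice, once per id.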
Original code:
library(DBI)        # dbGetQuery()
library(lubridate)  # round_date()

for (i in seq_along(sensors$ID)) {
  sensor <- dbGetQuery(con, paste0("SELECT DATE, VALUE FROM MEASURES WHERE DATE between '2020-04-20 00:00:00' and '2020-04-20 00:50:00' AND ID= ", sensors$ID[i], " ORDER BY DATE ASC"))
  # getting rid of milliseconds
  sensor$DATE <- as.character(round_date(sensor$DATE, "minute"))
  # Renaming the column with sensor's name
  names(sensor) <- c("DATE", sensors$SENSORNAME[i])
  # Full outer join on DATE; this is the step that produces the suffixes
  periods <- merge(periods, sensor, by = "DATE", all = TRUE)
  rm(sensor)
}
Since you cannot query my database, the example can be reproduced by creating 2 data.frames manually:
periods <- data.frame(DATE= c("2020-04-20 00:00:00","2020-04-20 00:10:00","2020-04-20 00:20:00","2020-04-20 00:30:00","2020-04-20 00:40:00","2020-04-20 00:50:00"), SENSOR1= c(1015, 1015, 1015, NA, NA, NA), SENSOR2= c(19.88, 19.84, 19.84, NA, NA, NA), SENSOR3= c(95.80, 96.10, 96.40, NA, NA, NA), SENSOR4= c(9.020, 8.970, 9.010, NA, NA, NA))
sensor <- data.frame(DATE= c("2020-04-20 00:00:00","2020-04-20 00:10:00","2020-04-20 00:20:00","2020-04-20 00:30:00","2020-04-20 00:40:00","2020-04-20 00:50:00"), SENSOR1= c(NA, NA, NA, 1010, 1010, 1010))
After the 4th iteration, it starts adding suffixes to the column names, like this:
DATE SENSOR1.x SENSOR2.x SENSOR3.x SENSOR4.x SENSOR1.y SENSOR2.y SENSOR3.y SENSOR4.y
2020-04-20 00:00:00 1015 19.88 95.80 9.020 NA NA NA NA
2020-04-20 00:10:00 1015 19.84 96.10 8.970 NA NA NA NA
2020-04-20 00:20:00 1015 19.84 96.40 9.010 NA NA NA NA
2020-04-20 00:30:00 NA NA NA NA 1015 19.81 96.60 9.210
2020-04-20 00:40:00 NA NA NA NA 1015 19.79 96.80 9.700
2020-04-20 00:50:00 NA NA NA NA 1015 19.81 97.00 8.870
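The suffixes come from merge() itself: when both inputs contain a non-key column with the same name, it keeps both and disambiguates them with its default suffixes .x and .y. A minimal sketch with a single sensor column, using the same values as the manual data frames above (the `dates` and `merged` names are introduced here for illustration):

```r
dates <- c("2020-04-20 00:00:00", "2020-04-20 00:10:00", "2020-04-20 00:20:00",
           "2020-04-20 00:30:00", "2020-04-20 00:40:00", "2020-04-20 00:50:00")

periods <- data.frame(DATE = dates,
                      SENSOR1 = c(1015, 1015, 1015, NA, NA, NA),
                      stringsAsFactors = FALSE)
sensor <- data.frame(DATE = dates,
                     SENSOR1 = c(NA, NA, NA, 1010, 1010, 1010),
                     stringsAsFactors = FALSE)

# Both inputs have a SENSOR1 column that is not part of the key,
# so merge() keeps both and renames them.
merged <- merge(periods, sensor, by = "DATE", all = TRUE)
names(merged)
# "DATE" "SENSOR1.x" "SENSOR1.y"
```

The first id of each sensor creates the column; the second id collides with it, which is why the problem only appears from the 5th query onward.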
Any ideas on how to merge this correctly, or how to fix the data frame after it has been generated?
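You can use pivot_longer from tidyr to put everything into a single column, rbind the two long data frames together, and then use pivot_wider to put everything back into wide format. You also need na.omit() to remove the NAs before widening; otherwise a DATE/name pair can appear more than once and pivot_wider cannot place a single value in each cell.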
library(tidyr)
periods %>%
  pivot_longer(-DATE) %>%
  rbind(sensor %>%
          pivot_longer(-DATE)) %>%
  na.omit() %>%
  pivot_wider(names_from = name, values_from = value)
# A tibble: 6 x 5
DATE SENSOR1 SENSOR2 SENSOR3 SENSOR4
<fct> <dbl> <dbl> <dbl> <dbl>
1 2020-04-20 00:00:00 1015 19.9 95.8 9.02
2 2020-04-20 00:10:00 1015 19.8 96.1 8.97
3 2020-04-20 00:20:00 1015 19.8 96.4 9.01
4 2020-04-20 00:30:00 1010 NA NA NA
5 2020-04-20 00:40:00 1010 NA NA NA
6 2020-04-20 00:50:00 1010 NA NA NA
Data
periods <- data.frame(DATE= c("2020-04-20 00:00:00","2020-04-20 00:10:00","2020-04-20 00:20:00","2020-04-20 00:30:00","2020-04-20 00:40:00","2020-04-20 00:50:00"), SENSOR1= c(1015, 1015, 1015, NA, NA, NA), SENSOR2= c(19.88, 19.84, 19.84, NA, NA, NA), SENSOR3= c(95.80, 96.10, 96.40, NA, NA, NA), SENSOR4= c(9.020, 8.970, 9.010, NA, NA, NA))
sensor <- data.frame(DATE= c("2020-04-20 00:00:00","2020-04-20 00:10:00","2020-04-20 00:20:00","2020-04-20 00:30:00","2020-04-20 00:40:00","2020-04-20 00:50:00"), SENSOR1= c(NA, NA, NA, 1010, 1010, 1010))
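If pulling in tidyr is not an option, a base R alternative is to fill the existing column in place rather than merging. This is a different technique from the pivot approach above, offered only as a sketch: `fill_sensor` is a hypothetical helper, not part of any package, and it assumes DATE is a character column in both data frames and that `sensor` has exactly two columns (DATE plus one value column).

```r
dates <- c("2020-04-20 00:00:00", "2020-04-20 00:10:00", "2020-04-20 00:20:00",
           "2020-04-20 00:30:00", "2020-04-20 00:40:00", "2020-04-20 00:50:00")

periods <- data.frame(DATE = dates,
                      SENSOR1 = c(1015, 1015, 1015, NA, NA, NA),
                      SENSOR2 = c(19.88, 19.84, 19.84, NA, NA, NA),
                      stringsAsFactors = FALSE)
sensor <- data.frame(DATE = dates[4:6],
                     SENSOR1 = c(1010, 1010, 1010),
                     stringsAsFactors = FALSE)

# Hypothetical helper: copy non-NA values from `sensor` into the column of
# the same name in `periods`, matching rows on DATE.
fill_sensor <- function(periods, sensor) {
  col <- names(sensor)[2]
  # First id for this sensor: create the column, filled with NA.
  if (!col %in% names(periods)) periods[[col]] <- NA_real_
  idx <- match(sensor$DATE, periods$DATE)
  # Skip dates missing from the grid and NA readings.
  ok <- !is.na(idx) & !is.na(sensor[[col]])
  periods[[col]][idx[ok]] <- sensor[[col]][ok]
  periods
}

periods <- fill_sensor(periods, sensor)
```

Inside the original loop this would stand in for the merge() call. One difference to keep in mind: unlike merge(..., all = TRUE), rows whose DATE is not already in `periods` are dropped instead of appended.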