Selecting rows from an R data table based on a set of variables that share the same timestamp



I have an R data table that looks like this:

time                var val
01-07-2019 13:27:31 X   5
01-07-2019 13:27:31 Y   10
01-07-2019 13:27:31 Z   12
01-07-2019 13:28:20 X   13
01-07-2019 13:28:20 Y   12
01-07-2019 13:28:20 Z   15
01-07-2019 13:49:54 X   3
01-07-2019 13:49:54 Y   3
01-07-2019 13:49:54 Z   4
01-07-2019 13:50:25 X   13
01-07-2019 13:50:25 Y   11
01-07-2019 13:50:25 Z   10

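The same set of variables repeats at each timestamp, so every timestamp has exactly one set (X, Y and Z). How can I extract all rows of the timestamps whose values meet conditions such as X > 10 & Y < 15?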

Expected output:

time                var val
01-07-2019 13:28:20 X   13
01-07-2019 13:28:20 Y   12
01-07-2019 13:28:20 Z   15
01-07-2019 13:50:25 X   13
01-07-2019 13:50:25 Y   11
01-07-2019 13:50:25 Z   10

Here are two approaches using base R:

Solution 1: first reshape df to wide format, then use subset to apply the criteria

u <- subset(reshape(df, idvar = "time", timevar = "var", direction = "wide"), val.X > 10 & val.Y < 15)
> u
time val.X val.Y val.Z
4  01-07-2019 13:28:20    13    12    15
10 01-07-2019 13:50:25    13    11    10

Then

> df[!is.na(match(df$time,u$time)),]
time var val
4  01-07-2019 13:28:20   X  13
5  01-07-2019 13:28:20   Y  12
6  01-07-2019 13:28:20   Z  15
10 01-07-2019 13:50:25   X  13
11 01-07-2019 13:50:25   Y  11
12 01-07-2019 13:50:25   Z  10

Solution 2: alternatively, use split to break the data frame into one piece per time value

z <- split(df, df$time)

Then keep only the pieces whose X value is above 10 and whose Y value is below 15, and bind them back together with Reduce(rbind, ...):

> Reduce(rbind, z[sapply(z, function(v) with(v, val[var == "X"] > 10 & val[var == "Y"] < 15))])
time var val
4  01-07-2019 13:28:20   X  13
5  01-07-2019 13:28:20   Y  12
6  01-07-2019 13:28:20   Z  15
10 01-07-2019 13:50:25   X  13
11 01-07-2019 13:50:25   Y  11
12 01-07-2019 13:50:25   Z  10
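A third base R variant, offered as a sketch rather than part of the original answer: ave() computes a per-timestamp flag and spreads it back to every row of that timestamp, so an ordinary row subset keeps whole groups without reshaping or splitting. The data is rebuilt inline from the question, and the names x_ok, y_ok and res are illustrative only.

```r
# Rebuild the example data from the question (time kept as character)
df <- data.frame(
  time = rep(c("01-07-2019 13:27:31", "01-07-2019 13:28:20",
               "01-07-2019 13:49:54", "01-07-2019 13:50:25"), each = 3),
  var  = rep(c("X", "Y", "Z"), times = 4),
  val  = c(5, 10, 12, 13, 12, 15, 3, 3, 4, 13, 11, 10),
  stringsAsFactors = FALSE
)

# For every row: TRUE if its timestamp has an X value above 10
x_ok <- ave(df$var == "X" & df$val > 10, df$time, FUN = any)
# For every row: TRUE if its timestamp has a Y value below 15
y_ok <- ave(df$var == "Y" & df$val < 15, df$time, FUN = any)

# Keep all rows (X, Y and Z) of the timestamps that satisfy both conditions
res <- df[x_ok & y_ok, ]
res
```

Because the flags are computed with any(), this also tolerates timestamps that contain more than one X or Y row, which the reshape and split versions above do not.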

Data

df <- structure(list(time = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 
3L, 3L, 4L, 4L, 4L), .Label = c("01-07-2019 13:27:31", "01-07-2019 13:28:20", 
"01-07-2019 13:49:54", "01-07-2019 13:50:25"), class = "factor"), 
var = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L), .Label = c("X", "Y", "Z"), class = "factor"), val = c(5L, 
10L, 12L, 13L, 12L, 15L, 3L, 3L, 4L, 13L, 11L, 10L)), class = "data.frame", row.names = c(NA, 
-12L))

Here is a data.table option:

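library(data.table)
DT <- as.data.table(df)  # assumes df is the data.frame from the Data section
# Helper: TRUE if the group has an X value > 10 and a Y value < 15
foo <- function(x, y) any(x == "X" & y > 10) & any(x == "Y" & y < 15)
DT[, .SD[foo(var, val)], by = time]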
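For a quick self-contained check, the sketch below rebuilds the question's data inline and swaps .SD[foo(var, val)] for the common data.table idiom `if (...) .SD`: when the condition is FALSE the group evaluates to NULL and data.table drops it from the result. The object name res is illustrative.

```r
library(data.table)

DT <- data.table(
  time = rep(c("01-07-2019 13:27:31", "01-07-2019 13:28:20",
               "01-07-2019 13:49:54", "01-07-2019 13:50:25"), each = 3),
  var  = rep(c("X", "Y", "Z"), times = 4),
  val  = c(5, 10, 12, 13, 12, 15, 3, 3, 4, 13, 11, 10)
)

# Same helper as above: one TRUE/FALSE per timestamp group
foo <- function(x, y) any(x == "X" & y > 10) & any(x == "Y" & y < 15)

# Groups where foo() is FALSE return NULL and are dropped
res <- DT[, if (foo(var, val)) .SD, by = time]
res
```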

Tidy Answer

library(dplyr)
library(lubridate)
dataExample <- data.frame(
  time = c(
    '01-07-2019 13:27:31',
    '01-07-2019 13:27:31',
    '01-07-2019 13:27:32',
    '01-07-2019 13:27:32',
    '01-07-2019 13:27:30'),
  var = c("X", "Y", "X", "Y", "X"),
  val = c(4, 5, 3, 5, 9))
# Convert from character to date-time
dataExample$time <- dmy_hms(dataExample$time)
dataExample %>%
  group_by(time) %>%
  filter((var == "X" & val > 4) | (var == "Y" & val < 8))

Returns:

# A tibble: 3 x 3
# Groups:   time [3]
time                var     val
<dttm>              <fct> <dbl>
1 2019-07-01 13:27:31 Y         5
2 2019-07-01 13:27:32 Y         5
3 2019-07-01 13:27:30 X         9
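The filter above tests each row on its own, so it keeps individual X or Y rows rather than whole timestamps. A minimal group-wise sketch on the question's data (rebuilt inline; res is an illustrative name) wraps each condition in any(), so all rows of a timestamp are kept only when that timestamp has an X above 10 and a Y below 15. Comma-separated conditions in filter() are combined with AND.

```r
library(dplyr)

df <- data.frame(
  time = rep(c("01-07-2019 13:27:31", "01-07-2019 13:28:20",
               "01-07-2019 13:49:54", "01-07-2019 13:50:25"), each = 3),
  var  = rep(c("X", "Y", "Z"), times = 4),
  val  = c(5, 10, 12, 13, 12, 15, 3, 3, 4, 13, 11, 10),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(time) %>%
  # any() yields one TRUE/FALSE per group, recycled to every row of that group
  filter(any(var == "X" & val > 10), any(var == "Y" & val < 15)) %>%
  ungroup()
res
```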

There are several ways to do this:

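  1. Create a dataset that filters out rows with X <= 10 and rows with Y >= 15. Then you only need to check whether each timestamp still has both an X and a Y value. You can use dataset[, anyX := any(var == "X"), by = time] to check which var values remain for each time.
  2. Reshape the dataset to wide format; checking both conditions at the same time is then straightforward. You also need to handle NA, which appears when a timestamp is missing one of the variables. This method assumes a timestamp cannot have more than one value for each value of var.
  3. Filter the dataset to select the timestamps where X and Y match your conditions. Then remove the X and Y values that do not match the conditions (this may not be necessary if a timestamp cannot have more than one X and one Y):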
dt[, outputTime := any(var == "X" & val > 10) & any(var == "Y" & val < 15), by = time][outputTime == TRUE][, outputTime := NULL][var == "X" & val > 10 | var == "Y" & val < 15 | var == "Z"]
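Approach 1 from the list above can be sketched in data.table as follows. This is an illustration rather than the answerer's exact code: instead of adding an anyX column it summarises each timestamp once, and kept, good_times and res are hypothetical names.

```r
library(data.table)

dt <- data.table(
  time = rep(c("01-07-2019 13:27:31", "01-07-2019 13:28:20",
               "01-07-2019 13:49:54", "01-07-2019 13:50:25"), each = 3),
  var  = rep(c("X", "Y", "Z"), times = 4),
  val  = c(5, 10, 12, 13, 12, 15, 3, 3, 4, 13, 11, 10)
)

# Step 1: drop X rows with val <= 10 and Y rows with val >= 15 (Z rows untouched)
kept <- dt[!(var == "X" & val <= 10) & !(var == "Y" & val >= 15)]

# Step 2: a timestamp qualifies if both an X row and a Y row survived step 1
good_times <- kept[, .(ok = any(var == "X") & any(var == "Y")), by = time][ok == TRUE, time]

# Step 3: return every original row (X, Y and Z) of the qualifying timestamps
res <- dt[time %in% good_times]
res
```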
