I have an R data table that looks like this:
time var val
01-07-2019 13:27:31 X 5
01-07-2019 13:27:31 Y 10
01-07-2019 13:27:31 Z 12
01-07-2019 13:28:20 X 13
01-07-2019 13:28:20 Y 12
01-07-2019 13:28:20 Z 15
01-07-2019 13:49:54 X 3
01-07-2019 13:49:54 Y 3
01-07-2019 13:49:54 Z 4
01-07-2019 13:50:25 X 13
01-07-2019 13:50:25 Y 11
01-07-2019 13:50:25 Z 10
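The same set of variables repeats at every timestamp, so each timestamp has one full set. How can I extract all rows of the timestamps that match a condition such as X > 10 & Y < 15?
Expected output: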
time var val
01-07-2019 13:28:20 X 13
01-07-2019 13:28:20 Y 12
01-07-2019 13:28:20 Z 15
01-07-2019 13:50:25 X 13
01-07-2019 13:50:25 Y 11
01-07-2019 13:50:25 Z 10
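Here are two approaches using base R.
Solution 1: first reshape df to wide format, then use subset to apply the condition. Both tests go in one logical expression joined with &, because the third argument of subset is select, not a second filter:
u <- subset(reshape(df, idvar = "time", timevar = "var", direction = "wide"), val.X > 10 & val.Y < 15)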
> u
time val.X val.Y val.Z
4 01-07-2019 13:28:20 13 12 15
10 01-07-2019 13:50:25 13 11 10
Then keep the rows of df whose time appears in u:
> df[!is.na(match(df$time,u$time)),]
time var val
4 01-07-2019 13:28:20 X 13
5 01-07-2019 13:28:20 Y 12
6 01-07-2019 13:28:20 Z 15
10 01-07-2019 13:50:25 X 13
11 01-07-2019 13:50:25 Y 11
12 01-07-2019 13:50:25 Z 10
Solution 2: you can also use split to split the data frame by time and keep only the groups that satisfy the condition:
z <- split(df, df$time)
# One TRUE/FALSE per timestamp: does its X exceed 10 and its Y stay below 15?
keep <- sapply(z, function(v) with(v, val[var == "X"] > 10 & val[var == "Y"] < 15))
Reduce(rbind, z[keep])
which gives
> Reduce(rbind, z[keep])
time var val
4 01-07-2019 13:28:20 X 13
5 01-07-2019 13:28:20 Y 12
6 01-07-2019 13:28:20 Z 15
10 01-07-2019 13:50:25 X 13
11 01-07-2019 13:50:25 Y 11
12 01-07-2019 13:50:25 Z 10
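Both base R solutions boil down to finding the timestamps that qualify and then keeping every row of those timestamps. As a minimal sketch of that idea without reshaping or splitting (the names `x_ok`, `y_ok` and `res` are only illustrative, and the data frame is rebuilt here with plain character columns so the block runs on its own):

```r
# Rebuild the question's data so the example is self-contained.
df <- data.frame(
  time = rep(c("01-07-2019 13:27:31", "01-07-2019 13:28:20",
               "01-07-2019 13:49:54", "01-07-2019 13:50:25"), each = 3),
  var  = rep(c("X", "Y", "Z"), times = 4),
  val  = c(5, 10, 12, 13, 12, 15, 3, 3, 4, 13, 11, 10),
  stringsAsFactors = FALSE
)

# Timestamps whose X row exceeds 10, and timestamps whose Y row is below 15.
x_ok <- df$time[df$var == "X" & df$val > 10]
y_ok <- df$time[df$var == "Y" & df$val < 15]

# Keep every row (X, Y and Z) of the timestamps that pass both tests.
res <- df[df$time %in% intersect(x_ok, y_ok), ]
print(res)
```

Unlike Solution 1, this does not assume exactly one X and one Y per timestamp: a timestamp qualifies as soon as any of its X rows and any of its Y rows pass.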
Data
df <- structure(list(time = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L,
3L, 3L, 4L, 4L, 4L), .Label = c("01-07-2019 13:27:31", "01-07-2019 13:28:20",
"01-07-2019 13:49:54", "01-07-2019 13:50:25"), class = "factor"),
var = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L), .Label = c("X", "Y", "Z"), class = "factor"), val = c(5L,
10L, 12L, 13L, 12L, 15L, 3L, 3L, 4L, 13L, 11L, 10L)), class = "data.frame", row.names = c(NA,
-12L))
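Here is one option with data.table:
library(data.table)
DT <- as.data.table(df)  # assumption: DT is the question's data as a data.table
# Helper: TRUE if the group has an X above 10 and a Y below 15
foo <- function(x, y) any(x == "X" & y > 10) && any(x == "Y" & y < 15)
DT[, .SD[foo(var, val)], by = time]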
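To try this end to end, here is a small self-contained sketch. It rebuilds the question's table under the assumed name `DT`, runs the `.SD[...]` call, and compares it with the `if (...) .SD` form, a common data.table idiom for keeping or dropping whole groups. The names `res_sd` and `res_if` are only illustrative.

```r
library(data.table)

# Rebuild the question's table; `DT` is an assumed name for it.
DT <- data.table(
  time = rep(c("01-07-2019 13:27:31", "01-07-2019 13:28:20",
               "01-07-2019 13:49:54", "01-07-2019 13:50:25"), each = 3),
  var  = rep(c("X", "Y", "Z"), times = 4),
  val  = c(5, 10, 12, 13, 12, 15, 3, 3, 4, 13, 11, 10)
)

# TRUE when the group has an X above 10 and a Y below 15.
foo <- function(x, y) any(x == "X" & y > 10) && any(x == "Y" & y < 15)

# .SD[TRUE] keeps every row of the group, .SD[FALSE] keeps none.
res_sd <- DT[, .SD[foo(var, val)], by = time]

# Equivalent form: return .SD for passing groups, NULL (dropped) otherwise.
res_if <- DT[, if (foo(var, val)) .SD, by = time]

print(res_sd)
```

Both forms should return the same six rows for this data; which one is faster on large tables is worth measuring rather than assuming.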
Tidy Answer
library(dplyr)
library(lubridate)
dataExample <- data.frame(
time = c(
'01-07-2019 13:27:31',
'01-07-2019 13:27:31',
'01-07-2019 13:27:32',
'01-07-2019 13:27:32',
'01-07-2019 13:27:30'),
var = c("X","Y","X","Y","X"),
val = c(4,5,3,5,9))
# Convert from character to date-time
dataExample$time <- dmy_hms(dataExample$time)
dataExample %>% group_by(time) %>%
filter((var == "X" & val > 4) | (var == "Y" & val < 8))
This returns:
# A tibble: 3 x 3
# Groups: time [3]
time var val
<dttm> <fct> <dbl>
1 2019-07-01 13:27:31 Y 5
2 2019-07-01 13:27:32 Y 5
3 2019-07-01 13:27:30 X 9
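Note that the filter above tests each row on its own, so it returns only the individual rows that match. To keep every row of a qualifying timestamp, as in the question's expected output, one option is a group-wise condition built with any(). The following is a sketch on the question's data rather than on dataExample; `res` is an illustrative name, and time is left as character because the grouping does not need a date-time type.

```r
library(dplyr)

# The question's data, rebuilt so the example runs on its own.
df <- data.frame(
  time = rep(c("01-07-2019 13:27:31", "01-07-2019 13:28:20",
               "01-07-2019 13:49:54", "01-07-2019 13:50:25"), each = 3),
  var  = rep(c("X", "Y", "Z"), times = 4),
  val  = c(5, 10, 12, 13, 12, 15, 3, 3, 4, 13, 11, 10),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(time) %>%
  # Each any() yields one TRUE/FALSE per group, recycled to all of its rows.
  filter(any(var == "X" & val > 10), any(var == "Y" & val < 15)) %>%
  ungroup()

print(res)
```

Because the two conditions are evaluated once per group, the Z rows are kept along with X and Y whenever the timestamp qualifies.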
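There are several ways to do this:
- Create a dataset with the rows where X <= 10 or Y >= 15 filtered out. Then you only need to check whether each timestamp still has both an X and a Y value in that dataset. You can use dataset[, anyX := any(var == "X"), by = time] to check which var values remain for each timestamp.
- Reshape the dataset to wide format, after which checking both conditions at once is straightforward. You also need to check for NA. This approach assumes a timestamp cannot hold more than one value for each value of var.
- Filter the dataset to select the timestamps whose X and Y match your conditions, then drop the X and Y values that do not match (this may be unnecessary if a timestamp cannot hold more than one X or Y):
dt[, outputTime := any(var == "X" & val > 10) & any(var == "Y" & val < 15), by = time][outputTime == TRUE][, outputTime := NULL][var == "X" & val > 10 | var == "Y" & val < 15 | var == "Z"]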
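The wide-format approach from the list above can be sketched as follows. This is only an illustration under the stated assumption of one value per var and timestamp; `wide`, `keep` and `res` are made-up names, and the table is rebuilt so the block runs on its own.

```r
library(data.table)

# The question's data as a data.table.
dt <- data.table(
  time = rep(c("01-07-2019 13:27:31", "01-07-2019 13:28:20",
               "01-07-2019 13:49:54", "01-07-2019 13:50:25"), each = 3),
  var  = rep(c("X", "Y", "Z"), times = 4),
  val  = c(5, 10, 12, 13, 12, 15, 3, 3, 4, 13, 11, 10)
)

# One row per timestamp, one column per variable (X, Y, Z).
wide <- dcast(dt, time ~ var, value.var = "val")

# Timestamps where both conditions hold; a missing X or Y fails explicitly.
keep <- wide[!is.na(X) & !is.na(Y) & X > 10 & Y < 15, time]

# Back to long format: every row of the qualifying timestamps.
res <- dt[time %in% keep]
print(res)
```

If a timestamp could hold several X or Y values, dcast would need an aggregation function, so the any()-based grouping is the safer choice in that case.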