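I'm trying to convert a data file of the form 'initial.df' into 'final.df', and it is seriously testing my programming and R skills. I keep trying different approaches, but none of them has worked.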
# minimal initial data structure
initial.df = cbind.data.frame(dtime = as.POSIXct(c("12:30", "12:31", "12:32",
"13:10","13:11","13:12","20:14","20:15", "20:16"), format="%H:%M"),
flow=c(120, 100, 90, 110, 100, 95, 115, 100, 95))
initial.df
# minimal final data structure
final.df = cbind.data.frame(initial.df, cycle=c(rep(1, 3), rep(2,3), rep(3,3)))
final.df
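For background, the data file is recorded every minute from a membrane bioreactor during filtration, and there are gaps in filtration that separate each cycle. Each cycle runs for several hours. Thanks in advance for your help. Vince
Updated dataset to better reflect the actual data types: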
initial.df = cbind.data.frame(dtime = as.POSIXct(c("2015-12-18 23:58",
"2015-12-18 23:59", "2015-12-19 00:01", "2015-12-19 00:02", "2015-12-19 4:58",
"2015-12-19 04:59", "2015-12-19 05:00", "2015-12-19 05:01", "2015-12-19 5:02",
"2015-12-19 07:59", "2015-12-19 08:00", "2015-12-19 08:01", "2015-12-19 8:02"), format="%Y-%m-%d %H:%M"), flow=c(120, 100, 90, 80, 75, 110, 100, 95, 85, 115, 100, 95, 90))
initial.df
# final data structure
final.df = cbind.data.frame(initial.df, cycle=c(rep(1, 4), rep(2,5), rep(3,4)))
final.df
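We can pass the 'dtime' column to cut with breaks set to "1 hour" to create a grouping variable, take the difference between adjacent elements (diff), check which differences are greater than 1, and compute the cumulative sum after prepending a TRUE value at the start (because the output of diff is one element shorter than the 'dtime' column). With labels=FALSE, cut returns integer hour-bin indices, so a difference greater than 1 means at least one whole clock hour was skipped between two readings.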
initial.df$cycle <- cumsum(c(TRUE,diff(cut(initial.df$dtime,
breaks='1 hour', labels=FALSE))>1))
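The hour-bin approach only flags a gap when a whole clock hour is skipped, so a shorter break (such as 12:32 to 13:10 in the first example) would not start a new cycle. As a possible alternative, here is a minimal sketch that compares the elapsed time between consecutive readings with a threshold instead. The name gap_minutes and its value of 30 are assumptions made for illustration, not something from the question; the threshold would need to suit the real filtration gaps. The data frame df and the tz = "UTC" setting are also just choices for this sketch.

```r
# Hypothetical threshold (an assumption): any gap longer than this many
# minutes is treated as the start of a new filtration cycle.
gap_minutes <- 30

# Same timestamps and flows as the updated example data.
dtime <- as.POSIXct(c("2015-12-18 23:58", "2015-12-18 23:59", "2015-12-19 00:01",
                      "2015-12-19 00:02", "2015-12-19 04:58", "2015-12-19 04:59",
                      "2015-12-19 05:00", "2015-12-19 05:01", "2015-12-19 05:02",
                      "2015-12-19 07:59", "2015-12-19 08:00", "2015-12-19 08:01",
                      "2015-12-19 08:02"),
                    format = "%Y-%m-%d %H:%M", tz = "UTC")
df <- data.frame(dtime = dtime,
                 flow = c(120, 100, 90, 80, 75, 110, 100, 95, 85, 115, 100, 95, 90))

# Minutes elapsed between consecutive readings; one element shorter than df.
gaps <- as.numeric(diff(df$dtime), units = "mins")

# The first row always opens cycle 1; every long gap opens the next cycle.
df$cycle <- cumsum(c(TRUE, gaps > gap_minutes))
df
```

On this data the sketch should assign the same cycle numbers as the cut-based line above; the two approaches differ only when a break between cycles does not skip a full clock hour.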