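I have been collecting heart rate (HR) data from 12 calves, each of which received an anesthetic through four different routes of administration. I now have 48 TXT files in this format:
Time HRbpm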
0:00:01.7 97
0:00:02.3 121
0:00:02.8 15
... ...
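HR was recorded for about 2 hours. The values in the Time column are set by the monitor, so the interval between two measurements is not constant.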
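Because the readings are unevenly spaced, it can help to turn the time stamps into elapsed seconds before doing anything else. The following is a minimal sketch, assuming the stamps always look like "0:00:01.7" (hours:minutes:seconds); `to_seconds` is a hypothetical helper name, not something from a package.

```r
# Convert monitor time stamps such as "0:00:01.7" (H:MM:SS.s) to elapsed seconds.
# to_seconds is a hypothetical helper, introduced only for illustration.
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) {
    p <- as.numeric(p)
    p[1] * 3600 + p[2] * 60 + p[3]
  }, numeric(1))
}

stamps <- c("0:00:01.7", "0:00:02.3", "0:00:02.8")
secs <- to_seconds(stamps)
gaps <- diff(secs)  # spacing between consecutive readings, in seconds
```

Looking at `gaps` for a whole file gives a quick picture of how irregular the sampling really is.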
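The TXT files are named as follows: 6133_IM_27.00.txt, where 6133 is the ID, IM is the route of administration, and 27.00 is the time of the treatment injection (minutes.seconds).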
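The naming scheme can be split on the underscores to recover the three pieces. This is only a sketch, assuming every file follows the `<ID>_<route>_<min>.<sec>.txt` pattern exactly; `parse_name` and the `inj_sec` column are hypothetical names.

```r
# parse_name is a hypothetical helper; it assumes names of the form
# "<ID>_<route>_<min>.<sec>.txt", as in "6133_IM_27.00.txt".
parse_name <- function(fname) {
  stem <- sub("\\.txt$", "", basename(fname))        # "6133_IM_27.00"
  parts <- strsplit(stem, "_", fixed = TRUE)[[1]]     # ID, route, time
  ms <- as.numeric(strsplit(parts[3], ".", fixed = TRUE)[[1]])
  data.frame(ID = parts[1],
             Route = parts[2],
             inj_sec = ms[1] * 60 + ms[2],            # injection time in seconds
             stringsAsFactors = FALSE)
}

info <- parse_name("6133_IM_27.00.txt")
```

Splitting on a separator tends to be less fragile than fixed character positions, since it keeps working if an ID or route code changes length.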
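My first goal is to gather all the HR data in one place so that I can run an outlier analysis.
Then I would like to put all of these data into a single data frame that looks like this: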
data.frame(ID=c(6133,6133,6133,6133,"...",6134,6134,"..."),
Route = c("IM","IM","IM","IM","...","SC","SC","..."),
time=c(0, 10, 20, 30,"...",0,10,"..."),
HR=c(160, 150, 145, 130,"...",162,158,"..."))
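The time column runs from 0 to 120 in 10-minute increments. Each HR value in this data frame would be the mean of the HR readings over the minute preceding the given time (for example, at time = 30 for a given ID/route combination, HR is the mean of the readings between 29 and 30 minutes).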
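The windowing rule described here can be sketched for a single ID/route combination as follows. It assumes time has already been expressed in minutes relative to the injection; `window_means` is a hypothetical helper and the readings are invented for illustration.

```r
# window_means is a hypothetical helper. `t_min` is time since injection in
# minutes and `hr` holds the matching heart-rate readings.
window_means <- function(t_min, hr, targets = seq(0, 120, by = 10)) {
  means <- vapply(targets, function(tt) {
    in_window <- t_min > tt - 1 & t_min <= tt   # the minute before `tt`
    if (any(in_window)) mean(hr[in_window]) else NA_real_
  }, numeric(1))
  data.frame(time = targets, HR = means)
}

# Toy readings (invented values, for illustration only)
t_min <- c(9.2, 9.8, 10.0, 10.4, 29.1, 29.9)
hr    <- c(150, 154, 152, 140, 130, 134)
res <- window_means(t_min, hr, targets = c(10, 20, 30))
```

A window with no readings returns `NA` rather than `NaN`, which makes gaps in the recording easy to spot in the final data frame.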
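I am new to R, so I have been struggling just to work out where to start with this problem. Any help is welcome.
Thanks,
Thomas
For anyone who stumbles upon this post, here is what I did; it seems to work.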
library(plyr)
library(reshape)
library(ggplot2)
setwd("/directory")
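# Match only files ending in .txt (the dot is escaped so it is taken literally)
filelist = list.files(pattern = "\\.txt$")
datalist = lapply(filelist, read.delim)
# Store the source file name in a third column (named V3) of each data frame
for (i in seq_along(datalist)) {
  datalist[[i]][3] = filelist[i]
}
df = do.call("rbind", datalist)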
# Tukey fences (1.5 * IQR rule); columns are referenced through df$ instead of attach()
out_lowHR = quantile(df$HRbpm, 0.25) - 1.5 * IQR(df$HRbpm)
out_highHR = quantile(df$HRbpm, 0.75) + 1.5 * IQR(df$HRbpm) # outlier thresholds used below: 60 and 200
dfc = subset(df, HRbpm >= 60 & HRbpm <= 200)
(nrow(df) - nrow(dfc)) / nrow(df) * 100 # 8.6% of values excluded
df = dfc
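# Extract ID, route and injection time from the file name stored in V3.
# The substr() positions must match your own file names; adjust them as needed.
df$ID = substr(df$V3,4,7)
df$ROA = substr(df$V3,9,11)
df$ti = substr(df$V3,13,17)
# Parse both clocks in the same time zone so they share a date and offset
df$Time = as.POSIXct(as.character(df$Time), format="%H:%M:%S", tz="UTC")
df$ti = as.POSIXct(as.character(df$ti), format="%M.%S", tz="UTC")
# Seconds since injection; units are fixed so difftime cannot switch to minutes or hours
df$t = as.numeric(difftime(df$Time, df$ti, units="secs"))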
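m=60 # seconds per minute; t is in seconds since injection
# Mean HR over the minute before each time point, per route and animal.
# mean0 is the baseline: every reading from up to 60 min before injection.
meanHR = ddply(df, c("ROA","ID"), summarise,
  mean0 = mean(HRbpm[t>-60*m & t <=0]),
  mean10 = mean(HRbpm[t>9*m & t <=10*m]),
  mean20 = mean(HRbpm[t>19*m & t <=20*m]),
  mean30 = mean(HRbpm[t>29*m & t <=30*m]),
  mean45 = mean(HRbpm[t>44*m & t <=45*m]),
  mean60 = mean(HRbpm[t>59*m & t <=60*m]),
  mean90 = mean(HRbpm[t>89*m & t <=90*m]),
  mean120 = mean(HRbpm[t>119*m & t <=120*m]))
# Reshape to long format, then turn "mean30" etc. into the numeric time 30
meanHR = melt(meanHR, id.vars = c("ROA","ID"))
meanHR$time = as.numeric(gsub("mean", "", meanHR$variable))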
ggplot(meanHR, aes(x = time, y = value, col = ROA)) +
  geom_smooth() +
  theme_classic()
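The script above computes the quantile-based fences but then filters on the fixed values 60 and 200. As an alternative, the fences can be derived from the data and applied directly. This is a minimal sketch of the 1.5 * IQR rule, not a replacement for clinical judgment about plausible heart rates; `tukey_fences` is a hypothetical helper and the readings are toy values.

```r
# tukey_fences is a hypothetical helper implementing the 1.5 * IQR rule.
tukey_fences <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  c(low = q[1] - k * iqr, high = q[2] + k * iqr)
}

hr <- c(97, 121, 15, 110, 105, 118, 102, 240)  # toy values, for illustration only
f <- tukey_fences(hr)
kept <- hr[hr >= f["low"] & hr <= f["high"]]   # readings inside the fences
```

Computing the fences per animal or per file, rather than on the pooled data, may be worth considering if baseline heart rates differ a lot between calves.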