R - How to create a loop for code that contains pipes



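I am fairly new to R, and I am trying to avoid copy-pasting the same lines 20 times, which is what I am currently doing by hand. I have a data frame with 3 variables: date.time, Depth and ms (sample):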

date.time               Depth      ms
1: 2015-12-20 00:48:50 113.5  0.316666667
2: 2015-12-20 01:25:50 156.0 -0.966666667
3: 2015-12-20 01:26:50 170.5 -0.241666667
4: 2015-12-20 01:27:50 215.5 -0.750000000
5: 2015-12-20 01:28:50 276.5 -1.016666667
6: 2015-12-20 01:29:50 373.0 -1.608333333
7: 2015-12-20 01:30:50 453.0 -1.333333333
8: 2015-12-20 01:31:50 500.0 -0.783333333
9: 2015-12-20 01:35:50 512.0  0.241666667
10: 2015-12-20 03:53:50 285.0  0.058333333
11: 2015-12-20 03:54:50 355.0 -1.166666667
12: 2015-12-20 03:55:50 453.5 -1.641666667
13: 2015-12-20 03:57:50 526.0  0.000000000
14: 2015-12-21 15:01:50 449.5  0.016666667
15: 2015-12-21 15:02:50 467.5 -0.300000000
16: 2015-12-21 16:07:50 308.5  0.100000000
17: 2015-12-21 16:08:50 392.0 -1.391666667
18: 2015-12-21 16:09:50 491.0 -1.650000000
19: 2015-12-21 16:11:50 581.0  0.000000000
20: 2015-12-22 22:02:50 461.0  0.075000000
21: 2015-12-22 22:03:50 463.0 -0.033333333
22: 2015-12-22 22:04:50 466.0 -0.050000000
23: 2015-12-22 22:05:50 467.5 -0.025000000
24: 2015-12-22 22:06:50 468.0 -0.008333333
25: 2015-12-22 22:07:50 471.0 -0.050000000
26: 2015-12-22 22:08:50 472.5 -0.025000000
27: 2015-12-22 22:09:50 530.0 -0.958333333

I have separated each dive manually by selecting the rows where the dive starts and ends, for example:

d1<- df[c(1:9),]
d2<- df[c(10:13),]
d3<- df[c(14:20),]
d4<- df[c(21:27),]

and then applied the following code to each new df I created (d1, d2, d3, d4). The example below is for d1:

library(dplyr)     # %>%, group_by(), mutate(), lag()
library(lubridate) # as_datetime()

d1 <- newdf[c(1:19), ]
d1$date.time <- as_datetime(d1$date.time)
str(d1)
d1 %>%
  group_by(Ptt) %>%
  mutate(
    diffMin = difftime(date.time, lag(date.time, 1, default = date.time[1]), units = "mins") %>% # time difference from the previous row
      as.numeric() %>% # convert to numeric
      cumsum() # cumulative sum
  ) -> d1
d1$Divenumber <- as.character('1')

This gives me the expected output:

d1
date.time           Depth     ms diffMin Divenumber
<dttm>              <dbl>  <dbl>   <dbl> <chr>     
1 2015-12-20 00:48:50  114.  0.317       0 1         
2 2015-12-20 01:25:50  156  -0.967      37 1         
3 2015-12-20 01:26:50  170. -0.242      38 1         
4 2015-12-20 01:27:50  216. -0.75       39 1         
5 2015-12-20 01:28:50  276. -1.02       40 1         
6 2015-12-20 01:29:50  373  -1.61       41 1         
7 2015-12-20 01:30:50  453  -1.33       42 1         
8 2015-12-20 01:31:50  500  -0.783      43 1         
9 2015-12-20 01:35:50  512   0.242      47 1         

d2

date.time           Depth      ms diffMin Divenumber
<dttm>              <dbl>   <dbl>   <dbl> <chr>     
1 2015-12-20 03:53:50  285   0.0583       0 2         
2 2015-12-20 03:54:50  355  -1.17         1 2         
3 2015-12-20 03:55:50  454. -1.64         2 2         
4 2015-12-20 03:57:50  526   0            4 2  

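and so on for each new df. But as you can see, this is quite a lot of copy-pasting to get each new df and then bind them all together at the end. I am sure there is a faster way to do this, but after a few hours of trying I cannot quite get there. Can someone help me do this (perhaps in some kind of loop) in a way that lets me go through the whole data set and assign a new dive number to each new dive, along with the time difference from the start to the end of the dive, in minutes? It would also be great not to have to separate the dives manually in the future, and instead to write some code, perhaps using case_when or lag with date.time, that distinguishes the dives. But I am happy with any other suggestions!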

Here is the dput output for a subset of my data:

structure(list(date.time = structure(c(1450572530, 1450574750, 
1450574810, 1450574870, 1450574930, 1450574990, 1450575050, 1450575110, 
1450575350, 1450583630, 1450583690, 1450583750, 1450583870, 
1450710110, 1450710170, 1450714070, 1450714130, 1450714190, 1450714310, 
1450821770, 1450821830, 1450821890, 1450821950, 1450822010, 1450822070, 
1450822130, 1450822190), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
Depth = c(113.5, 156, 170.5, 215.5, 276.5, 373, 453, 500, 
512, 285, 355, 453.5, 526, 449.5, 467.5, 308.5, 392, 
491, 581, 461, 463, 466, 467.5, 468, 471, 472.5, 530), ms = c(0.316666666666667, 
-0.966666666666667, -0.241666666666667, -0.75, -1.01666666666667, 
-1.60833333333333, -1.33333333333333, -0.783333333333333, 
0.241666666666667, 0.0583333333333333, 
-1.16666666666667, -1.64166666666667, 0, 0.0166666666666667, 
-0.3, 0.1, -1.39166666666667, -1.65, 0, 0.075, -0.0333333333333333, 
-0.05, -0.025, -0.00833333333333333, -0.05, -0.025, -0.958333333333333
)), row.names = c(NA, -27L), class = c("data.table", "data.frame"
))

Thanks in advance

Another approach. I have used a simple while loop to do what you asked, with the dive logic you described in the comments. Let me know if you have any questions.

# Load the data into df
# Create a vector for the dive numbers. The first element is 1, as the first row belongs to dive 1
dive <- c(1)
# Create a counter
dive_count <- 1
# Start the while loop at i = 2, as the first row is automatically assigned to dive 1
i <- 2
while (i <= nrow(df)) {
  if (df$Depth[i] > df$Depth[i - 1]) {
    # Depth is still increasing: same dive
    dive[i] <- dive_count
  } else {
    # Depth did not increase: a new dive starts
    dive_count <- dive_count + 1
    dive[i] <- dive_count
  }
  i <- i + 1
}
df$dive <- dive

Check the final data frame:

df
date.time Depth           ms dive
1  2015-12-20 00:48:50 113.5  0.316666667    1
2  2015-12-20 01:25:50 156.0 -0.966666667    1
3  2015-12-20 01:26:50 170.5 -0.241666667    1
4  2015-12-20 01:27:50 215.5 -0.750000000    1
5  2015-12-20 01:28:50 276.5 -1.016666667    1
6  2015-12-20 01:29:50 373.0 -1.608333333    1
7  2015-12-20 01:30:50 453.0 -1.333333333    1
8  2015-12-20 01:31:50 500.0 -0.783333333    1
9  2015-12-20 01:35:50 512.0  0.241666667    1
10 2015-12-20 03:53:50 285.0  0.058333333    2
11 2015-12-20 03:54:50 355.0 -1.166666667    2
12 2015-12-20 03:55:50 453.5 -1.641666667    2
13 2015-12-20 03:57:50 526.0  0.000000000    2
14 2015-12-21 15:01:50 449.5  0.016666667    3
15 2015-12-21 15:02:50 467.5 -0.300000000    3
16 2015-12-21 16:07:50 308.5  0.100000000    4
17 2015-12-21 16:08:50 392.0 -1.391666667    4
18 2015-12-21 16:09:50 491.0 -1.650000000    4
19 2015-12-21 16:11:50 581.0  0.000000000    4
20 2015-12-22 22:02:50 461.0  0.075000000    5
21 2015-12-22 22:03:50 463.0 -0.033333333    5
22 2015-12-22 22:04:50 466.0 -0.050000000    5
23 2015-12-22 22:05:50 467.5 -0.025000000    5
24 2015-12-22 22:06:50 468.0 -0.008333333    5
25 2015-12-22 22:07:50 471.0 -0.050000000    5
26 2015-12-22 22:08:50 472.5 -0.025000000    5
27 2015-12-22 22:09:50 530.0 -0.958333333    5
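
The same rule as the while loop above (a new dive starts whenever Depth does not increase) can probably also be written without a loop. This is only a minimal sketch, assuming the Depth values from the sample; the names depth and starts_new_dive are made up for illustration.

```r
# Depth values copied from the sample data in the question
depth <- c(113.5, 156, 170.5, 215.5, 276.5, 373, 453, 500, 512,
           285, 355, 453.5, 526, 449.5, 467.5, 308.5, 392, 491, 581,
           461, 463, 466, 467.5, 468, 471, 472.5, 530)

# TRUE marks a row that starts a new dive: the first row, or any row
# whose depth is not greater than the depth of the previous row
starts_new_dive <- c(TRUE, diff(depth) <= 0)

# The running count of dive starts is the dive number
dive <- cumsum(starts_new_dive)
dive
```

With a data frame you would assign the result back, e.g. df$dive <- cumsum(c(TRUE, diff(df$Depth) <= 0)), which should give the same dive column as the loop.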

Keeping a threshold of 2 hours, you can create dive automatically by using cumsum:

library(dplyr)
n_seconds <- 7200 # 2 hours
df <- df %>%
  mutate(dive = cumsum(difftime(date.time,
                                lag(date.time, default = first(date.time) - n_seconds - 1),
                                units = 'secs') > n_seconds))
df
#             date.time Depth           ms dive
#1  2015-12-20 00:48:50 113.5  0.316666667    1
#2  2015-12-20 01:25:50 156.0 -0.966666667    1
#3  2015-12-20 01:26:50 170.5 -0.241666667    1
#4  2015-12-20 01:27:50 215.5 -0.750000000    1
#5  2015-12-20 01:28:50 276.5 -1.016666667    1
#6  2015-12-20 01:29:50 373.0 -1.608333333    1
#7  2015-12-20 01:30:50 453.0 -1.333333333    1
#8  2015-12-20 01:31:50 500.0 -0.783333333    1
#9  2015-12-20 01:35:50 512.0  0.241666667    1
#10 2015-12-20 03:53:50 285.0  0.058333333    2
#11 2015-12-20 03:54:50 355.0 -1.166666667    2
#12 2015-12-20 03:55:50 453.5 -1.641666667    2
#13 2015-12-20 03:57:50 526.0  0.000000000    2
#14 2015-12-21 15:01:50 449.5  0.016666667    3
#15 2015-12-21 15:02:50 467.5 -0.300000000    3
#16 2015-12-21 16:07:50 308.5  0.100000000    3
#17 2015-12-21 16:08:50 392.0 -1.391666667    3
#18 2015-12-21 16:09:50 491.0 -1.650000000    3
#19 2015-12-21 16:11:50 581.0  0.000000000    3
#20 2015-12-22 22:02:50 461.0  0.075000000    4
#21 2015-12-22 22:03:50 463.0 -0.033333333    4
#22 2015-12-22 22:04:50 466.0 -0.050000000    4
#23 2015-12-22 22:05:50 467.5 -0.025000000    4
#24 2015-12-22 22:06:50 468.0 -0.008333333    4
#25 2015-12-22 22:07:50 471.0 -0.050000000    4
#26 2015-12-22 22:08:50 472.5 -0.025000000    4
#27 2015-12-22 22:09:50 530.0 -0.958333333    4

You can change the threshold to suit your data. I chose 2 hours based on the sample provided.
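
If it helps with picking a threshold, one option is to look at the gaps between consecutive timestamps before deciding. This is just a sketch in base R, assuming the date.time values from the dput in the question; the names date_time and gaps_min are only illustrative.

```r
# Timestamps (seconds since 1970-01-01 UTC) copied from the dput in the question
date_time <- as.POSIXct(
  c(1450572530, 1450574750, 1450574810, 1450574870, 1450574930, 1450574990,
    1450575050, 1450575110, 1450575350, 1450583630, 1450583690, 1450583750,
    1450583870, 1450710110, 1450710170, 1450714070, 1450714130, 1450714190,
    1450714310, 1450821770, 1450821830, 1450821890, 1450821950, 1450822010,
    1450822070, 1450822130, 1450822190),
  origin = "1970-01-01", tz = "UTC"
)

# Gap between each row and the previous one, in minutes
gaps_min <- as.numeric(diff(date_time), units = "mins")

# The largest gaps are the candidates for dive boundaries
head(sort(gaps_min, decreasing = TRUE))

# Number of gaps longer than 2 hours; each one starts a new dive
sum(gaps_min > 120) # 3, so 4 dives in total
```

In this sample the 37 and 65 minute gaps fall below the 2 hour threshold, so those rows stay in the same dive. A smaller threshold would split them off.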

Run the code posted by Ronak above, then use a pipe to group by dive and compute the cumulative dive time:

df <- df %>%
  group_by(dive) %>%
  mutate(
    diffMin = difftime(date.time, lag(date.time, 1, default = date.time[1]), units = "mins") %>% # time difference from the previous row
      as.numeric() %>% # convert to numeric
      cumsum()) # cumulative sum within each dive
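
For anyone who wants both steps in one place without dplyr, here is a rough base R sketch. It is not the code from the answers above: the function name add_dive_columns and its threshold_secs argument are hypothetical, and it assumes the rows are already sorted by date.time. It relies on the observation that the cumulative sum of the row-to-row differences within a dive is simply the time elapsed since the first row of that dive.

```r
# Hypothetical helper: adds a dive number and the minutes elapsed since the
# start of each dive. Assumes df has a POSIXct column date.time, sorted ascending.
add_dive_columns <- function(df, threshold_secs = 7200) {
  secs <- as.numeric(df$date.time)          # timestamps in seconds
  gap <- c(Inf, diff(secs))                 # first row always starts a dive
  df$dive <- cumsum(gap > threshold_secs)   # new dive after each long gap
  dive_start <- ave(secs, df$dive, FUN = min) # first timestamp of each dive
  df$diffMin <- (secs - dive_start) / 60    # minutes since the dive started
  df
}

# A few rows taken from the sample data, spanning one dive boundary
df_demo <- data.frame(
  date.time = as.POSIXct(c("2015-12-20 01:31:50", "2015-12-20 01:35:50",
                           "2015-12-20 03:53:50", "2015-12-20 03:54:50",
                           "2015-12-20 03:57:50"), tz = "UTC"),
  Depth = c(500, 512, 285, 355, 526)
)

res <- add_dive_columns(df_demo)
res
```

Here the first two rows end up in dive 1 and the last three in dive 2, with diffMin restarting at 0 when the second dive begins.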
