Add a column with the maximum of the next 10 rows



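I'm trying to add a column to a data frame that contains the maximum of another column (High) over the next ten rows. In the example below, the maximum for the first row is 92.83. I'm new to R and am having some trouble doing this.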

Date_Time           High  Max_Next10
2014-06-30 08:35:00 92.55 92.83
2014-06-30 08:40:00 92.69 92.83
2014-06-30 08:45:00 92.63 92.83
2014-06-30 08:50:00 92.83 92.80
2014-06-30 08:55:00 92.80 92.76
2014-06-30 09:00:00 92.71 92.76
2014-06-30 09:05:00 92.76 92.72
2014-06-30 09:10:00 92.72 92.75
2014-06-30 09:15:00 92.70 92.75
2014-06-30 09:20:00 92.70 92.75
2014-06-30 09:25:00 92.70 92.75
2014-06-30 09:30:00 92.63 92.76
2014-06-30 09:35:00 92.63 92.76
2014-06-30 09:40:00 92.57 N/A
2014-06-30 09:45:00 92.59 N/A
2014-06-30 09:50:00 92.58 N/A
2014-06-30 09:55:00 92.72 N/A
2014-06-30 10:00:00 92.75 N/A
2014-06-30 10:05:00 92.69 N/A
2014-06-30 10:10:00 92.66 N/A
2014-06-30 10:15:00 92.75 N/A
2014-06-30 10:20:00 92.76 N/A
2014-06-30 10:25:00 92.72 N/A

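The zoo package has a function called rollmax that does this.

A single line gets you the result. Note that with align='left' each window starts at the current row, so it covers the current row plus the next nine rather than strictly the next ten (compare row 4 below with the expected output in the question).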

df$Max_Next10 <- zoo::rollmax(df$High, 10, fill = NA, align = "left")
> df
         Date_Time  High Max_Next10
1   6/30/2014 8:35 92.55      92.83
2   6/30/2014 8:40 92.69      92.83
3   6/30/2014 8:45 92.63      92.83
4   6/30/2014 8:50 92.83      92.83
5   6/30/2014 8:55 92.80      92.80
6   6/30/2014 9:00 92.71      92.76
7   6/30/2014 9:05 92.76      92.76
8   6/30/2014 9:10 92.72      92.72
9   6/30/2014 9:15 92.70      92.75
10  6/30/2014 9:20 92.70      92.75
11  6/30/2014 9:25 92.70      92.75
12  6/30/2014 9:30 92.63      92.75
13  6/30/2014 9:35 92.63      92.76
14  6/30/2014 9:40 92.57      92.76
15  6/30/2014 9:45 92.59         NA
16  6/30/2014 9:50 92.58         NA
17  6/30/2014 9:55 92.72         NA
18 6/30/2014 10:00 92.75         NA
19 6/30/2014 10:05 92.69         NA
20 6/30/2014 10:10 92.66         NA
21 6/30/2014 10:15 92.75         NA
22 6/30/2014 10:20 92.76         NA
23 6/30/2014 10:25 92.72         NA
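If you want strictly the next ten rows (excluding the current one), as in the question's expected output, one option is to run rollmax on the series with its first element dropped and append a trailing NA. A minimal sketch, assuming zoo is installed; the helper name next_k_max_zoo is hypothetical:

```r
# Hypothetical helper: for each position i, the max of x[(i + 1):(i + k)],
# or NA when fewer than k values follow position i. Requires the zoo package.
next_k_max_zoo <- function(x, k = 10) {
  # Dropping x[1] shifts every left-aligned window forward by one row,
  # so the window for row i covers rows i + 1 through i + k.
  shifted <- zoo::rollmax(x[-1], k, fill = NA, align = "left")
  # x[-1] is one element shorter than x; pad the last row with NA.
  c(shifted, NA)
}

high <- c(92.55, 92.69, 92.63, 92.83, 92.8, 92.71, 92.76, 92.72,
          92.7, 92.7, 92.7, 92.63, 92.63, 92.57, 92.59, 92.58, 92.72, 92.75,
          92.69, 92.66, 92.75, 92.76, 92.72)
high_next10 <- next_k_max_zoo(high)
```

With the question's data frame this would be df$Max_Next10 <- next_k_max_zoo(df$High).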

A solution using sapply:

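# For row i, take the max of High over rows i+1 to i+10;
# return NA when fewer than 10 rows follow.
df$Max_Next10 <- sapply(seq_len(nrow(df)), function(i){
    if(i + 10 > nrow(df))
        NA_real_
    else
        max(df$High[(i + 1):(i + 10)])
})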

The data I started with:

# > dput(df)
structure(list(Date_Time = c("2014-06-30 08:35:00", "2014-06-30 08:40:00", 
"2014-06-30 08:45:00", "2014-06-30 08:50:00", "2014-06-30 08:55:00", 
"2014-06-30 09:00:00", "2014-06-30 09:05:00", "2014-06-30 09:10:00", 
"2014-06-30 09:15:00", "2014-06-30 09:20:00", "2014-06-30 09:25:00", 
"2014-06-30 09:30:00", "2014-06-30 09:35:00", "2014-06-30 09:40:00", 
"2014-06-30 09:45:00", "2014-06-30 09:50:00", "2014-06-30 09:55:00", 
"2014-06-30 10:00:00", "2014-06-30 10:05:00", "2014-06-30 10:10:00", 
"2014-06-30 10:15:00", "2014-06-30 10:20:00", "2014-06-30 10:25:00"
), High = c(92.55, 92.69, 92.63, 92.83, 92.8, 92.71, 92.76, 92.72, 
92.7, 92.7, 92.7, 92.63, 92.63, 92.57, 92.59, 92.58, 92.72, 92.75, 
92.69, 92.66, 92.75, 92.76, 92.72)), .Names = c("Date_Time", 
"High"), row.names = c(NA, -23L), class = "data.frame")

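You can write a function that takes a data frame and a column name as arguments and, for each row, computes the maximum of that column over the next 10 rows: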

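mk.next10 <- function (data, col) {
  count <- 10
  c(
    # For rows 1..(nrow - count), take the max over rows i+1 to i+count
    sapply(1:(nrow(data) - count), function(i) max(data[(i+1):(i+count), col], na.rm = TRUE)),
    # The last `count` rows do not have `count` rows after them
    rep(NA, count)
  )
}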

You can then create the column for your data frame:

data$Max_Next10 <- mk.next10(data, 'High') 
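The same maximum over strictly the next ten rows can also be computed without a per-row index function by letting base R's embed() build the windows as matrix rows. This is a swapped-in technique, not one of the answers here; the helper name next_k_max_embed is hypothetical and the sketch assumes the column has more than k values:

```r
# Hypothetical helper: for each row i, the max of x[(i + 1):(i + k)],
# or NA for the last k rows. Assumes length(x) > k.
next_k_max_embed <- function(x, k = 10) {
  # embed(y, k) returns one row per length-k window of y, with the window's
  # elements in reverse order. Using y = x[-1] makes row i hold x[(i + 1):(i + k)].
  windows <- embed(x[-1], k)
  # One max per window, then NA for the k rows that lack k successors.
  c(apply(windows, 1, max), rep(NA, k))
}

high <- c(92.55, 92.69, 92.63, 92.83, 92.8, 92.71, 92.76, 92.72,
          92.7, 92.7, 92.7, 92.63, 92.63, 92.57, 92.59, 92.58, 92.72, 92.75,
          92.69, 92.66, 92.75, 92.76, 92.72)
high_next10 <- next_k_max_embed(high)
```

With the question's data frame this would be data$Max_Next10 <- next_k_max_embed(data$High). apply() still evaluates max() once per window, so this is about readability rather than speed.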

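In the code below, the data frame we are working with is called test; change the name to match your case. Note that this version includes the current row in each window (id >= i) and, near the end of the data, takes the max over the remaining rows instead of returning NA.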

# Initialise (data.table provides setDT and :=)
library(data.table)
# Load/Create data
test <- data.frame(value=c(300,100,200,50,100,80,100,700,500,300,250,510,100,620,910))
# Add index
test$id <- seq.int(nrow(test))
# Count number of rows
n <- nrow(test)
# Loop to create variable with Max
for(i in 1:n) {
  test_i <- subset(test,id>=i & id < i+10)
  max_test_i <- max(test_i$value)
  setDT(test)[i, Max:= max_test_i]
}

The output is:

value   id  Max
300 1   700
100 2   700
200 3   700
50  4   700
100 5   700
80  6   910
100 7   910
700 8   910
500 9   910
300 10  910
250 11  910
510 12  910
100 13  910
620 14  910
910 15  910
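The loop above subsets the data frame and assigns one row at a time. Assuming the same semantics (the current row plus the following nine, with the window shrinking near the end instead of returning NA), a loop-free base R sketch; the function name max_window_incl is hypothetical:

```r
# Hypothetical helper: for each row i, the max of x[i:min(i + k - 1, n)].
# Like the loop above, the window includes row i and shrinks near the end.
max_window_incl <- function(x, k = 10) {
  n <- length(x)
  vapply(seq_len(n), function(i) max(x[i:min(i + k - 1, n)]), numeric(1))
}

test <- data.frame(value = c(300, 100, 200, 50, 100, 80, 100, 700, 500, 300,
                             250, 510, 100, 620, 910))
test$Max <- max_window_incl(test$value)
```

vapply() is used instead of sapply() so the result is guaranteed to be a numeric vector with one value per row.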
