R: Conditional cumulative sum / rollover along a column



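I have a dataset in which I am trying to explore the effect of capping a variable at a given value and rolling the excess over into subsequent intervals. Conceptually I can see cumsum() or something similar doing this, but I am struggling to see how to implement it in a logical way.

The input data is not large (on the order of 10,000 rows, not 100,000), so efficiency is not critical.

Representative input data: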

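interval starting   kWh
2021-01-01 19:00   12.2
2021-01-01 19:30   14.7
2021-01-01 20:00   20.2
2021-01-01 20:30   30.7
2021-01-01 21:00   36.3
2021-01-01 21:30   36.7
2021-01-01 22:00   30.1
2021-01-01 22:30   26.3
2021-01-01 23:00   18.1
2021-01-01 23:30   15.8
2021-01-02 00:00   11.4
2021-01-02 00:30   10.2
2021-01-02 01:00   11.9
2021-01-02 01:30   12.3
2021-01-02 02:00    9.1
2021-01-02 02:30    8.6
2021-01-02 03:00    8.3
2021-01-02 03:30   10.1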

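Here is a basic loop that does what you want. It is not particularly efficient, but I can't think of a good way to make it faster using vectorization.

d$limit_kWh <- numeric(nrow(d))  # preallocate the capped column
overflow <- 0                    # excess carried into the next interval
for (i in seq_len(nrow(d))) {
  if (d$kWh[i] + overflow > 20) {
    # Over the cap: record 20 and carry the remainder forward
    d$limit_kWh[i] <- 20
    overflow <- d$kWh[i] + overflow - 20
  } else {
    # Under the cap: absorb any carried excess and reset the carry
    d$limit_kWh[i] <- d$kWh[i] + overflow
    overflow <- 0
  }
}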

Essentially, any amount above 20 (if there is one) is stored in the overflow variable, which is updated once per row.
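As a minimal sketch, the same loop can be wrapped in a function so that the cap is a parameter rather than a hard-coded 20. The name cap_with_rollover and the small input vector are illustrative assumptions, not part of the original answer.

```r
# Hypothetical helper: cap each value at `limit`, carrying the excess forward.
cap_with_rollover <- function(values, limit = 20) {
  capped <- numeric(length(values))
  overflow <- 0
  for (i in seq_along(values)) {
    total <- values[i] + overflow
    if (total > limit) {
      capped[i] <- limit
      overflow <- total - limit
    } else {
      capped[i] <- total
      overflow <- 0
    }
  }
  capped
}

# Made-up values: 25 and 30 exceed the cap, and the excess drains into 5.
cap_with_rollover(c(12, 25, 30, 5, 3))
# [1] 12 20 20 20  3
```

Applied to a data frame, this would presumably be called as d$limit_kWh <- cap_with_rollover(d$kWh).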


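Actually, here is an approach that is about 2x faster and relies more on vectorization. It involves creating an overflow vector that holds the excess carried in from the previous interval.

overflow <- numeric(nrow(d))
for (i in 2:nrow(d)) {
  # Excess left over after interval i - 1, never negative
  overflow[i] <- max(d$kWh[i-1] + overflow[i-1] - 20, 0)
}
# Add the carried excess to each interval, then cap at 20
d$limit_kWh <- pmin(d$kWh + overflow, 20)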

One approach uses Reduce with accumulate = TRUE. The method is the same as in the answer given by @Noah.

x$limit_kWh <- pmin(20, x$kWh + head(Reduce(
  function(carry, value) max(0, carry + value - 20),
  x$kWh, 0, accumulate = TRUE), -1))
x
#   interval starting  kWh limit_kWh
#1   2021-01-01 19:00 12.2      12.2
#2   2021-01-01 19:30 14.7      14.7
#3   2021-01-01 20:00 20.2      20.0
#4   2021-01-01 20:30 30.7      20.0
#5   2021-01-01 21:00 36.3      20.0
#6   2021-01-01 21:30 36.7      20.0
#7   2021-01-01 22:00 30.1      20.0
#8   2021-01-01 22:30 26.3      20.0
#9   2021-01-01 23:00 18.1      20.0
#10  2021-01-01 23:30 15.8      20.0
#11  2021-01-02 00:00 11.4      20.0
#12  2021-01-02 00:30 10.2      20.0
#13  2021-01-02 01:00 11.9      20.0
#14  2021-01-02 01:30 12.3      20.0
#15  2021-01-02 02:00  9.1      20.0
#16  2021-01-02 02:30  8.6      17.7
#17  2021-01-02 03:00  8.3       8.3
#18  2021-01-02 03:30 10.1      10.1

Data:

x <- read.table(header = TRUE, check.names =  FALSE,
text = '"interval starting"     kWh
"2021-01-01 19:00"  12.2
"2021-01-01 19:30"  14.7
"2021-01-01 20:00"  20.2
"2021-01-01 20:30"  30.7
"2021-01-01 21:00"  36.3
"2021-01-01 21:30"  36.7
"2021-01-01 22:00"  30.1
"2021-01-01 22:30"  26.3
"2021-01-01 23:00"  18.1
"2021-01-01 23:30"  15.8
"2021-01-02 00:00"  11.4
"2021-01-02 00:30"  10.2
"2021-01-02 01:00"  11.9
"2021-01-02 01:30"  12.3
"2021-01-02 02:00"  9.1
"2021-01-02 02:30"  8.6
"2021-01-02 03:00"  8.3
"2021-01-02 03:30"  10.1')
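To see what the Reduce call in the answer above contributes, it can help to look at the intermediate carry vector on its own. The following is a small sketch with made-up values and a cap of 20; the names carry and carry_in are illustrative.

```r
kWh <- c(12, 25, 30, 5, 3)   # made-up values for illustration

# Running carry: the first element is the initial value 0, and each
# later element is the excess left over after the corresponding interval.
carry <- Reduce(function(carry, value) max(0, carry + value - 20),
                kWh, 0, accumulate = TRUE)
carry
# [1]  0  0  5 15  0  0

# Dropping the last element aligns the carry with the interval it enters.
carry_in <- head(carry, -1)
pmin(20, kWh + carry_in)
# [1] 12 20 20 20  3
```

Because an initial value is supplied, the accumulated result has one more element than the input. The final element is whatever excess remains after the last interval, which is why head(..., -1) is needed before adding it back to kWh.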

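I took @Noah's basic logic and put it into a datastep(). It gives the same result and is no more efficient than the for loop, but it is easier to read.

The input data is as follows: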

# Input data
dt <- read.table(header = TRUE, text = '
interval_starting   kWh
"2021-01-01 19:00"  12.2
"2021-01-01 19:30"  14.7
"2021-01-01 20:00"  20.2
"2021-01-01 20:30"  30.7
"2021-01-01 21:00"  36.3
"2021-01-01 21:30"  36.7
"2021-01-01 22:00"  30.1
"2021-01-01 22:30"  26.3
"2021-01-01 23:00"  18.1
"2021-01-01 23:30"  15.8
"2021-01-02 00:00"  11.4
"2021-01-02 00:30"  10.2
"2021-01-02 01:00"  11.9
"2021-01-02 01:30"  12.3
"2021-01-02 02:00"  9.1
"2021-01-02 02:30"  8.6
"2021-01-02 03:00"  8.3
"2021-01-02 03:30"  10.1')

The data step is as follows:

library(libr)
# Run datastep
res <- datastep(dt,
                retain = list(overflow = 0),
                calculate = {limit = 20},
                drop = c("limit", "overflow"),
                {
                  if (kWh + overflow > limit) {
                    limit_kWh <- limit
                    overflow <- kWh + overflow - limit
                  } else {
                    limit_kWh <- kWh + overflow
                    overflow <- 0
                  }
                })

The results are as follows:

# View results
res
#    interval_starting  kWh limit_kWh
# 1   2021-01-01 19:00 12.2      12.2
# 2   2021-01-01 19:30 14.7      14.7
# 3   2021-01-01 20:00 20.2      20.0
# 4   2021-01-01 20:30 30.7      20.0
# 5   2021-01-01 21:00 36.3      20.0
# 6   2021-01-01 21:30 36.7      20.0
# 7   2021-01-01 22:00 30.1      20.0
# 8   2021-01-01 22:30 26.3      20.0
# 9   2021-01-01 23:00 18.1      20.0
# 10  2021-01-01 23:30 15.8      20.0
# 11  2021-01-02 00:00 11.4      20.0
# 12  2021-01-02 00:30 10.2      20.0
# 13  2021-01-02 01:00 11.9      20.0
# 14  2021-01-02 01:30 12.3      20.0
# 15  2021-01-02 02:00  9.1      20.0
# 16  2021-01-02 02:30  8.6      17.7
# 17  2021-01-02 03:00  8.3       8.3
# 18  2021-01-02 03:30 10.1      10.1
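One way to sanity-check any of the approaches above is to confirm that no energy is lost: the original total should equal the capped total plus whatever excess is still unallocated after the last row. The sketch below uses base R only and the sample data from this page; the variable names are illustrative.

```r
kWh <- c(12.2, 14.7, 20.2, 30.7, 36.3, 36.7, 30.1, 26.3, 18.1,
         15.8, 11.4, 10.2, 11.9, 12.3, 9.1, 8.6, 8.3, 10.1)
limit <- 20

# Running carry, including the excess left after the final interval
carry <- Reduce(function(carry, value) max(0, carry + value - limit),
                kWh, 0, accumulate = TRUE)
limit_kWh <- pmin(limit, kWh + head(carry, -1))
leftover <- tail(carry, 1)

# Totals should agree up to floating-point tolerance; stops with an error if not.
stopifnot(isTRUE(all.equal(sum(kWh), sum(limit_kWh) + leftover)))
```

If the series ends while the carry is still positive, leftover will be non-zero, which is worth checking before comparing totals between the raw and capped columns.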
