R dataframe: fill a specific number of rows above and below the rows that meet a condition



I am working in R and have a data frame with a DateTime column and a Binary column that flags events in time, like this sample data frame:

DateTime <- seq(from = as.POSIXct("2021-01-01 00:00:00"), to = as.POSIXct("2021-01-01 17:00:00"), by = "hour")
Binary <- c(NA, 1, rep(NA, 5), 1, rep(NA, 5), 1, rep(NA, 4))
sample <- data.frame(DateTime, Binary)

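I want to create a new column that assigns 'H' wherever there is a '1' in the Binary column, as well as to x rows above and below each '1'. For this example, that is 1 row above and 1 row below, as shown in the 'goal' data frame: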

Height <- c(rep('H', 3), rep(NA, 3), rep('H',3), rep(NA, 3), rep('H', 3), rep(NA, 3))
goal <- data.frame(DateTime, Binary, Height)

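I can achieve this with a for loop. However, it is very slow, because my actual dataset is very large (almost 1 million observations). Here is an example of the for loop I used: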

# Create the new column Height
sample$Height <- NA_character_
# Use a for loop to assign "H" to each flagged row and its neighbours
n <- nrow(sample)
for (i in seq_len(n)) {
  if (sample$Binary[i] %in% 1) {
    # max()/min() keep the neighbour indices inside the data frame
    sample$Height[max(i - 1, 1):min(i + 1, n)] <- "H"
  }
}

I can use dplyr to assign 'H' to the rows that have a '1' in the Binary column:

library(dplyr)
sample <- sample %>%
  mutate(Height = ifelse(Binary == 1, 'H', NA))

However, is there a way to also fill a specified number of rows above and below those rows?

I also considered using fill() from tidyr after the step above:

sample <- tidyr::fill(sample, Height, .direction = "updown")

But, of course, that fills in all of the NAs, which is not what I want.

DateTime <- seq(from = as.POSIXct("2021-01-01 00:00:00"), to = as.POSIXct("2021-01-01 17:00:00"), by = "hour")
Binary <- c(NA, 1, rep(NA, 5), 1, rep(NA, 5), 1, rep(NA, 4))
sample <- data.frame(DateTime, Binary)
Height <- c(rep('H', 3), rep(NA, 3), rep('H',3), rep(NA, 3), rep('H', 3), rep(NA, 3))
goal <- data.frame(DateTime, Binary, ExpectedHeight = Height)
library(dplyr)
goal %>%
  mutate(
    Height = case_when(
      # Compare each term to 1 explicitly; NA comparisons count as FALSE in case_when()
      Binary == 1 | lag(Binary) == 1 | lead(Binary) == 1 ~ "H",
      TRUE ~ NA_character_
    )
  )
              DateTime Binary ExpectedHeight Height
1  2021-01-01 00:00:00     NA              H      H
2  2021-01-01 01:00:00      1              H      H
3  2021-01-01 02:00:00     NA              H      H
4  2021-01-01 03:00:00     NA           <NA>   <NA>
5  2021-01-01 04:00:00     NA           <NA>   <NA>
6  2021-01-01 05:00:00     NA           <NA>   <NA>
7  2021-01-01 06:00:00     NA              H      H
8  2021-01-01 07:00:00      1              H      H
9  2021-01-01 08:00:00     NA              H      H
10 2021-01-01 09:00:00     NA           <NA>   <NA>
11 2021-01-01 10:00:00     NA           <NA>   <NA>
12 2021-01-01 11:00:00     NA           <NA>   <NA>
13 2021-01-01 12:00:00     NA              H      H
14 2021-01-01 13:00:00      1              H      H
15 2021-01-01 14:00:00     NA              H      H
16 2021-01-01 15:00:00     NA           <NA>   <NA>
17 2021-01-01 16:00:00     NA           <NA>   <NA>
18 2021-01-01 17:00:00     NA           <NA>   <NA>
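
The lag()/lead() condition above covers exactly one row on each side of an event. The question asks for x rows in general, so here is one possible way to extend the idea, as a minimal base R sketch rather than a definitive solution. The helper name `mark_window` and its parameters `n` and `label` are hypothetical names chosen for illustration. It assumes the rows are already sorted by DateTime and that "x rows" means row positions, not a time interval.

```r
# Hypothetical helper: label every row within `n` positions of an event row.
mark_window <- function(flag, n = 1, label = "H") {
  hits <- which(flag == 1)                   # which() drops NA, so only real events remain
  rows <- as.vector(outer(hits, -n:n, `+`))  # every offset from -n to +n around each event
  rows <- unique(rows[rows >= 1 & rows <= length(flag)])  # clip at the first and last row
  out <- rep(NA_character_, length(flag))
  out[rows] <- label
  out
}

# Same sample data as in the question
DateTime <- seq(from = as.POSIXct("2021-01-01 00:00:00"),
                to = as.POSIXct("2021-01-01 17:00:00"), by = "hour")
Binary <- c(NA, 1, rep(NA, 5), 1, rep(NA, 5), 1, rep(NA, 4))
sample <- data.frame(DateTime, Binary)

sample$Height <- mark_window(sample$Binary, n = 1)
```

With n = 1 this should reproduce the Height column of the goal data frame. Because the work is done with vectorized indexing instead of a row-by-row loop, it is likely to scale better to around a million rows, although that has not been measured here.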
