R - Create a new column marking the first value that is not followed by a different value in another column



I have the following example dataset showing whether a bear is on land or on sea ice:

   Bear.ID Region
1        A   Land
2        A   Land
3        A   Land
4        A    Ice
5        A    Ice
6        A    Ice
7        A    Ice
8        B   Land
9        B    Ice
10       B   Land
11       B   Land
12       B    Ice
13       B    Ice
14       B    Ice
15       B    Ice

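My goal is to create another column that marks each bear's departure onto the sea ice. The departure is defined as the first row in which the bear is on the ice and is not followed by any later row in which the bear is back on land.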

For my example dataset, the column would look like this:

   Bear.ID Region     Departure?
1        A   Land     Not Departure
2        A   Land     Not Departure
3        A   Land     Not Departure
4        A    Ice     Departure
5        A    Ice     Not Departure
6        A    Ice     Not Departure
7        A    Ice     Not Departure
8        B   Land     Not Departure
9        B    Ice     Not Departure
10       B   Land     Not Departure
11       B   Land     Not Departure
12       B    Ice     Departure
13       B    Ice     Not Departure
14       B    Ice     Not Departure
15       B    Ice     Not Departure

How can I do this in R? "Departure" and "Not Departure" could also be replaced with TRUE and FALSE if that is more convenient.

I think this might help. Within each bear, I check whether the current region is "Ice", the previous one is "Land", and the next one is also "Ice":
library(dplyr)
data.frame(
  bear = c("A","A","A","B","B","B","B","B"),
  region = c("Land","Ice","Ice","Land","Ice","Land","Ice","Ice")
) %>%
  group_by(bear) %>%
  mutate(
    # lag() and lead() return the previous and next row within the same bear
    departure = case_when(
      region == "Ice" & lag(region) == "Land" & lead(region) == "Ice" ~ "Departure",
      TRUE ~ "Not departure"
    )
  )
# A tibble: 8 x 3
# Groups:   bear [2]
bear  region departure    
<chr> <chr>  <chr>        
1 A     Land   Not departure
2 A     Ice    Departure    
3 A     Ice    Not departure
4 B     Land   Not departure
5 B     Ice    Not departure
6 B     Land   Not departure
7 B     Ice    Departure    
8 B     Ice    Not departure
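
The lag()/lead() check above only looks one row in each direction. An "Ice" row at the very end of a bear's record (where lead() is NA) is therefore not flagged, and a sequence such as Land, Ice, Ice, Land, Ice, Ice would be flagged twice. A minimal base R sketch of the definition in the question (the first row of the final, uninterrupted run of "Ice" for each bear) could look like the following. The helper name is_departure and the column name Departure are illustrative assumptions, not part of the original post.

```r
# Example data from the question
bears <- data.frame(
  Bear.ID = rep(c("A", "B"), times = c(7, 8)),
  Region  = c("Land", "Land", "Land", "Ice", "Ice", "Ice", "Ice",
              "Land", "Ice", "Land", "Land", "Ice", "Ice", "Ice", "Ice")
)

# Hypothetical helper: for one bear's regions (in row order), return TRUE only
# at the first row of the trailing run of "Ice", and FALSE everywhere else.
is_departure <- function(region) {
  runs <- rle(region == "Ice")   # consecutive runs of Ice / not Ice
  n <- length(runs$lengths)
  out <- logical(length(region))
  # Only flag a departure if the bear's last run is on the ice
  if (n > 0 && isTRUE(runs$values[n])) {
    out[length(region) - runs$lengths[n] + 1] <- TRUE
  }
  out
}

# Apply the helper per bear; unsplit() restores the original row order
bears$Departure <- unsplit(
  lapply(split(bears$Region, bears$Bear.ID), is_departure),
  bears$Bear.ID
)

# Optional text labels instead of TRUE/FALSE
bears$Departure.Label <- ifelse(bears$Departure, "Departure", "Not Departure")
```

This sketch assumes the rows of each bear are already in chronological order. A bear whose record ends on land gets no departure at all.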

I have generated an example data frame.

library(data.table)
id <- c(1, 1, 1, 2, 2)
date <- c("2022-10-01 22:22:01","2022-11-05 22:22:01","2022-08-18 12:48:16","2022-11-19 20:57:44","2022-12-19 20:57:44")
date_1 <- c("2022-11-01 22:22:01","2022-11-02 22:22:01","2022-11-03 12:48:16","2022-11-04 20:57:44","2022-11-05 20:57:44")
date_2 <- c("2022-12-01 22:22:01","2022-12-02 22:22:01","2022-12-03 12:48:16","2022-12-04 20:57:44","2022-12-05 20:57:44")
df <- data.table(id, date, date_1, date_2)

In this step, I convert all of the date columns to date-time values.

#### format date
df$date <- as.POSIXct(df$date)
df$date_1 <- as.POSIXct(df$date_1)
df$date_2 <- as.POSIXct(df$date_2)

Then I check whether the date column falls within the time range.

#### check if date is in time range
df$Departure_Date <- ifelse(df$date <= df$date_2 & df$date >= df$date_1, "Departure", "Not Departure")

The result looks like this:

   id                date              date_1              date_2 Departure_Date
1:  1 2022-10-01 22:22:01 2022-11-01 22:22:01 2022-12-01 22:22:01  Not Departure
2:  1 2022-11-05 22:22:01 2022-11-02 22:22:01 2022-12-02 22:22:01      Departure
3:  1 2022-08-18 12:48:16 2022-11-03 12:48:16 2022-12-03 12:48:16  Not Departure
4:  2 2022-11-19 20:57:44 2022-11-04 20:57:44 2022-12-04 20:57:44      Departure
5:  2 2022-12-19 20:57:44 2022-11-05 20:57:44 2022-12-05 20:57:44  Not Departure
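
As a hedged variant of the range check above, the same comparison can be written with data.table's between() and fifelse(), keeping a logical flag next to the text label. The time zone (UTC) and the column name in_range are assumptions made for this sketch. The original code relies on the session's default time zone.

```r
library(data.table)

# Same illustrative values as above, parsed with an explicit time zone.
# UTC is an assumption; use the zone in which the data were recorded.
df <- data.table(
  id     = c(1, 1, 1, 2, 2),
  date   = as.POSIXct(c("2022-10-01 22:22:01", "2022-11-05 22:22:01",
                        "2022-08-18 12:48:16", "2022-11-19 20:57:44",
                        "2022-12-19 20:57:44"), tz = "UTC"),
  date_1 = as.POSIXct(c("2022-11-01 22:22:01", "2022-11-02 22:22:01",
                        "2022-11-03 12:48:16", "2022-11-04 20:57:44",
                        "2022-11-05 20:57:44"), tz = "UTC"),
  date_2 = as.POSIXct(c("2022-12-01 22:22:01", "2022-12-02 22:22:01",
                        "2022-12-03 12:48:16", "2022-12-04 20:57:44",
                        "2022-12-05 20:57:44"), tz = "UTC")
)

# in_range is a hypothetical column name; between() includes both bounds by default
df[, in_range := between(date, date_1, date_2)]

# Text label derived from the logical flag
df[, Departure_Date := fifelse(in_range, "Departure", "Not Departure")]
```

Keeping the logical column makes later filtering straightforward, for example df[in_range == TRUE].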
