R-根据(跟进)日期按名称进行总结



我正在尝试创建一个数据集,显示学生如何在课程中移动。我有一个这样的数据集:

InvoiceDate StudentName        Course                    
<date>      <fct>             <fct>                                     
1 2020-07-26  Tom                Level 1
2 2020-11-05  Tom                Level 2    
3 2021-11-05  Tom                Level 3    
4 2018-10-15  Mary               Level 1                                      
5 2020-08-06  Mary               Level 2                                    
6 2021-10-10  Mary               Level 2

我想知道的是,当一个学生达到一定水平后,该学生会学习哪门课程,以及当该学生没有学习任何后续课程时。我想创建的数据集是:

FullName  StartCourseDate   StartCourse    FollowUpCourse   FollowUpCourseDate        
1 Tom       2020-07-26        Level 1        Level 2          2020-11-05                   
2 Tom       2020-11-05        Level 2        Level 3          2021-11-05
2 Tom       2021-11-05        Level 3        Stop             Stop
3 Mary      2018-10-15        Level 1        Level 2          2020-08-06
4 Mary      2020-08-06        Level 2        Level 2          2021-10-10
4 Mary      2021-10-10        Level 2        Stop             Stop

我尝试过不同的tidyverse/dplyr,但无法按正确的顺序排列行。希望有人能帮忙:(

按日期排列的第一个arrange(确保它是类别date(。然后在列中添加前导数据,然后添加renameing和relocateing。

library(dplyr)
df %>% 
group_by(StudentName) %>% 
arrange(InvoiceDate, .by_group = TRUE) %>% 
mutate(FollowUpCourse = lead(Course, default = "Stop"), 
FollowUpCourseDate = lead(as.character(InvoiceDate), default = "Stop") ) %>% 
rename(StartCourseDate = InvoiceDate, FullName = StudentName, StartCourse = Course) %>% 
relocate(FullName, .before = StartCourseDate) %>%
ungroup()
# A tibble: 6 × 5
FullName StartCourseDate StartCourse FollowUpCourse FollowUpCourseDate
<chr>    <date>          <chr>       <chr>          <chr>             
1 Mary     2018-10-15      Level 1     Level 2        2020-08-06        
2 Mary     2020-08-06      Level 2     Level 2        2021-10-10        
3 Mary     2021-10-10      Level 2     Stop           Stop              
4 Tom      2020-07-26      Level 1     Level 2        2020-11-05        
5 Tom      2020-11-05      Level 2     Level 3        2021-11-05        
6 Tom      2021-11-05      Level 3     Stop           Stop

数据

df <- structure(list(InvoiceDate = structure(c(18469, 18571, 18936, 
17819, 18480, 18910), class = "Date"), StudentName = c("Tom", 
"Tom", "Tom", "Mary", "Mary", "Mary"), Course = c("Level 1", 
"Level 2", "Level 3", "Level 1", "Level 2", "Level 2")), row.names = c(NA, 
-6L), class = "data.frame")

您可以通过mutating和使用lead获得结果,使用across可以压缩您的mutate调用。

library(dplyr)
df |> 
mutate(Name = StudentName, StartCourseDate = InvoiceDate,
StartCourse = Course, across(c( Course, InvoiceDate), ~ 
ifelse(Course != "Level 3", lead(as.character(.x), default = "Stop"), "Stop"), 
.names = "FollowUp{.col}"),  .keep = "unused")
Name StartCourseDate StartCourse FollowUpCourse FollowUpInvoiceDate
1  Tom      2020-07-26     Level 1        Level 2          2020-11-05
2  Tom      2020-11-05     Level 2        Level 3          2021-11-05
3  Tom      2021-11-05     Level 3           Stop                Stop
4 Mary      2018-10-15     Level 1        Level 2          2020-08-06
5 Mary      2020-08-06     Level 2        Level 2          2021-10-10
6 Mary      2021-10-10     Level 2           Stop                Stop

相关内容

  • 没有找到相关文章

最新更新