我正在尝试创建一个数据集,显示学生如何在课程中移动。我有一个这样的数据集:
InvoiceDate StudentName Course
<date> <fct> <fct>
1 2020-07-26 Tom Level 1
2 2020-11-05 Tom Level 2
3 2021-11-05 Tom Level 3
4 2018-10-15 Mary Level 1
5 2020-08-06 Mary Level 2
6 2021-10-10 Mary Level 2
我想知道的是,当一个学生达到一定水平后,该学生会学习哪门课程,以及当该学生没有学习任何后续课程时。我想创建的数据集是:
FullName StartCourseDate StartCourse FollowUpCourse FollowUpCourseDate
1 Tom 2020-07-26 Level 1 Level 2 2020-11-05
2 Tom 2020-11-05 Level 2 Level 3 2021-11-05
2 Tom 2021-11-05 Level 3 Stop Stop
3 Mary 2018-10-15 Level 1 Level 2 2020-08-06
4 Mary 2020-08-06 Level 2 Level 2 2021-10-10
4 Mary 2021-10-10 Level 2 Stop Stop
我尝试过不同的tidyverse
/dplyr
,但无法按正确的顺序排列行。希望有人能帮忙:(
按日期排列的第一个arrange
(确保它是类别date
(。然后在列中添加前导数据,然后添加rename
ing和relocate
ing。
library(dplyr)
df %>%
group_by(StudentName) %>%
arrange(InvoiceDate, .by_group = TRUE) %>%
mutate(FollowUpCourse = lead(Course, default = "Stop"),
FollowUpCourseDate = lead(as.character(InvoiceDate), default = "Stop") ) %>%
rename(StartCourseDate = InvoiceDate, FullName = StudentName, StartCourse = Course) %>%
relocate(FullName, .before = StartCourseDate) %>%
ungroup()
# A tibble: 6 × 5
FullName StartCourseDate StartCourse FollowUpCourse FollowUpCourseDate
<chr> <date> <chr> <chr> <chr>
1 Mary 2018-10-15 Level 1 Level 2 2020-08-06
2 Mary 2020-08-06 Level 2 Level 2 2021-10-10
3 Mary 2021-10-10 Level 2 Stop Stop
4 Tom 2020-07-26 Level 1 Level 2 2020-11-05
5 Tom 2020-11-05 Level 2 Level 3 2021-11-05
6 Tom 2021-11-05 Level 3 Stop Stop
数据
df <- structure(list(InvoiceDate = structure(c(18469, 18571, 18936,
17819, 18480, 18910), class = "Date"), StudentName = c("Tom",
"Tom", "Tom", "Mary", "Mary", "Mary"), Course = c("Level 1",
"Level 2", "Level 3", "Level 1", "Level 2", "Level 2")), row.names = c(NA,
-6L), class = "data.frame")
您可以通过mutating
和使用lead
获得结果,使用across
可以压缩您的mutate调用。
library(dplyr)
df |>
mutate(Name = StudentName, StartCourseDate = InvoiceDate,
StartCourse = Course, across(c( Course, InvoiceDate), ~
ifelse(Course != "Level 3", lead(as.character(.x), default = "Stop"), "Stop"),
.names = "FollowUp{.col}"), .keep = "unused")
Name StartCourseDate StartCourse FollowUpCourse FollowUpInvoiceDate
1 Tom 2020-07-26 Level 1 Level 2 2020-11-05
2 Tom 2020-11-05 Level 2 Level 3 2021-11-05
3 Tom 2021-11-05 Level 3 Stop Stop
4 Mary 2018-10-15 Level 1 Level 2 2020-08-06
5 Mary 2020-08-06 Level 2 Level 2 2021-10-10
6 Mary 2021-10-10 Level 2 Stop Stop