R: Fill in missing rows in a panel dataset based on ranges



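I want to fill a panel dataset with rows based on ranges of an existing variable, so that every group has one row per range.

To make this easier to follow, here is an example dataset: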

> df<-data.frame(Student=c(1, 1, 2), Year=c(1,2,2), Type=c("Test","Exam","Test"),Points=c(80,140,30))
> df
Student Year Type Points
1       1    1 Test     80
2       1    2 Exam    140
3       2    2 Test     30

I want two observations per student per year, one for each point range. After the transformation it should look like this:

> df2<-data.frame(Student=c(1, 1, 1,1,2,2,2,2), Year=c(1,1,2,2,1,1,2,2), PointRange=c("0_100","100_200","0_100","100_200","0_100","100_200","0_100","100_200"), n_tests=c(1,0,0,0,0,0,1,0), n_exams=c(0,0,0,1,0,0,0,0))
> df2
Student Year PointRange n_tests n_exams
1       1    1      0_100       1       0
2       1    1    100_200       0       0
3       1    2      0_100       0       0
4       1    2    100_200       0       1
5       2    1      0_100       0       0
6       2    1    100_200       0       0
7       2    2      0_100       1       0
8       2    2    100_200       0       0

I have tried the following with the dplyr package:

> df %>%
    mutate(PointRange = case_when(Points >= 0 & Points <= 100 ~ 1,
                                  Points >= 101 & Points <= 200 ~ 2)) %>%
    group_by(Student, Year, PointRange) %>%
    summarise(n_tests = sum(Type == "Test"),
              n_exams = sum(Type == "Exam"))
# A tibble: 3 x 5
# Groups:   Student, Year [?]
Student  Year PointRange n_tests n_exams
<dbl> <dbl>      <dbl>   <int>   <int>
1       1     1          1       1       0
2       1     2          2       0       1
3       2     2          1       1       0

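Five rows are missing: for each student I want both point ranges in every year, including the combinations that have no observations. How can I solve this?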

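You can use cut() to create the ranges and then tidyr::complete() to create all combinations of student, year and range. The rows that complete() adds have NA in Type, which is why the sums below use na.rm = TRUE: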

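library(dplyr)
library(tidyr)  # complete() comes from tidyr, not dplyr

result <- df %>%
  # bin Points into [0,100) and [100,200)
  mutate(PointRange = cut(Points, breaks = c(0, 100, 200), right = FALSE)) %>%
  # add every Student x Year x PointRange combination; new rows get NA
  complete(Student, Year, PointRange) %>%
  group_by(Student, Year, PointRange) %>%
  summarize(
    # na.rm = TRUE makes the NA rows added by complete() count as 0
    n_tests = sum(Type == "Test", na.rm = TRUE),
    n_exams = sum(Type == "Exam", na.rm = TRUE)
  )
result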
# A tibble: 8 x 5
# Groups:   Student, Year [?]
Student  Year PointRange n_tests n_exams
<dbl> <dbl> <fct>        <int>   <int>
1    1.00  1.00 [0,100)          1       0
2    1.00  1.00 [100,200)        0       0
3    1.00  2.00 [0,100)          0       0
4    1.00  2.00 [100,200)        0       1
5    2.00  1.00 [0,100)          0       0
6    2.00  1.00 [100,200)        0       0
7    2.00  2.00 [0,100)          1       0
8    2.00  2.00 [100,200)        0       0
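
If you want the range labels to match the desired output ("0_100", "100_200"), a variation along these lines might work. It is only a sketch under a few assumptions: the name result2 is made up for illustration, the closed first interval (so that a score of exactly 100 falls into "0_100") is inferred from the case_when() in the question, and the .groups argument assumes dplyr 1.0.0 or later. Here the counts are computed first and complete() fills the missing combinations with explicit zeros through its fill argument, so na.rm is not needed.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Student = c(1, 1, 2),
  Year    = c(1, 2, 2),
  Type    = c("Test", "Exam", "Test"),
  Points  = c(80, 140, 30)
)

result2 <- df %>%
  mutate(
    # Default right = TRUE plus include.lowest = TRUE gives [0,100] and
    # (100,200]; the labels replace the default interval notation.
    PointRange = cut(Points,
                     breaks = c(0, 100, 200),
                     labels = c("0_100", "100_200"),
                     include.lowest = TRUE)
  ) %>%
  group_by(Student, Year, PointRange) %>%
  summarise(
    n_tests = sum(Type == "Test"),
    n_exams = sum(Type == "Exam"),
    .groups = "drop"   # assumption: dplyr >= 1.0.0
  ) %>%
  # PointRange is a factor, so complete() expands over all of its levels,
  # including ranges that never occur in the data.
  complete(Student, Year, PointRange,
           fill = list(n_tests = 0L, n_exams = 0L))

result2
```

On the example data this should give the same eight Student/Year/PointRange combinations and counts as df2 in the question, with PointRange stored as a factor rather than a character column.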
