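I have a data cleaning problem. Data collection happened three times, and sometimes the data was entered incorrectly. So if a student's data was collected more than once, the last data point needs to be copied over the earlier ones.
Here is my dataset: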
df <- data.frame(id = c(1,1,1, 2,2,2, 3,3,3, 4),
text = c("female","male","male", "female","female","female", "male","female","female", "male"),
time = c("first","second","third", "first","second","third", "first","second","third", "first"))
> df
id text time
1 1 female first
2 1 male second
3 1 male third
4 2 female first
5 2 female second
6 2 female third
7 3 male first
8 3 female second
9 3 female third
10 4 male first
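So, because of entry errors, the gender information for the first and third students is inconsistent. The value from the last (third) time point needs to be copied over the rest of that student's rows.
The desired output is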
> df1
id text time
1 1 male first
2 1 male second
3 1 male third
4 2 female first
5 2 female second
6 2 female third
7 3 female first
8 3 female second
9 3 female third
10 4 male first
Any ideas? Thanks
We can use last to return the last value of 'text' in each group; that single value is recycled to update the column inside mutate.
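library(dplyr)
# Overwrite 'text' within each id with the value from that id's last row
# (assumes rows are ordered by time within each id, as in the example)
df1 <- df %>%
  group_by(id) %>%
  mutate(text = last(text)) %>%
  ungroup()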
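The last() approach relies on the rows already being sorted by time within each id. If that is not guaranteed, a minimal sketch along the following lines picks the latest value explicitly. The vector time_levels and the name df_latest are assumptions introduced here for illustration; they are not part of the question.

```r
library(dplyr)

df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4),
  text = c("female", "male", "male", "female", "female", "female",
           "male", "female", "female", "male"),
  time = c("first", "second", "third", "first", "second", "third",
           "first", "second", "third", "first")
)

# Assumed chronological order of the collection rounds
time_levels <- c("first", "second", "third")

df_latest <- df %>%
  group_by(id) %>%
  # match() converts each time label to its rank in time_levels;
  # which.max() then locates the row with the latest round in the group
  mutate(text = text[which.max(match(time, time_levels))]) %>%
  ungroup()
```

On the example data this should give the same result as the last() version, and it should keep working if the rows are shuffled.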
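If we want the second or third value instead, use nth and set its n argument to the minimum of 2 and the group size n(), so that groups with fewer than 2 elements still return a value.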
# Take the 2nd value per id, falling back to the last available row for smaller groups
df %>%
  group_by(id) %>%
  mutate(text = nth(text, min(c(2, n()))))
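To see why the min() guard matters, here is a small sketch on a single-element vector standing in for a group like id 4. The variable names single, out_of_range and capped are made up for this illustration; inside a grouped mutate, n() plays the role of length(single).

```r
library(dplyr)

single <- c("male")  # a group with only one row, like id 4

# Asking for position 2 of a length-1 vector falls outside the vector,
# so nth() returns its default, a missing value
out_of_range <- nth(single, 2)

# Capping the position at the group size returns the only available value
capped <- nth(single, min(c(2, length(single))))
```

Without the cap, students measured only once would end up with NA in text.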