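I am trying to merge several columns into one while taking their distinct values into account, putting any additional value on a separate row. Here is what my dataset looks like.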
df <- data.frame(
id = c(1,2,3),
role = c("A","B","C"),
grade.1 = c(3,4,5),
state.1 = c(1,NA,1),
grade.2 = c(4,4,5),
state.2 = c(1,1,NA),
grade.3 = c(3,4,5),
state.3 = c(1,1,NA))
> df
id role grade.1 state.1 grade.2 state.2 grade.3 state.3
1 1 A 3 1 4 1 3 1
2 2 B 4 NA 4 1 4 1
3 3 C 5 1 5 NA 5 NA
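I need to merge the grade.1, grade.2 and grade.3 columns into a single Grade column. I tried coalesce, but it loses information for id = 1, because that id has two different grades across the grade.* columns. The state.* mapping does not work well either.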
df <- df %>%
mutate(Grade = coalesce(grade.1, grade.2, grade.3))
> df
id role grade.1 state.1 grade.2 state.2 grade.3 state.3 Grade
1 1 A 3 1 4 1 3 1 3
2 2 B 4 NA 4 1 4 1 4
3 3 C 5 1 5 NA 5 NA 5
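A quick check on plain vectors shows why the second grade disappears: coalesce keeps only the first non-NA value in each row, so a row holding two distinct grades is reduced to one. This is a minimal sketch; the vector names g1, g2, g3 and the helper column n_distinct_grades are made up for illustration.

```r
library(dplyr)

# Same values as grade.1, grade.2 and grade.3 in the example data
g1 <- c(3, 4, 5)
g2 <- c(4, 4, 5)
g3 <- c(3, 4, 5)

# coalesce() returns the first non-NA value position by position
first_non_na <- coalesce(g1, g2, g3)
first_non_na
# [1] 3 4 5

# Count the distinct non-NA grades in each row to see where information is lost
n_distinct_grades <- apply(cbind(g1, g2, g3), 1,
                           function(x) length(unique(x[!is.na(x)])))
n_distinct_grades
# [1] 2 1 1
```

Only the first row (id = 1) has more than one distinct grade, which is the row that needs to be split.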
What I want is to add another row for id = 1 that holds its second grade. My desired dataset is:
> df.2
id role Grade state.1 state.2 state.3
1 1 A 3 1 NA 1
2 1 A 4 NA 1 NA
3 2 B 4 NA 1 1
4 3 C 5 1 NA NA
So when an id has multiple grades, each grade needs to go on its own row, and the state.* mapping should follow that grade. Any good ideas? Thanks!
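Here is one way:
- Get the data in long format.
- For each id, set every value in each state column to NA except the one at that column's own number (row 1 for state.1, row 2 for state.2, and so on).
- For each unique value in the grade column, take the first non-NA value of each state column.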
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = starts_with('grade'),
values_to = 'grade', names_to = NULL) %>%
group_by(id) %>%
mutate(across(starts_with('state'),
~replace(., -as.numeric(sub('state.', '', cur_column(), fixed = TRUE)), NA))) %>%
group_by(id, role, grade) %>%
summarise(across(starts_with('state'), ~.x[!is.na(.x)][1]), .groups = 'drop')
# id role grade state.1 state.2 state.3
# <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
#1 1 A 3 1 NA 1
#2 1 A 4 NA 1 NA
#3 2 B 4 NA 1 1
#4 3 C 5 1 NA NA
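To make the first two steps easier to follow, the sketch below stops before the final summarise and keeps the intermediate table. It reuses the data from the question; the name masked is only for illustration and is not part of the answer above.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  id = c(1, 2, 3),
  role = c("A", "B", "C"),
  grade.1 = c(3, 4, 5), state.1 = c(1, NA, 1),
  grade.2 = c(4, 4, 5), state.2 = c(1, 1, NA),
  grade.3 = c(3, 4, 5), state.3 = c(1, 1, NA)
)

# Steps 1 and 2 only: three rows per id (one per grade column), with
# state.k kept only in the k-th row of its id group and NA elsewhere.
masked <- df %>%
  pivot_longer(cols = starts_with("grade"),
               values_to = "grade", names_to = NULL) %>%
  group_by(id) %>%
  mutate(across(starts_with("state"),
                ~ replace(.x,
                          -as.numeric(sub("state.", "", cur_column(), fixed = TRUE)),
                          NA))) %>%
  ungroup()

masked
```

For id = 1 the three rows carry grades 3, 4, 3, and each state column is non-NA only in the row that matches its number. Grouping by grade afterwards therefore collects state.1 and state.3 onto the grade 3 row and state.2 onto the grade 4 row.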
I'm not sure whether this is what you are looking for, or whether it is the best approach, but this is what I have so far:
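library(data.table)
setDT(df)  # convert the data.frame to a data.table by reference
# split into 3 tables, one per grade/state pair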
df1 <- df[,.(id, role, grade = grade.1, state.1)]
df2 <- df[,.(id, role, grade = grade.2, state.2)]
df3 <- df[,.(id, role, grade = grade.3, state.3)]
# set the keys to do joins
setkey(df1, id, role, grade)
setkey(df2, id, role, grade)
setkey(df3, id, role, grade)
df_res <- rbind(
df1[df2[df3]],
df1[df3[df2]],
df2[df3[df1]],
df2[df1[df3]],
df3[df1[df2]],
df3[df2[df1]],
fill = TRUE
)
unique(df_res)[order(id)]
> id role grade state.1 state.2 state.3
1: 1 A 3 1 NA 1
2: 1 A 4 NA 1 NA
3: 2 B 4 NA 1 1
4: 3 C 5 1 NA NA
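As a rough illustration of why all six join orders are needed: in a keyed join X[Y], the result has one row per row of Y, so in a nested join the innermost table decides which grades appear. The sketch below rebuilds the three tables under the illustrative names d1, d2, d3 and runs a single nested join; it assumes the data from the question.

```r
library(data.table)

dt <- data.table(
  id = c(1, 2, 3),
  role = c("A", "B", "C"),
  grade.1 = c(3, 4, 5), state.1 = c(1, NA, 1),
  grade.2 = c(4, 4, 5), state.2 = c(1, 1, NA),
  grade.3 = c(3, 4, 5), state.3 = c(1, 1, NA)
)

d1 <- dt[, .(id, role, grade = grade.1, state.1)]
d2 <- dt[, .(id, role, grade = grade.2, state.2)]
d3 <- dt[, .(id, role, grade = grade.3, state.3)]
setkey(d1, id, role, grade)
setkey(d2, id, role, grade)
setkey(d3, id, role, grade)

# For each row of d3, look up the matching (id, role, grade) in d2.
# id = 1 has grade 3 in d3 but grade 4 in d2, so its state.2 comes back NA.
inner <- d2[d3]
inner

# With d2 innermost, the rows follow d2's grades instead,
# so the grade 4 row for id = 1 appears here.
from_d2 <- d1[d3[d2]]
from_d2
```

Binding the results of every ordering and calling unique() then leaves one row per (id, role, grade) combination.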