I have this sample dataset:
df <- data.frame(subjects = 1:12,
                 Why_are_you_not_happy =
                   c(1,2,"1,2,5",5,1,2,"3,4",3,2,"1,5",3,4),
                 why_are_you_sad =
                   c("1,2,3",1,2,3,"4,5,3",2,1,4,3,1,1,1) )
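I would like to convert it to dummy-variable format (one 0/1 column for each of the 5 possible answers to each question). Can someone point me to an efficient way to do this? Thanks.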
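You can use separate_rows to split the multiple-choice answers into one row per choice, convert them to dummy columns with dummy_cols, and then group_by subjects and summarise (to get one row per subject containing all of their choices).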
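library(fastDummies)
library(tidyr)
library(dplyr)

df %>%
  # one row per selected answer (splits "1,2,5" into 1, 2 and 5)
  separate_rows(Why_are_you_not_happy, why_are_you_sad) %>%
  # one 0/1 column per question/answer pair
  dummy_cols(c("Why_are_you_not_happy", "why_are_you_sad"),
             remove_selected_columns = TRUE) %>%
  # collapse back to one row per subject
  group_by(subjects) %>%
  summarise(across(everything(), max))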
Output
# A tibble: 12 × 11
   subjects Why_are_you…¹ Why_a…² Why_a…³ Why_a…⁴ Why_a…⁵ why_a…⁶ why_a…⁷ why_a…⁸ why_a…⁹ why_a…˟
      <int>         <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>   <int>
 1        1             1       0       0       0       0       1       1       1       0       0
 2        2             0       1       0       0       0       1       0       0       0       0
 3        3             1       1       0       0       1       0       1       0       0       0
 4        4             0       0       0       0       1       0       0       1       0       0
 5        5             1       0       0       0       0       0       0       1       1       1
 6        6             0       1       0       0       0       0       1       0       0       0
 7        7             0       0       1       1       0       1       0       0       0       0
 8        8             0       0       1       0       0       0       0       0       1       0
 9        9             0       1       0       0       0       0       0       1       0       0
10       10             1       0       0       0       1       1       0       0       0       0
11       11             0       0       1       0       0       1       0       0       0       0
12       12             0       0       0       1       0       1       0       0       0       0
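
One caveat with the pipeline above: dummy_cols only creates columns for values that actually occur in the data, so an answer that nobody picked would get no column. If you always want all 5 answer columns per question, here is a minimal base R sketch, not the method from the answer above. The helper name to_dummies and the vector answers are my own and purely illustrative, and I am assuming the valid answers are exactly 1 to 5.

```r
# Same sample data as in the question
df <- data.frame(
  subjects = 1:12,
  Why_are_you_not_happy = c(1, 2, "1,2,5", 5, 1, 2, "3,4", 3, 2, "1,5", 3, 4),
  why_are_you_sad = c("1,2,3", 1, 2, 3, "4,5,3", 2, 1, 4, 3, 1, 1, 1)
)

# Assumption: every question has the same fixed answer set 1..5
answers <- 1:5

# Hypothetical helper: turns one comma-separated column into a 0/1 matrix
# with one column per possible answer, whether or not it was ever chosen
to_dummies <- function(x, prefix) {
  picks <- strsplit(as.character(x), ",", fixed = TRUE)
  m <- t(sapply(picks, function(p) {
    as.integer(answers %in% as.integer(trimws(p)))
  }))
  colnames(m) <- paste(prefix, answers, sep = "_")
  m
}

# One row per subject, one column per question/answer pair
result <- cbind(
  df["subjects"],
  to_dummies(df$Why_are_you_not_happy, "Why_are_you_not_happy"),
  to_dummies(df$why_are_you_sad, "why_are_you_sad")
)
```

Because each row is handled on its own, this never needs to reshape to long format and back, and it does not depend on how separate_rows lines up two columns that contain different numbers of choices in the same row.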