我正在寻找一种方法,从现有变量名中提取信息,创建新的变量来存储提取的信息,并根据新定义的变量将数据帧转换为长格式。处理心理治疗调查回复。
使用R或SQL。
现有变量名称中包含的信息:
事件=个人每次参加节目都是一个事件
主题=填写调查的个人(可以是参与者、母亲、父亲等(
类型=当前调查的名称(注意:有些调查有用"_&"分隔的附加识别信息(
instance=入院或出院后的天数
描述=问题编号或该列唯一的其他信息
目前,每一条信息由"0"分隔_&">
以下是格式:episode_subject_type_instance_description
## Have data currently in this format, but with almost 5000 variables
tibble(case_name = c("Joe", "Mary", "Jane"),
episode1_student_survey1_day0_Q1 = c(1, 2, 3),
episode1_student_survey1_day0_Q2 = c("A", "B", "C"))
# A tibble: 3 x 3
case_name episode1_student_survey1_day0_Q1 episode1_student_survey1_day0_Q2
<chr> <dbl> <chr>
1 Joe 1 A
2 Mary 2 B
3 Jane 3 C
## Want to transform to long like this:
tibble(case_name = c("Joe", "Joe", "Mary", "Mary", "Jane", "Jane"),
episode = "episode1",
subject = "student",
type = "survey1",
instance = "day0",
description = c("Q1", "Q1", "Q1", "Q2", "Q2", "Q2"),
value = c(1, 2, 3, "A", "B", "C"))
# A tibble: 6 x 7
case_name episode subject type instance description value
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Joe episode1 student survey1 day0 Q1 1
2 Joe episode1 student survey1 day0 Q1 2
3 Mary episode1 student survey1 day0 Q1 3
4 Mary episode1 student survey1 day0 Q2 A
5 Jane episode1 student survey1 day0 Q2 B
6 Jane episode1 student survey1 day0 Q2 C
我假设有某种方法可以一次提取每一条信息,但不确定如何进行。
感谢您的帮助!!
在R
中,将列类型转换为除"case_name"之外的character
,然后使用pivot_longer
和separate
将name
列重塑为"long"格式
library(dplyr)
library(tidyr)
nm1 <- c("episode", "subject", "type", "instance", "description")
df1 %>%
mutate(across(-case_name, as.character)) %>%
pivot_longer(cols = -case_name) %>%
separate(name, into = nm1)
-输出
# A tibble: 6 x 7
case_name episode subject type instance description value
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Joe episode1 student survey1 day0 Q1 1
2 Joe episode1 student survey1 day0 Q2 A
3 Mary episode1 student survey1 day0 Q1 2
4 Mary episode1 student survey1 day0 Q2 B
5 Jane episode1 student survey1 day0 Q1 3
6 Jane episode1 student survey1 day0 Q2 C
数据
df1 <- tibble(case_name = c("Joe", "Mary", "Jane"),
episode1_student_survey1_day0_Q1 = c(1, 2, 3),
episode1_student_survey1_day0_Q2 = c("A", "B", "C"))
这里有一个解决方案,它使用melt()
从宽格式重塑为长格式,并使用新的measure()
函数拆分列名:
library(data.table) # development version 1.14.1 used
melt(setDT(df1), measure.vars = measure(episode, subject, type, instance, description,
sep = "_"))
case_name episode subject type instance description value 1: Joe episode1 student survey1 day0 Q1 1 2: Mary episode1 student survey1 day0 Q1 2 3: Jane episode1 student survey1 day0 Q1 3 4: Joe episode1 student survey1 day0 Q2 A 5: Mary episode1 student survey1 day0 Q2 B 6: Jane episode1 student survey1 day0 Q2 C