使用 R 将数据帧转换为基于存储在当前变量名称(用 "_" 分隔)中的唯一信息片段



我正在寻找一种方法,从现有变量名中提取信息,创建新的变量来存储提取的信息,并根据新定义的变量将数据帧转换为长格式。处理心理治疗调查回复。

使用R或SQL。

现有变量名称中包含的信息:

事件=个人每次参加节目都是一个事件
主题=填写调查的个人(可以是参与者、母亲、父亲等(
类型=当前调查的名称(注意:有些调查有用"_&"分隔的附加识别信息(
instance=入院或出院后的天数
描述=问题编号或该列唯一的其他信息

目前,每一条信息由"0"分隔_&">
以下是格式:episode_subject_type_instance_description

## Have data currently in this format, but with almost 5000 variables  
tibble(case_name = c("Joe", "Mary", "Jane"),
episode1_student_survey1_day0_Q1 = c(1, 2, 3),
episode1_student_survey1_day0_Q2 = c("A", "B", "C"))
# A tibble: 3 x 3
case_name episode1_student_survey1_day0_Q1 episode1_student_survey1_day0_Q2
<chr>                                <dbl> <chr>                           
1 Joe                                      1 A                               
2 Mary                                     2 B                               
3 Jane                                     3 C                               
## Want to transform to long like this:  
tibble(case_name = c("Joe", "Joe", "Mary", "Mary", "Jane", "Jane"),
episode = "episode1",
subject = "student",
type = "survey1",
instance = "day0",
description = c("Q1", "Q1", "Q1", "Q2", "Q2", "Q2"),
value = c(1, 2, 3, "A", "B", "C"))
# A tibble: 6 x 7
case_name episode  subject type    instance description value
<chr>     <chr>    <chr>   <chr>   <chr>    <chr>       <chr>
1 Joe       episode1 student survey1 day0     Q1          1    
2 Joe       episode1 student survey1 day0     Q1          2    
3 Mary      episode1 student survey1 day0     Q1          3    
4 Mary      episode1 student survey1 day0     Q2          A    
5 Jane      episode1 student survey1 day0     Q2          B    
6 Jane      episode1 student survey1 day0     Q2          C  

我假设有某种方法可以一次提取每一条信息,但不确定如何进行。

感谢您的帮助!!

R中,将列类型转换为除"case_name"之外的character,然后使用pivot_longerseparatename列重塑为"long"格式

library(dplyr)
library(tidyr)
nm1 <- c("episode", "subject", "type", "instance", "description")
df1 %>%
mutate(across(-case_name, as.character)) %>%
pivot_longer(cols = -case_name) %>% 
separate(name, into = nm1)

-输出

# A tibble: 6 x 7
case_name episode  subject type    instance description value
<chr>     <chr>    <chr>   <chr>   <chr>    <chr>       <chr>
1 Joe       episode1 student survey1 day0     Q1          1    
2 Joe       episode1 student survey1 day0     Q2          A    
3 Mary      episode1 student survey1 day0     Q1          2    
4 Mary      episode1 student survey1 day0     Q2          B    
5 Jane      episode1 student survey1 day0     Q1          3    
6 Jane      episode1 student survey1 day0     Q2          C    

数据

df1 <- tibble(case_name = c("Joe", "Mary", "Jane"),
episode1_student_survey1_day0_Q1 = c(1, 2, 3),
episode1_student_survey1_day0_Q2 = c("A", "B", "C"))

这里有一个解决方案,它使用melt()从宽格式重塑为长格式,并使用新的measure()函数拆分列名:

library(data.table) # development version 1.14.1 used
melt(setDT(df1), measure.vars = measure(episode, subject, type, instance, description,
sep = "_"))
case_name  episode subject    type instance description value
1:       Joe episode1 student survey1     day0          Q1     1
2:      Mary episode1 student survey1     day0          Q1     2
3:      Jane episode1 student survey1     day0          Q1     3
4:       Joe episode1 student survey1     day0          Q2     A
5:      Mary episode1 student survey1     day0          Q2     B
6:      Jane episode1 student survey1     day0          Q2     C

最新更新