The data structure I am looking at is:
> head(data)
ID Gender Location Generation Question Response
1 2 Male South America Generation X Q0. Vote in the upcoming election? 0: No
2 2 Male South America Generation X Q1. Pulse Rate 0: No
3 2 Male South America Generation X Q2. Metabolism 0: No
4 2 Male South America Generation X Q3.Blood Pressure 1: Yes
5 2 Male South America Generation X Q4. Temperature 0: No
6 2 Male South America Generation X Q5. Galvanic Skin Response 1: Yes
The column headers in this data frame are as follows:
> colnames(data)
[1] "ID" "Gender" "Location" "Generation" "Question" "Response"
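The Question column holds the questions that were asked, and the Response column holds the corresponding answers. What I want the result to look like is: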
> colnames(final_data)
[1] "ID" "Gender"
[3] "Location" "Generation"
[5] "Q0. Vote in the upcoming election?" "Q1. Pulse Rate"
[7] "Q134. Good Job Skills" "Q135. Sense of Humor"
[9] "Q136. Intelligence" "Q137.Can Play Jazz"
[11] "Q138.Likes the Beatles" "Q139. Snobbiness"
[13] "Q140.Ability to lift heavy objects" "Q141.Grace under pressure"
[15] "Q142.Grace on the dance floor" "Q143.Likes animals"
[17] "Q144.Makes good coffee" "Q145.Eats all his/her vegetables"
[19] "Q2. Metabolism" "Q3.Blood Pressure"
[21] "Q4. Temperature" "Q5. Galvanic Skin Response"
[23] "Q6. Breathing" "Q7. Perspiration"
[25] "Q8.Pupil Dilation" "Q9. Adrenaline Production"
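Currently, my data records one attribute of an ID per row. Essentially, this means each row holds only a single attribute for a given ID.
I saw another question about this here, but was not able to understand it. Can anyone help?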
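I don't fully understand how you want the data to look in the end. My guess is that you want ID, Gender, Location, and Generation as the first 4 columns, with each question turned into a column name and its response as the value under that column. For that, you can use the dcast function from the R reshape2 package. Your data is already in long format (one row per ID and question), so there is no need to melt it first; melting would replace Question and Response with variable/value columns, and the dcast call would then fail to find Question.
library(reshape2)
# ID, Gender, Location and Generation act as the keys that identify a row;
# each distinct value of Question becomes a column filled from Response
final_data = dcast(data, ID + Gender + Location + Generation ~ Question, value.var="Response")
I think this should solve the problem.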
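To make the reshaping step concrete, here is a minimal, self-contained sketch on a toy data frame. The rows for ID 3 and all response values are invented for illustration only; the column layout is the one shown in the question. It assumes each ID/Question pair appears at most once. If there are duplicates, dcast falls back to counting rows (it warns that it is defaulting to length), so the sketch checks for that first.

```r
library(reshape2)

# Toy long-format data: one row per respondent and question.
# The values below are made up for illustration only.
data <- data.frame(
  ID         = c(2, 2, 2, 3, 3, 3),
  Gender     = rep(c("Male", "Female"), each = 3),
  Location   = rep(c("South America", "Europe"), each = 3),
  Generation = rep(c("Generation X", "Millennial"), each = 3),
  Question   = rep(c("Q1. Pulse Rate", "Q2. Metabolism", "Q3.Blood Pressure"),
                   times = 2),
  Response   = c("0: No", "0: No", "1: Yes", "1: Yes", "0: No", "0: No"),
  stringsAsFactors = FALSE
)

# Guard: dcast() would aggregate (count) duplicated ID/Question pairs
# instead of copying the response text, so make sure there are none.
has_duplicates <- any(duplicated(data[, c("ID", "Question")]))
stopifnot(!has_duplicates)

# Wide format: one row per respondent, one column per question.
final_data <- dcast(data,
                    ID + Gender + Location + Generation ~ Question,
                    value.var = "Response")

print(colnames(final_data))
print(final_data)
```

The result has one row per respondent, with the four key columns first and one column per distinct question after them. The question columns come out in sorted order, which would also explain why Q134 appears before Q2 in the desired output above.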