I have a dataset showing how people spent 30 minutes, recorded in 10-minute intervals.
Person cumulative_time Activity
A 10 Game
A 30 Eat
B 10 Sleep
B 20 Game
B 30 Sleep
This means person A was gaming for the first 10 minutes
and eating for the next 20 minutes,
while person B was sleeping for the first 10 minutes,
gaming for the next 10 minutes, and sleeping for the last 10 minutes.
I want to restructure the dataset so that each row is one unique person
and each column is one time interval, like this:
Person time10 time20 time30
A Game Eat Eat
B Sleep Game Sleep
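I know I can use "collapse" to make each person unique, but I don't know how to use it for my purpose. The "reshape" command does something similar, but again I don't know how to use it to do what I want.
reshape is the way to solve this. Something like the following may accomplish what you need.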
clear
input str1 Person int cumulative_time str8 Activity
A 10 Game
A 30 Eat
B 10 Sleep
B 20 Game
B 30 Sleep
end
rename Activity time
reshape wide time, i(Person) j(cumulative_time)
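* Each Activity covers the interval ending at its cumulative_time,
* so a gap is filled from the next recorded interval, working backward
replace time20 = time30 if missing(time20)
replace time10 = time20 if missing(time10)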
list, clean
If your problem has many values of cumulative_time rather than just three, I would handle the missing values differently.
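One possible way to do that, offered as a sketch rather than as what the answer necessarily had in mind, is to fill the gaps while the data are still in long form and reshape afterward. It assumes every interval appears for at least one person, since fillin only builds combinations of values already present in the data. The variable name negtime is a hypothetical helper introduced here.

```stata
clear
input str1 Person int cumulative_time str8 Activity
A 10 Game
A 30 Eat
B 10 Sleep
B 20 Game
B 30 Sleep
end
rename Activity time

* Add a row for every Person x cumulative_time combination that is absent
fillin Person cumulative_time
drop _fillin

* negtime is a hypothetical helper: sorting on it puts each person's
* rows in latest-to-earliest order
generate negtime = -cumulative_time

* replace works through observations in order, so an empty interval
* takes the activity of the next later recorded interval, and runs of
* several empty intervals are filled one after another
bysort Person (negtime): replace time = time[_n-1] if time == ""
drop negtime

reshape wide time, i(Person) j(cumulative_time)
list, clean
```

Because the fill happens before the reshape, the same lines apply unchanged however many distinct cumulative_time values there are.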
In addition to William Lisowski's answer, here is an approach that uses the tsset and tsfill commands:
clear
input str1 Person int cumulative_time str8 Activity
A 10 Game
A 30 Eat
B 10 Sleep
B 20 Game
B 30 Sleep
end
rename Activity time
egen id = group(Person)
tsset id cumulative_time, delta(10)
tsfill, full
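* Keep rows in time order within each id so [_n-1] and [_n+1] refer to adjacent intervals
bysort id (cumulative_time): replace Person = Person[_n-1] if Person == ""
* Fill an empty interval from the next one; this covers gaps of a single interval only
bysort id (cumulative_time): replace time = time[_n+1] if time == ""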
drop id
reshape wide time, i(Person) j(cumulative_time)
list, clean
This produces:
Person time10 time20 time30
1. A Game Eat Eat
2. B Sleep Game Sleep