Expanding a per-person cumulative time variable into time-interval variables

I have a dataset showing how people spent 30 minutes, recorded in 10-minute intervals.

Person     cumulative_time   Activity
A              10             Game
A              30             Eat
B              10             Sleep
B              20             Game
B              30             Sleep
This means person A was gaming during the first 10 minutes
and eating during the next 20 minutes,
while person B was sleeping for the first 10 minutes,
gaming for the next 10 minutes, and sleeping for the last 10 minutes.

I want to restructure the dataset so that each row is one unique person.

Each column would then be one time interval, like this:

Person          time10    time20         time30
A             Game         Eat           Eat
B             Sleep        Game          Sleep

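I know I can use "collapse" to make each person unique, but I don't know how to apply it for my purpose. The "reshape" command does something similar, but again I don't know how to use it to do what I want.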

reshape is the way to approach this problem. Something like the following may accomplish what you need.

clear
input str1 Person int cumulative_time str8 Activity
A              10             Game
A              30             Eat
B              10             Sleep
B              20             Game
B              30             Sleep
end
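* the activity strings become the values of the new time10, time20, time30 variables
rename Activity time
reshape wide time, i(Person) j(cumulative_time)
* each recorded activity runs up to its cumulative_time, so fill gaps from the next interval
replace time20 = time30 if missing(time20)
replace time10 = time20 if missing(time10)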
list, clean

If your problem has many values of cumulative_time rather than just three, I would handle the missing values differently.
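One possible way to do that, as a minimal sketch rather than a definitive solution, is to loop over the interval variables from last to first and fill each gap from the next interval, since each recorded activity covers the time up to its cumulative_time. The locals step and last are names introduced here for illustration, and the sketch assumes evenly spaced intervals and that every person has a record at the final time point.

```stata
clear
input str1 Person int cumulative_time str8 Activity
A              10             Game
A              30             Eat
B              10             Sleep
B              20             Game
B              30             Sleep
end
rename Activity time
reshape wide time, i(Person) j(cumulative_time)

* assumed for illustration: intervals are 10 minutes wide and the last one ends at 30
local step 10
local last 30
local start = `last' - `step'

* walk backward from the second-to-last interval to the first
forvalues t = `start'(-`step')`step' {
    local next = `t' + `step'
    * reshape only creates time# variables for values that occur in the data
    capture confirm variable time`t'
    if _rc {
        generate str8 time`t' = ""
    }
    * an empty cell takes the activity of the following interval
    replace time`t' = time`next' if missing(time`t')
}
list, clean
```

Going backward means a run of several empty intervals is filled in a single pass, because each interval is filled only after the one following it. Variables created inside the loop are added at the end of the dataset, so an order command may be needed afterwards to restore the column order.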

In addition to William Lisowski's answer, here is another approach using the tsset and tsfill commands:

clear
input str1 Person int cumulative_time str8 Activity
A              10             Game
A              30             Eat
B              10             Sleep
B              20             Game
B              30             Sleep
end
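rename Activity time
* tsset needs a numeric panel identifier
egen id = group(Person)
tsset id cumulative_time, delta(10)
* add the missing 10-minute rows so every person has 10, 20 and 30
tsfill, full
* restore Person on the added rows (empty strings sort first, so the last value is the name)
bysort id (Person): replace Person = Person[_N]
* fill each gap from the next recorded activity, working backward so runs of gaps are covered
generate negtime = -cumulative_time
bysort id (negtime): replace time = time[_n-1] if time == ""
drop id negtime
reshape wide time, i(Person) j(cumulative_time)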
list, clean

This produces:

       Person   time10   time20   time30
  1.        A     Game      Eat      Eat
  2.        B    Sleep     Game    Sleep
