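Changing the output
I have been using the code below to calculate the percentage of each behaviour per hour (Time column, d h), but it mixes up the order of the Time column and calculates the percentages incorrectly. I have attached a sample of the output and some of the data. Any help is much appreciated!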
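library(dplyr)

# count the rows for each Time / behaviour / Context combination
S06Behav <- S06 %>%
  group_by(Time, PredictedBehaviorFull, Context) %>%
  summarise(count = n())

# regroup by the same three columns and try to compute each count's share
S06Proportions <- S06Behav %>%
  group_by(Time, PredictedBehaviorFull, Context) %>%
  summarise(n = sum(count)) %>%
  mutate(percentage = n / sum(n))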
A sample of my data is at https://pastebin.com/KE0xEzk7
Thanks
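I think the percentages aren't calculated as expected because your second group_by() uses the same three variables as the first. n / sum(n) is therefore computed within groups that (with only one Context per hour) contain a single count, so you are effectively dividing a value by itself, which gives a ratio of 1.0.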
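To illustrate, here is a minimal sketch on a small hypothetical data frame (toy, invented for this example, not the pastebin data). It runs the question's two steps and then, for comparison, regroups by Time alone.

```r
library(dplyr)

# Hypothetical toy data standing in for S06: one Context per hour,
# as in the sample output
toy <- tibble(
  Time = c("h1", "h1", "h1", "h2", "h2"),
  PredictedBehaviorFull = c("Bait", "Bait", "Boat", "Bait", "No_OP"),
  Context = c("Present", "Present", "Present", "Absent", "Absent")
)

# Step 1 of the question's code: one count per Time / behaviour / Context
counts <- toy %>%
  group_by(Time, PredictedBehaviorFull, Context) %>%
  summarise(count = n(), .groups = "drop")

# Step 2 as in the question: after summarise() the data stay grouped by
# Time and PredictedBehaviorFull, so sum(n) only adds up the Contexts of a
# single behaviour; with one Context per hour that sum equals n itself
question_style <- counts %>%
  group_by(Time, PredictedBehaviorFull, Context) %>%
  summarise(n = sum(count), .groups = "drop_last") %>%
  mutate(percentage = n / sum(n)) %>%
  ungroup()

# Regrouping by Time alone divides each count by that hour's total instead
per_hour <- counts %>%
  group_by(Time) %>%
  mutate(percentage = count / sum(count)) %>%
  ungroup()

question_style$percentage  # 1 1 1 1
per_hour$percentage        # 2/3, 1/3, 1/2, 1/2
```

The only difference between the two results is which columns are active in group_by() when sum() runs.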
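I'm not entirely sure what your question is, but if by "mixing up the order of the Time column" you mean that the whole Time column is incorrect, you might be better off using the lubridate package to build your Time column.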
library(dplyr)
library(lubridate)
library(stringr)  # for str_c()

# the result is saved as 'S06_a' to use next
S06_a <- S06 %>%
  # first we convert the Timestamp column into datetime format
  mutate(
    Timestamp = ymd_hms(Timestamp)
  ) %>%
  # then, we can extract the components from the Timestamp
  mutate(
    date = lubridate::date(Timestamp),
    hour = lubridate::hour(Timestamp),
    timestamp_hour = ymd_h(str_c(date, ' ', hour))
  )
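As an alternative to rebuilding the hour from its date and hour parts, lubridate's floor_date() can round each timestamp down to the start of its hour in one step. This is a different technique from the one above; the sketch below assumes the Timestamp strings look like "yyyy-mm-dd hh:mm:ss", and the three timestamps are hypothetical.

```r
library(dplyr)
library(lubridate)

# Hypothetical timestamps in the "yyyy-mm-dd hh:mm:ss" form that ymd_hms() parses
stamps <- tibble(
  Timestamp = c("2020-05-23 19:05:12", "2020-05-23 19:59:59", "2020-05-24 10:00:01")
)

stamps <- stamps %>%
  mutate(
    Timestamp = ymd_hms(Timestamp),  # parsed as UTC by default
    # floor_date() truncates each datetime to the start of its hour
    timestamp_hour = floor_date(Timestamp, unit = "hour")
  )

stamps$timestamp_hour
```

Because timestamp_hour stays a datetime rather than text, sorting on it keeps the hours in chronological order.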
If I understand correctly, you want to determine the percentage of each behaviour type observed per hour.
S06_a %>%
# then, work out the total number of observations per hour, context and behaviour
group_by(timestamp_hour, Context, PredictedBehaviorFull) %>%
summarise(
behav_total = n()
) %>%
# calculate the total number of observations per hour
group_by(timestamp_hour) %>%
mutate(
hour_total = sum(behav_total),
percentage = behav_total / hour_total
)
This produces the following output:
# A tibble: 7 x 6
# Groups: timestamp_hour [3]
timestamp_hour Context PredictedBehaviorFull behav_total hour_total percentage
<dttm> <chr> <chr> <int> <int> <dbl>
1 2020-05-23 19:00:00 Present Bait 1971 2184 0.902
2 2020-05-23 19:00:00 Present Boat 96 2184 0.0440
3 2020-05-23 19:00:00 Present No_OP 117 2184 0.0536
4 2020-05-24 10:00:00 Absent Bait 9 1202 0.00749
5 2020-05-24 10:00:00 Absent No_OP 1193 1202 0.993
6 2020-05-24 11:00:00 Absent Bait 5 129 0.0388
7 2020-05-24 11:00:00 Absent No_OP 124 129 0.961
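The percentage column above is a proportion between 0 and 1. If you want values on a 0 to 100 scale, in time order, you could extend the last step roughly as follows. This is a sketch: the counts are the 10:00 and 11:00 rows from the output above, typed in by hand and deliberately shuffled, and the final summarise() is just a sanity check.

```r
library(dplyr)

# Per-hour counts shaped like the answer's summarise() result
# (values copied from the 10:00 and 11:00 rows of the output, rows shuffled)
behav <- tibble(
  timestamp_hour = as.POSIXct(c("2020-05-24 11:00:00", "2020-05-24 10:00:00",
                                "2020-05-24 10:00:00", "2020-05-24 11:00:00"),
                              tz = "UTC"),
  PredictedBehaviorFull = c("Bait", "Bait", "No_OP", "No_OP"),
  behav_total = c(5L, 9L, 1193L, 124L)
)

shares <- behav %>%
  group_by(timestamp_hour) %>%
  mutate(
    hour_total = sum(behav_total),
    # multiply by 100 to get a percentage rather than a proportion
    percentage = 100 * behav_total / hour_total
  ) %>%
  ungroup() %>%
  # sort chronologically so the hours appear in time order
  arrange(timestamp_hour, PredictedBehaviorFull)

# sanity check: the percentages within every hour should add up to 100
check <- shares %>%
  group_by(timestamp_hour) %>%
  summarise(total_pct = sum(percentage))
```

If any total_pct differs from 100, the data were grouped by something other than the hour when the shares were computed.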