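I have a data frame with the following variables: day (the day of the week, from 1 to 7) and the time variables t1 to t7, which record the activity performed during each time slot.
For each time slot (the same tN column across days), I want to count how often the same activity occurs across the 7 days and report the most frequent activity, listing all of them when several are tied.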
Input:
day t1 t2 t3 t4 t5 t6 t7
1 1 0 1 0 0 0 1
1 1 0 1 0 4 0 1
4 2 3 1 0 1 0 1
5 1 1 1 0 0 0 1
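To reproduce the example, the input table can be read straight into a data frame called df, which is the name the answer below assumes. read.table() with header = TRUE parses the whitespace-separated text as-is:

```r
# Sample input from the question, stored as a data frame named df
df <- read.table(header = TRUE, text = "
day t1 t2 t3 t4 t5 t6 t7
1 1 0 1 0 0 0 1
1 1 0 1 0 4 0 1
4 2 3 1 0 1 0 1
5 1 1 1 0 0 0 1
")

df
```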
Expected output:
time Most frequent
t1 1
t2 0,1,3
t3 1
t4 0
t5 0
t6 0
t7 1
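Note that day 1 appears twice in the sample input. Judging from the expected t2 row (0,1,3), an activity seems to be counted at most once per day; counting raw rows would make 0 the only winner for t2. A quick base-R check of that reading (an interpretation inferred from the expected output, not something the question states explicitly):

```r
# Sample input from the question
df <- read.table(header = TRUE, text = "
day t1 t2 t3 t4 t5 t6 t7
1 1 0 1 0 0 0 1
1 1 0 1 0 4 0 1
4 2 3 1 0 1 0 1
5 1 1 1 0 0 0 1
")

# Raw row counts for slot t2: the duplicated day 1 makes 0 appear twice
raw_counts <- table(df$t2)

# Keep each (day, activity) pair once: 0, 1 and 3 then each occur on one day, a three-way tie
per_day_counts <- table(unique(df[c("day", "t2")])$t2)

raw_counts
per_day_counts
```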
Here is a dplyr solution (pivot_longer() comes from tidyr):
library(dplyr)  # group_by(), distinct(), mutate(), filter(), select(), summarise(), rename()
library(tidyr)  # pivot_longer()

df %>%
  pivot_longer(-day) %>%
  group_by(name, value) %>%
  distinct() %>%
  mutate(freq = n()) %>%
  group_by(name) %>%
  filter(freq == max(freq)) %>%
  select(name, value) %>%
  distinct() %>%
  group_by(name) %>%
  summarise(`Most frequent` = paste(value, collapse = ",")) %>%
  rename(time = name)
This gives:
time `Most frequent`
<chr> <chr>
1 t1 1
2 t2 0,3,1
3 t3 1
4 t4 0
5 t5 0
6 t6 0
7 t7 1
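The tied values for t2 come out as 0,3,1, i.e. in order of first appearance in the data, rather than 0,1,3 as in the expected output. If the order matters, one option is to sort inside paste(). A sketch of the same pipeline with only that change, plus the sample data so it runs on its own:

```r
library(dplyr)
library(tidyr)

# Sample input from the question
df <- read.table(header = TRUE, text = "
day t1 t2 t3 t4 t5 t6 t7
1 1 0 1 0 0 0 1
1 1 0 1 0 4 0 1
4 2 3 1 0 1 0 1
5 1 1 1 0 0 0 1
")

res <- df %>%
  pivot_longer(-day) %>%
  group_by(name, value) %>%
  distinct() %>%
  mutate(freq = n()) %>%
  group_by(name) %>%
  filter(freq == max(freq)) %>%
  select(name, value) %>%
  distinct() %>%
  # sort() puts tied activities in ascending order before pasting
  summarise(`Most frequent` = paste(sort(value), collapse = ",")) %>%
  rename(time = name)

res
```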
Here is the same code with some comments:
df %>%
  pivot_longer(-day) %>%        # Reshape to long format: one row per day + time slot (name) + activity (value)
  group_by(name, value) %>%     # Group by name (t#) and value (activity)
  distinct() %>%                # Keep distinct day + time + activity rows, so an activity counts at most once per day
  mutate(freq = n()) %>%        # Count the distinct days on which each activity occurs in each time slot
  group_by(name) %>%            # Regroup by time slot only
  filter(freq == max(freq)) %>% # Keep only the most frequent activity in each slot (all of them if tied)
  select(name, value) %>%       # Keep only the variables name and value
  distinct() %>%                # Drop duplicates, leaving one row per time slot + activity
  group_by(name) %>%            # Group by name (time); the data is already grouped by name here, so this step is optional
  summarise(`Most frequent` = paste(value, collapse = ",")) %>% # One row per time slot; tied activities are joined into one comma-separated string
  rename(time = name)           # Rename the variable name to time
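For comparison, a more compact variant that should give the same result, assuming dplyr 1.0.0 or later (needed for slice_max() with with_ties). Here count() replaces the group_by()/distinct()/mutate(n()) steps, and slice_max() keeps every activity tied for the top count. The column names time and activity passed to pivot_longer() are illustrative choices, not from the original answer:

```r
library(dplyr)
library(tidyr)

# Sample input from the question
df <- read.table(header = TRUE, text = "
day t1 t2 t3 t4 t5 t6 t7
1 1 0 1 0 0 0 1
1 1 0 1 0 4 0 1
4 2 3 1 0 1 0 1
5 1 1 1 0 0 0 1
")

res <- df %>%
  # Long format, naming the columns directly so no rename() is needed
  pivot_longer(-day, names_to = "time", values_to = "activity") %>%
  # Count each activity at most once per day
  distinct(day, time, activity) %>%
  # Number of distinct days per (time slot, activity)
  count(time, activity, name = "n_days") %>%
  group_by(time) %>%
  # Keep every activity tied for the highest day count
  slice_max(n_days, n = 1, with_ties = TRUE) %>%
  # One row per slot; tied activities sorted and comma-separated
  summarise(`Most frequent` = paste(sort(activity), collapse = ","))

res
```

Sorting inside paste() is a design choice so that ties such as t2 print as 0,1,3; drop sort() to keep the order of the data instead.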