R - Frequency by group



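I have a data frame containing the variable day (weekday, from 1 to 7) and the time variables t1 to t7, which record the activity performed in a given time slot.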

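For each corresponding time slot, I want to count how often the same activity occurs across the 7 weekdays and find the most frequent one.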

Input:

day t1 t2 t3 t4 t5 t6 t7
1  1  0  1  0  0  0  1
1  1  0  1  0  4  0  1
4  2  3  1  0  1  0  1
5  1  1  1  0  0  0  1
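
To make the example reproducible, the input above can be typed in as a data frame. This is only a sketch: the name `df` is assumed because the answer's code refers to it, and the columns are stored as plain numeric vectors.

```r
# Rebuild the example input; `df` is the name the answer's code expects.
df <- data.frame(
  day = c(1, 1, 4, 5),
  t1  = c(1, 1, 2, 1),
  t2  = c(0, 0, 3, 1),
  t3  = c(1, 1, 1, 1),
  t4  = c(0, 0, 0, 0),
  t5  = c(0, 4, 1, 0),
  t6  = c(0, 0, 0, 0),
  t7  = c(1, 1, 1, 1)
)

# Quick look at the structure: 4 rows, 8 columns.
str(df)
```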

Output:

time   Most frequent
t1     1    
t2     0,1,3       
t3     1
t4     0
t5     0
t6     0
t7     1
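
The expected output suggests that each day should count only once per activity: in t2 the value 0 appears twice, but both times on day 1, and the desired result still lists 0, 1 and 3 as tied. A minimal base R sketch of that reading follows. The helper name `most_frequent` is hypothetical and not part of the question or the answer.

```r
# Example input, retyped from the question.
df <- data.frame(
  day = c(1, 1, 4, 5),
  t1  = c(1, 1, 2, 1),
  t2  = c(0, 0, 3, 1),
  t3  = c(1, 1, 1, 1),
  t4  = c(0, 0, 0, 0),
  t5  = c(0, 4, 1, 0),
  t6  = c(0, 0, 0, 0),
  t7  = c(1, 1, 1, 1)
)

# Hypothetical helper: most frequent activity (or activities) in one time slot,
# counting each day at most once per activity.
most_frequent <- function(col) {
  pairs  <- unique(df[c("day", col)])   # drop repeated (day, activity) rows
  counts <- table(pairs[[col]])         # number of distinct days per activity
  names(counts)[as.vector(counts == max(counts))]
}

most_frequent("t2")
most_frequent("t5")
```

Under this assumption, t2 yields a three-way tie and t5 yields 0, which agrees with the desired output above.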

Here is a dplyr solution (pivot_longer() comes from tidyr):

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(-day) %>% 
  group_by(name, value) %>% 
  distinct() %>% 
  mutate(freq = n()) %>% 
  group_by(name) %>% 
  filter(freq == max(freq)) %>% 
  select(name, value) %>% 
  distinct() %>% 
  group_by(name) %>% 
  summarise(`Most frequent` = paste(value, collapse = ",")) %>% 
  rename(time = name)

This gives:

  time  `Most frequent`
  <chr> <chr>
1 t1    1
2 t2    0,3,1
3 t3    1
4 t4    0
5 t5    0
6 t6    0
7 t7    1

Here is the same code with some comments:

df %>% 
  pivot_longer(-day) %>% # Reshape to long format: one row per day and time slot (columns name, value)
  group_by(name, value) %>% # Group by name (t#) and value (activity)
  distinct() %>% # Drop duplicate day + time + activity rows, so each day counts once
  mutate(freq = n()) %>% # Count the remaining rows per time + activity
  group_by(name) %>% # Group by time
  filter(freq == max(freq)) %>% # Keep only the most frequent activities per time slot
  select(name, value) %>% # Keep only the variables name and value
  distinct() %>% # Keep one row per time + activity
  group_by(name) %>% # Group by name (time)
  summarise(`Most frequent` = paste(value, collapse = ",")) %>% # One row per time slot; tied activities are pasted together, separated by commas
  rename(time = name) # Rename the variable name to time
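
The same idea can probably be written more compactly with count(). The following is a sketch rather than the answerer's code: the column names `time`, `activity` and `freq` and the object name `result` are assumptions chosen for illustration, and the input is retyped from the question so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Example input, retyped from the question.
df <- data.frame(
  day = c(1, 1, 4, 5),
  t1  = c(1, 1, 2, 1),
  t2  = c(0, 0, 3, 1),
  t3  = c(1, 1, 1, 1),
  t4  = c(0, 0, 0, 0),
  t5  = c(0, 4, 1, 0),
  t6  = c(0, 0, 0, 0),
  t7  = c(1, 1, 1, 1)
)

result <- df %>%
  pivot_longer(-day, names_to = "time", values_to = "activity") %>%
  distinct() %>%                              # each day counts once per time + activity
  count(time, activity, name = "freq") %>%    # distinct days per time + activity
  group_by(time) %>%
  filter(freq == max(freq)) %>%               # keep ties as well
  summarise(`Most frequent` = paste(activity, collapse = ","))

result
```

Because count() returns its rows sorted by the counted columns, tied activities should come out in ascending order here (0,1,3 for t2), which is the order shown in the question's desired output.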
