R - Subset a data frame by factor level and create a new variable of quantiles conditional on a variable within the subset



I have a data frame like this:

library(dplyr)  # needed for %>%, group_by(), mutate() and dense_rank()

set.seed(567)
year  <- as.factor(c(rep("1998", 20), rep("1999", 16)))
lepsp <- c(letters[seq(from = 1, to = 20)], c('a', 'b', 'c'), letters[seq(from = 8, to = 20)])
freq  <- rpois(36, lambda = 12)
df <- data.frame(year, lepsp, freq)
# rank species within each year, most frequent first
df <- df %>%
  group_by(year) %>%
  mutate(rank = dense_rank(-freq))

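I want to subset df by year and create a new column called quant that assigns the corresponding quartile to each freq value within the subset. The new column could instead assign quantiles using probs = seq(0, 1, 0.05). Most importantly, I later need to be able to assign categories based on the quantile, e.g. anything below 25% is classified as rare. So these can be broad quartile names, but the smaller the percentile increments, the more "wiggle room" I will have to classify something as rare (r) or common (c).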

The output should look like this:

df<-data.frame(df, quant= c(75,50,25,50,50,25,75,50,25,75,75,100,50,100,100,50,25,25,75,25,75,50,50,75,75,25,25,50,50,50,25,75,75,25,75,50), 
abucat= c("c", "r", "r","r","r", "r","c","r","r", "c", "c", "c", "r","c", "c","r" , "r", "r", "c", "r", "c","r","r","c","c","r",
"r","r","r","r","r","c","c","r","c","r"))

I tried:

library(dplyr)
df<- 
df %>%
group_by(year) %>%
mutate(quant = quantile(freq, probs= seq(0, 1, 0.25)))

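I updated the code to use case_when, which makes it more intuitive. You should be able to see each case that freq is classified into and the corresponding quant value. I then use separate from tidyr to split the result into 2 columns.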

library(dplyr)
library(tidyr)

# rebuild the example data from the question
# (the hand-typed quant/abucat columns are left out, since they are computed below)
set.seed(567)
year  <- as.factor(c(rep("1998", 20), rep("1999", 16)))
lepsp <- c(letters[seq(from = 1, to = 20)], c('a', 'b', 'c'), letters[seq(from = 8, to = 20)])
freq  <- rpois(36, lambda = 12)
df <- data.frame(year, lepsp, freq)
df <- df %>%
  group_by(year) %>%
  mutate(rank = dense_rank(-freq))

df %>%
  group_by(year) %>%
  # store each year's quartile breakpoints (0%, 25%, 50%, 75%, 100%) in a list column
  mutate(qtile = list(quantile(freq))) %>%
  rowwise() %>%
  # encode the quartile and the category together as "quant,abucat"
  mutate(q = case_when(freq <= qtile[2] ~ "25,r",
                       freq > qtile[2] & freq <= qtile[3] ~ "50,r",
                       freq > qtile[3] & freq <= qtile[4] ~ "75,c",
                       freq > qtile[4] ~ "100,c")) %>%
  # split "quant,abucat" into two columns, then drop the helper column
  separate(q, c("quant", "abucat")) %>%
  select(-qtile)
#  Source: local data frame [36 x 6]
#  Groups: <by row>
#  
#  # A tibble: 36 x 6
#     year  lepsp  freq  rank quant abucat
#     <fct> <fct> <int> <int> <chr> <chr> 
#   1 1998  a        14     3 75    c     
#   2 1998  b        13     4 50    r     
#   3 1998  c         9     7 25    r     
#   4 1998  d        12     5 50    r     
#   5 1998  e        12     5 50    r     
#   6 1998  f         9     7 25    r     
#   7 1998  g        15     2 75    c     
#   8 1998  h        12     5 50    r     
#   9 1998  i        10     6 25    r     
#  10 1998  j        15     2 75    c     
#  # ... with 26 more rows
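
If you want the finer increments mentioned in the question (probs = seq(0, 1, 0.05)), writing out one case_when branch per bin gets tedious. Below is a minimal sketch of one way to generalize it, not a tested drop-in replacement. The helper name quant_bin, its step argument, and the 25% cutoff for "rare" are all assumptions made for illustration; note that the quartile code above treats everything up to the median as "r", so adjust the cutoff to whatever you need.

```r
library(dplyr)

# Hypothetical helper (not from the original answer): for each value in x,
# return the upper bound, in percent, of the quantile bin it falls into.
quant_bin <- function(x, step = 0.05) {
  probs  <- seq(0, 1, by = step)
  breaks <- quantile(x, probs = probs, names = FALSE)
  # index of the first breakpoint that is >= the value
  idx <- vapply(x, function(v) which(breaks >= v)[1], integer(1))
  # the minimum would land on the 0% breakpoint; put it in the first bin instead
  idx <- pmax(idx, 2L)
  round(probs[idx] * 100)
}

# same example data as in the question
set.seed(567)
df <- data.frame(
  year  = factor(c(rep("1998", 20), rep("1999", 16))),
  lepsp = c(letters[1:20], "a", "b", "c", letters[8:20]),
  freq  = rpois(36, lambda = 12)
)

rare_cutoff <- 25  # assumed threshold: bins up to 25% count as rare

result <- df %>%
  group_by(year) %>%
  mutate(quant  = quant_bin(freq, step = 0.05),
         abucat = if_else(quant <= rare_cutoff, "r", "c")) %>%
  ungroup()
```

With step = 0.25 the helper assigns the same 25/50/75/100 bins as the case_when version, and here quant stays numeric, so you can move rare_cutoff later without recomputing the bins.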
