How can I count the number of values of a particular tibble variable that fall within intervals in R?



Packages required:

'dplyr'

'nycflights13'

The tibble I am working with is

q4<-flights%>%group_by(year,month,day)%>%summarise(cancelled=sum(is.na(dep_time)),avg_delay=mean(arr_delay,na.rm = T),totalflights=n())
q4<-q4%>%mutate(prop=cancelled/totalflights)

q4%>%ungroup()%>%count(prop)

which gives me

# A tibble: 342 x 2
prop     n
<dbl> <int>
1 0           7
2 0.00101     1
3 0.00102     2
4 0.00102     1
5 0.00102     1
6 0.00102     1
7 0.00103     1
8 0.00103     1
9 0.00104     1
10 0.00104     1
# ... with 332 more rows

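Is there a way (without brute-force logic such as loops) to get the output in the form shown below? I am looking for a one- or two-line solution. Is there a function in dplyr that does this?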

Expected output:

# A tibble: X x Y
prop     n
<dbl> <int>
1 0-0.1       45          #random numbers
2 0.1-0.2     54
3 0.2-0.3     23

Below, I use cut to bin the data and then table to count the instances in each bin.

data.frame(cut(q4$prop, breaks = c(0, 0.1, 0.2, 0.3)) %>% table)

which produces

#           . Freq
# 1   (0,0.1]  341
# 2 (0.1,0.2]   13
# 3 (0.2,0.3]    2
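
One detail worth checking with this approach: by default cut builds right-closed intervals, so a value exactly equal to the lowest break (here 0) falls outside the first bin and becomes NA, which table then drops. The following is a minimal sketch on a made-up vector (the values in prop are hypothetical, not taken from flights) that shows the difference include.lowest = TRUE makes.

```r
# Hypothetical proportions, chosen only to illustrate the binning behaviour
prop <- c(0, 0.05, 0.1, 0.15, 0.25)
breaks <- c(0, 0.1, 0.2, 0.3)

# Default: intervals are right-closed, so 0 is outside (0, 0.1] and becomes NA
default_bins <- cut(prop, breaks = breaks)

# include.lowest = TRUE also closes the first interval on the left: [0, 0.1]
closed_bins <- cut(prop, breaks = breaks, include.lowest = TRUE)

# useNA = "ifany" makes the dropped value visible in the counts
default_counts <- table(default_bins, useNA = "ifany")
closed_counts <- table(closed_bins)

print(default_counts)
print(closed_counts)
```

If days with no cancellations should be counted in the first bin, include.lowest = TRUE (or a first break below 0, such as -Inf) is probably what you want.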

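After q4<-q4%>%mutate(prop=cancelled/totalflights) you can use:

# Bin prop with cut(), then count the rows in each bin.
# Note: the last bin is open-ended, (0.2, Inf], despite its label.
q4 %>% ungroup() %>%
mutate(category = cut(prop, breaks = c(-Inf,0.1,0.2,Inf), labels = c("0-0.1","0.1-0.2", "0.2 - 0.3"))) %>%
count(category)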

I believe this will work.
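
As a quick way to try the cut-then-count idea without loading nycflights13, here is a minimal sketch on a toy tibble. The data frame toy and the label ">0.2" are assumptions made for illustration; count() accepts a named expression, so the separate mutate() step can be folded in.

```r
library(dplyr)

# Hypothetical data standing in for q4; only the prop column matters here
toy <- tibble(prop = c(0, 0.05, 0.1, 0.15, 0.25, 0.4))

# count() can create the grouping column on the fly from a named expression
binned <- toy %>%
  count(category = cut(prop,
                       breaks = c(-Inf, 0.1, 0.2, Inf),
                       labels = c("0-0.1", "0.1-0.2", ">0.2")))

print(binned)
```

Using -Inf and Inf as the outer breaks guarantees that every value lands in some bin, at the cost of the outer labels no longer describing a bounded interval.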

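I came up with a solution myself, and I think it is the best one. Note that cut_width comes from ggplot2, so that package has to be loaded as well.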

q4%>%ungroup()%>%count(cut_width(prop,0.025))

Output:

# A tibble: 11 x 2
`cut_width(prop, 0.025)`     n
<fct>                    <int>
1 [-0.0125,0.0125]           233
2 (0.0125,0.0375]             66
3 (0.0375,0.0625]             26
4 (0.0625,0.0875]             13
5 (0.0875,0.112]              14
6 (0.112,0.138]                4
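
The first bin above starts at -0.0125 because cut_width centers its bins by default. If bins that start exactly at 0 are preferred, the boundary argument of cut_width can be used. A minimal sketch on a made-up vector (the values in prop are hypothetical):

```r
library(ggplot2)

# Hypothetical proportions used only to show where the bin edges fall
prop <- c(0, 0.01, 0.03)

# boundary = 0 aligns a bin edge with 0 instead of centering a bin on it
bins <- cut_width(prop, width = 0.025, boundary = 0)

bin_counts <- table(bins)
print(bin_counts)
```

Applied to the question's data, this would presumably look like q4 %>% ungroup() %>% count(cut_width(prop, 0.025, boundary = 0)).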
