How to account for `NA` when counting with `group_by()` and two factor variables in R



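I have a data frame df1 containing different sites (df1$Site) that belong to different regions (df1$Region), with data on evidence of herbivory and its type (df1$Herbivory_type). When there is no herbivory, df1$Herbivory_type is NA. Here is an example of the data frame: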

df1 <- data.frame(Region=c("ALI1","ALI1","ALI1","ALI1","ALI2","ALI2","ALI2","ALI3","ALI3","ALI3","ALI3","ALI5","ALI5"),
Site=c("ALI1_A","ALI1_B","ALI1_C","ALI1_D","ALI2_A","ALI2_B","ALI2_C","ALI3_A","ALI3_B","ALI3_C","ALI3_D","ALI5_A","ALI5_B"),
Herbivory_type=c(NA,"S",NA,NA,NA,NA,NA,NA,"S","S",NA,NA,"S"))
df1$Herbivory_type <- as.factor(df1$Herbivory_type)
df1
Region   Site Herbivory_type
1    ALI1 ALI1_A           <NA>
2    ALI1 ALI1_B              S
3    ALI1 ALI1_C           <NA>
4    ALI1 ALI1_D           <NA>
5    ALI2 ALI2_A           <NA>
6    ALI2 ALI2_B           <NA>
7    ALI2 ALI2_C           <NA>
8    ALI3 ALI3_A           <NA>
9    ALI3 ALI3_B              S
10   ALI3 ALI3_C              S
11   ALI3 ALI3_D           <NA>
12   ALI5 ALI5_A           <NA>
13   ALI5 ALI5_B              S

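I need to know the number of herbivory events per region, and the sites with NA must be taken into account so that a region with no herbivory shows a count of 0. I would like to get a result like this: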

df2
Region N_Herbivory_S
1   ALI1             1
2   ALI2             0   # All sites are `NA`, so herbivory is 0 in this region.
3   ALI3             2
4   ALI5             1

I tried:

as.data.frame(df1 %>% group_by(Region,Herbivory_type) %>% summarise(N = n()))

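But the output is not what I want, because `NA` is treated as a separate group and ALI2 never gets a row with a count of 0: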

Region Herbivory_type N
1   ALI1              S 1
2   ALI1           <NA> 3
3   ALI2           <NA> 3
4   ALI3              S 2
5   ALI3           <NA> 2
6   ALI5              S 1
7   ALI5           <NA> 1

Does anyone know how to do this?

Thanks in advance

You can use count() with wt = !is.na(Herbivory_type) to sum by group and get the number of non-missing values in each region.

library(dplyr)
df1 %>%
count(Region, wt = !is.na(Herbivory_type))
# A tibble: 4 × 2
#   Region     n
#   <chr>  <int>
# 1 ALI1       1
# 2 ALI2       0
# 3 ALI3       2
# 4 ALI5       1

library(dplyr)
df1 %>% 
group_by(Region) %>%
summarise(n_Herbivory_S = sum(Herbivory_type %in% c("S")))

(This assumes the real dataset may contain other categories that should be ignored; otherwise !is.na() is simpler.)
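
As a small illustration of why `%in%` is used above instead of `==`, the sketch below compares the two on a toy vector. The vector `herbivory` is made up for this example (it mirrors region ALI3) and is not part of the original data.

```r
# Hypothetical toy vector: two "S" records and two missing values.
herbivory <- factor(c(NA, "S", "S", NA))

# `==` propagates NA, so the sum is NA unless missing values are dropped.
sum(herbivory == "S")                # NA
sum(herbivory == "S", na.rm = TRUE)  # 2

# `%in%` never returns NA: a missing value simply does not match "S".
sum(herbivory %in% "S")              # 2
```

Either `%in%` or `==` with `na.rm = TRUE` should work inside `summarise()`; `%in%` also extends naturally to several categories, e.g. `%in% c("S", "T")`, where "T" is only a placeholder level.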

You can count the non-NA values, i.e.

library(dplyr)
df1 %>% 
group_by(Region) %>% 
summarise(res = sum(!is.na(Herbivory_type)))
# A tibble: 4 × 2
#   Region   res
#   <chr>  <int>
# 1 ALI1       1
# 2 ALI2       0
# 3 ALI3       2
# 4 ALI5       1
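
For comparison, the same per-region count can be sketched in base R without dplyr. This is not from the answers above, just one possible equivalent; the data frame is rebuilt here (without the Site column) so the block runs on its own, and `n_herbivory` is a name chosen for illustration.

```r
# Same example data as in the question, minus the Site column.
df1 <- data.frame(
  Region = c("ALI1", "ALI1", "ALI1", "ALI1", "ALI2", "ALI2", "ALI2",
             "ALI3", "ALI3", "ALI3", "ALI3", "ALI5", "ALI5"),
  Herbivory_type = factor(c(NA, "S", NA, NA, NA, NA, NA,
                            NA, "S", "S", NA, NA, "S"))
)

# Sum the TRUE values of !is.na() within each region.
n_herbivory <- tapply(!is.na(df1$Herbivory_type), df1$Region, sum)
n_herbivory
# ALI1 = 1, ALI2 = 0, ALI3 = 2, ALI5 = 1
```

Because the grouping is done on Region alone, a region whose sites are all `NA` still appears, with a count of 0.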
