Count unique values per month in R



I have a dataset of dead-bird records submitted by field observers.

  Death.Date Observer Species Bird.ID
1 03/08/2021       DA      MF FC10682
2 15/08/2021       AG      MF FC10698
3 12/01/2022       DA      MF FC20957
4 09/02/2022       DA      MF FC10708

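From this I want to build a dataset with the number of unique Bird.ID values per month, so that I can plot it. ("Unique" because people make mistakes and sometimes enter the same bird twice.)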

The expected output in this case is:

Month Number of dead 
08/2021 2
01/2022 1
02/2022 1

The idea is to use the distinct function, but per month (the values are dates in dd/mm/yyyy format).

  1. If your date column is of character type, first convert it to Date with dmy

  2. Change the format to month/year

  3. group_by and summarise

library(dplyr)
library(lubridate) # only needed if your dates are stored as character
df %>% 
  mutate(Death.Date = dmy(Death.Date)) %>% # skip this line if Death.Date is already a Date
  mutate(Month = format(as.Date(Death.Date), "%m/%Y")) %>%
  group_by(Month) %>% 
  summarise(`Number of dead` = n_distinct(Bird.ID)) # n_distinct() counts each bird once; n() would count rows
#>   Month   `Number of dead`
#>   <chr>              <int>
#> 1 01/2022                1
#> 2 02/2022                1
#> 3 08/2021                2
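
The question asks about distinct specifically, so here is a minimal sketch of that route. It assumes a hypothetical data frame `birds` that holds the four records from the question plus one repeated entry for FC10682; the names `birds` and `monthly` are illustrative only.

```r
library(dplyr)
library(lubridate)

# Hypothetical sample: the question's records plus a second entry for FC10682
birds <- data.frame(
  Death.Date = c("03/08/2021", "15/08/2021", "12/01/2022", "09/02/2022", "03/08/2021"),
  Observer   = c("DA", "AG", "DA", "DA", "AG"),
  Species    = "MF",
  Bird.ID    = c("FC10682", "FC10698", "FC20957", "FC10708", "FC10682")
)

monthly <- birds %>%
  mutate(Month = format(dmy(Death.Date), "%m/%Y")) %>%
  distinct(Month, Bird.ID) %>%              # keep one row per bird per month
  count(Month, name = "Number of dead")     # then count the remaining rows
```

Because the repeated entry is dropped by distinct() before counting, August 2021 still comes out as 2 rather than 3.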

For completeness, this can also be done with aggregate, without any additional packages:

df <- data.frame(
Death.Date = c("3/8/2021", "15/08/2021", "12/1/2022", "9/2/2022"),
Observer = c("DA", "AG", "DA", "DA"),
Species = c("MF", "MF", "MF", "MF"),
Bird.ID = c("FC10682", "FC10698", "FC20957", "FC10708")
)

aggregate.data.frame(
x = df["Bird.ID"],
by = list(death_month = format(as.Date(df$Death.Date, "%d/%m/%Y"), "%m/%Y")),
FUN = function(x) {length(unique(x))}
)

Notes

  • The anonymous function function(x) {length(unique(x))} returns the count of unique values
  • The call format(as.Date(df$Death.Date, "%d/%m/%Y"), "%m/%Y") produces the month/year string used for grouping
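
One thing to watch with "%m/%Y" labels is that they sort as text, so 01/2022 ends up before 08/2021. The sketch below is one possible way around that in base R, not part of the original answer: group on a sortable "%Y-%m" key first and relabel afterwards. The name `res` is illustrative.

```r
df <- data.frame(
  Death.Date = c("3/8/2021", "15/08/2021", "12/1/2022", "9/2/2022"),
  Bird.ID    = c("FC10682", "FC10698", "FC20957", "FC10708")
)

dates <- as.Date(df$Death.Date, "%d/%m/%Y")

# Group on a year-month key that sorts chronologically as text
res <- aggregate(
  x   = df["Bird.ID"],
  by  = list(death_month = format(dates, "%Y-%m")),
  FUN = function(x) length(unique(x))
)

# Relabel the (already ordered) keys as month/year for display
res$death_month <- format(as.Date(paste0(res$death_month, "-01")), "%m/%Y")
```

The rows of `res` keep the chronological order of the grouping key, which is usually what a plot over time needs.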

A data.table solution

library(data.table)
library(lubridate)
# Reproducible example with one duplicated bird (FC10682 is entered twice)
deadbirds <- data.table::data.table(Death.Date = c("03/08/2021", "15/08/2021", "12/01/2022", "09/02/2022", "03/08/2021"),
Observer   = c("DA", "AG", "DA", "DA", "DA"),
Species    = c("MF", "MF", "MF" , "MF", "MF"),
Bird.ID    = c("FC10682", "FC10698", "FC20957", "FC10708", "FC10682"))
# Clean dataset, option 1: drop rows that are exact duplicates
deadbirds <- base::unique(deadbirds)
# Clean dataset, option 2: keep only the first row per bird (useful when duplicated entries differ in columns that do not matter)
deadbirds <- deadbirds[
j = .SD[1],
by = c("Bird.ID")
]
# Death.Date as date
deadbirds <- deadbirds[
j = Death.Date := lubridate::dmy(Death.Date)
]
# Create month.Death.Date
deadbirds <- deadbirds[
j = month.Death.Date := base::paste0(lubridate::month(Death.Date),
"/",
lubridate::year(Death.Date))
]
# Count birds by month; .() returns one summary row per month, whereas := would only add a column to every row
deadbirds <- deadbirds[
  j  = .(`Number of dead` = .N),
  by = month.Death.Date]
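
If the duplicates do not need to be removed from the table itself, a shorter variant is possible. This is a minimal sketch that assumes counting with data.table's uniqueN() is acceptable; `dt` and `monthly_dt` are illustrative names, and the sample repeats FC10682 on purpose.

```r
library(data.table)

# Sample with FC10682 entered twice
dt <- data.table(
  Death.Date = c("03/08/2021", "15/08/2021", "12/01/2022", "09/02/2022", "03/08/2021"),
  Bird.ID    = c("FC10682", "FC10698", "FC20957", "FC10708", "FC10682")
)

# Add the month/year label by reference
dt[, Month := format(as.Date(Death.Date, "%d/%m/%Y"), "%m/%Y")]

# One row per month, counting each Bird.ID once
monthly_dt <- dt[, .(`Number of dead` = uniqueN(Bird.ID)), by = Month]
```

Here the original rows in `dt` are left untouched and only the summary ignores the repeated entry.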

A possible solution based on tidyverse, lubridate and zoo::as.yearmon:

library(tidyverse)
library(lubridate)
library(zoo)
df <- data.frame(
Death.Date = c("3/8/2021", "15/08/2021", "12/1/2022", "9/2/2022"),
Observer = c("DA", "AG", "DA", "DA"),
Species = c("MF", "MF", "MF", "MF"),
Bird.ID = c("FC10682", "FC10698", "FC20957", "FC10708")
)
df %>% 
group_by(date = as.yearmon(dmy(Death.Date))) %>% 
summarise(nDead = n_distinct(Bird.ID), .groups = "drop")
#> # A tibble: 3 x 2
#>   date      nDead
#>   <yearmon> <int>
#> 1 Aug 2021      2
#> 2 Jan 2022      1
#> 3 Feb 2022      1

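You can use table(), which counts rows per month (so remove duplicated entries first if the same bird may appear twice):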

as.data.frame(table(format(as.Date(df$Death.Date,'%d/%m/%Y'), '%m/%Y')))
#       Var1 Freq
# 1 01/2022    1
# 2 02/2022    1
# 3 08/2021    2
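
Since table() counts rows, a repeated bird would be counted twice. A minimal sketch of one way to guard against that in base R, assuming a hypothetical data frame `df_dup` with FC10682 entered twice (`pairs` and `counts` are also illustrative names):

```r
df_dup <- data.frame(
  Death.Date = c("3/8/2021", "15/08/2021", "12/1/2022", "9/2/2022", "3/8/2021"),
  Bird.ID    = c("FC10682", "FC10698", "FC20957", "FC10708", "FC10682")
)

month <- format(as.Date(df_dup$Death.Date, "%d/%m/%Y"), "%m/%Y")

# Keep each month/bird combination once, then tabulate the months
pairs  <- unique(data.frame(Month = month, Bird.ID = df_dup$Bird.ID))
counts <- as.data.frame(table(Month = pairs$Month))
```

The result has a Month column and a Freq column, with one row per month.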

Data:

df <- data.frame(
Death.Date = c("3/8/2021", "15/08/2021", "12/1/2022", "9/2/2022"),
Observer = c("DA", "AG", "DA", "DA"),
Species = c("MF", "MF", "MF", "MF"),
Bird.ID = c("FC10682", "FC10698", "FC20957", "FC10708")
)
