I have a dataset with records of dead birds from field observers.
Death.Date Observer Species Bird.ID
1 03/08/2021 DA MF FC10682
2 15/08/2021 AG MF FC10698
3 12/01/2022 DA MF FC20957
4 09/02/2022 DA MF FC10708
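From it I want to build a dataset with the number of unique Bird.ID per month, so that I can make graphs from it. ("Unique" because people make mistakes and sometimes enter the same bird twice.)
In this case the output would be: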
Month Number of dead
08/2021 2
01/2022 1
02/2022 1
The idea is to use the distinct function, but per month (given that the values are dates in dd/mm/yyyy format).
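The "per month" part comes down to turning each dd/mm/yyyy string into a month/year label. A minimal base R sketch of that step, using the question's dates; the names `death_dates`, `parsed` and `month_label` are just illustrative:

```r
# Parse dd/mm/yyyy strings and reduce each one to a "mm/yyyy" label (base R only)
death_dates <- c("03/08/2021", "15/08/2021", "12/01/2022", "09/02/2022")

parsed <- as.Date(death_dates, format = "%d/%m/%Y")  # character -> Date
month_label <- format(parsed, "%m/%Y")               # Date -> "mm/yyyy"

print(month_label)
# [1] "08/2021" "08/2021" "01/2022" "02/2022"
```

Once every record carries such a label, counting distinct Bird.ID per label gives the table above.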
- If your date column is of character type, first convert it to a Date with dmy
- Change the format to month/year
- group_by and summarise
library(dplyr)
library(lubridate) # in case your Date is in character format
df %>%
  mutate(Death.Date = dmy(Death.Date)) %>% # you may not need this line
  mutate(Month = format(as.Date(Death.Date), "%m/%Y")) %>%
  group_by(Month) %>%
  summarise(`Number of dead` = n_distinct(Bird.ID)) # count each bird once per month
Month `Number of dead`
<chr> <int>
1 01/2022 1
2 02/2022 1
3 08/2021 2
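Note that the result is ordered as text, so 01/2022 comes before 08/2021. If the table feeds a time-axis graph, a "yyyy-mm" key may be more convenient because it sorts in calendar order. A small base R sketch illustrating the difference; the variable names are hypothetical:

```r
# "mm/yyyy" labels sort as text; "yyyy-mm" labels sort chronologically
death_dates <- as.Date(c("03/08/2021", "12/01/2022", "09/02/2022"), format = "%d/%m/%Y")

text_order <- sort(format(death_dates, "%m/%Y"))
calendar_order <- sort(format(death_dates, "%Y-%m"))

text_order      # "01/2022" "02/2022" "08/2021"
calendar_order  # "2021-08" "2022-01" "2022-02"
```

The month/year display format can still be applied afterwards, once the rows are in the right order.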
For completeness, this can also be done with aggregate, without any extra packages:
df <- data.frame(
Death.Date = c("3/8/2021", "15/08/2021", "12/1/2022", "9/2/2022"),
Observer = c("DA", "AG", "DA", "DA"),
Species = c("MF", "MF", "MF", "MF"),
Bird.ID = c("FC10682", "FC10698", "FC20957", "FC10708")
)
aggregate.data.frame(
x = df["Bird.ID"],
by = list(death_month = format(as.Date(df$Death.Date, "%d/%m/%Y"), "%m/%Y")),
FUN = function(x) {length(unique(x))}
)
Notes
- The anonymous function function(x) {length(unique(x))} returns the count of unique values
- The format(as.Date(df$Death.Date, "%d/%m/%Y"), "%m/%Y") call ensures that a month/year string is produced
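To see why length(unique(x)) matters here, the sketch below reruns the same aggregate call on hypothetical data in which FC10682 was entered twice, and compares it with plain length(), which counts rows. The duplicated record and the names `df_dup`, `by_unique` and `by_rows` are assumptions for illustration:

```r
# Hypothetical data: the question's four records plus FC10682 entered a second time
df_dup <- data.frame(
  Death.Date = c("3/8/2021", "15/08/2021", "12/1/2022", "9/2/2022", "3/8/2021"),
  Bird.ID = c("FC10682", "FC10698", "FC20957", "FC10708", "FC10682")
)

# Month/year label for every record, as in the answer's format(as.Date(...)) call
death_month <- format(as.Date(df_dup$Death.Date, "%d/%m/%Y"), "%m/%Y")

# length(unique(x)) counts each Bird.ID once per month
by_unique <- aggregate.data.frame(
  x = df_dup["Bird.ID"],
  by = list(death_month = death_month),
  FUN = function(x) length(unique(x))
)

# length(x) counts rows, so the repeated FC10682 is counted twice
by_rows <- aggregate.data.frame(
  x = df_dup["Bird.ID"],
  by = list(death_month = death_month),
  FUN = length
)

by_unique  # 08/2021 -> 2
by_rows    # 08/2021 -> 3
```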
data.table solution
library(data.table)
library(lubridate)
# Reproducible example with a duplicated bird
deadbirds <- data.table::data.table(Death.Date = c("03/08/2021", "15/08/2021", "12/01/2022", "09/02/2022", "03/08/2021"),
Observer = c("DA", "AG", "DA", "DA", "DA"),
Species = c("MF", "MF", "MF" , "MF", "MF"),
Bird.ID = c("FC10682", "FC10698", "FC20957", "FC10708", "FC10682"))
# Clean dataset, option 1: delete all fully duplicated rows
deadbirds <- base::unique(deadbirds)
# Clean dataset, option 2: keep only the first row per bird (useful when duplicated records differ in columns you don't need)
deadbirds <- deadbirds[
j = .SD[1],
by = c("Bird.ID")
]
# Death.Date as date
deadbirds <- deadbirds[
j = Death.Date := lubridate::dmy(Death.Date)
]
# Create month.Death.Date
deadbirds <- deadbirds[
j = month.Death.Date := base::paste0(lubridate::month(Death.Date),
"/",
lubridate::year(Death.Date))
]
# Count by month
# Summarise to one row per month (`:=` would instead add the count to every row)
dead_per_month <- deadbirds[
j = .(`Number of dead` = .N),
by = month.Death.Date]
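The difference between the two cleaning options can be reproduced in base R. In this sketch the duplicated FC10682 record is assumed to have been entered by a different observer, so the rows are not identical: unique() keeps both, while dropping repeated Bird.ID values (the counterpart of option 2) keeps only the first. The data and names are hypothetical:

```r
# Hypothetical data: FC10682 entered twice, by two different observers
deadbirds_df <- data.frame(
  Death.Date = c("03/08/2021", "15/08/2021", "12/01/2022", "09/02/2022", "03/08/2021"),
  Observer = c("DA", "AG", "DA", "DA", "AG"),
  Species = c("MF", "MF", "MF", "MF", "MF"),
  Bird.ID = c("FC10682", "FC10698", "FC20957", "FC10708", "FC10682")
)

# Option 1 analogue: unique() only drops rows identical in every column,
# so the second FC10682 row survives because its Observer differs
all_columns <- unique(deadbirds_df)

# Option 2 analogue: keep the first row of each Bird.ID
first_per_bird <- deadbirds_df[!duplicated(deadbirds_df$Bird.ID), ]

nrow(all_columns)     # 5
nrow(first_per_bird)  # 4
```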
A possible solution based on tidyverse, lubridate and zoo::as.yearmon:
library(tidyverse)
library(lubridate)
library(zoo)
df <- data.frame(
Death.Date = c("3/8/2021", "15/08/2021", "12/1/2022", "9/2/2022"),
Observer = c("DA", "AG", "DA", "DA"),
Species = c("MF", "MF", "MF", "MF"),
Bird.ID = c("FC10682", "FC10698", "FC20957", "FC10708")
)
df %>%
group_by(date = as.yearmon(dmy(Death.Date))) %>%
summarise(nDead = n_distinct(Bird.ID), .groups = "drop")
#> # A tibble: 3 x 2
#> date nDead
#> <yearmon> <int>
#> 1 Aug 2021 2
#> 2 Jan 2022 1
#> 3 Feb 2022 1
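If adding zoo is not an option, a similar chronologically sortable month key can be sketched in base R by mapping each date to the first day of its month, which stays a Date and therefore plots on a time axis. This is a swapped-in base R approach, not the answer's code; `month_start`, `counts` and `result` are illustrative names:

```r
# Base R alternative to as.yearmon: first day of the month as a Date
df <- data.frame(
  Death.Date = c("3/8/2021", "15/08/2021", "12/1/2022", "9/2/2022"),
  Bird.ID = c("FC10682", "FC10698", "FC20957", "FC10708")
)

death <- as.Date(df$Death.Date, format = "%d/%m/%Y")
month_start <- as.Date(format(death, "%Y-%m-01"))

# Count distinct Bird.ID values per month
counts <- tapply(df$Bird.ID, month_start, function(x) length(unique(x)))
result <- data.frame(month = as.Date(names(counts)), nDead = as.vector(counts))
result
```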
You can use the following (note that table() counts rows, so a bird entered twice is counted twice):
as.data.frame(table(format(as.Date(df$Death.Date,'%d/%m/%Y'), '%m/%Y')))
# Var1 Freq
# 1 01/2022 1
# 2 02/2022 1
# 3 08/2021 2
Data:
df <- data.frame(
Death.Date = c("3/8/2021", "15/08/2021", "12/1/2022", "9/2/2022"),
Observer = c("DA", "AG", "DA", "DA"),
Species = c("MF", "MF", "MF", "MF"),
Bird.ID = c("FC10682", "FC10698", "FC20957", "FC10708")
)
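To make the table() approach count each bird once, repeated Bird.ID values can be dropped before tabulating. A minimal sketch, assuming a hypothetical duplicate of FC10682 and assuming that keeping the first record of each bird is acceptable (if the same ID appears in two different months, only the first month is kept):

```r
# Hypothetical data with FC10682 entered twice
df <- data.frame(
  Death.Date = c("3/8/2021", "15/08/2021", "12/1/2022", "9/2/2022", "3/8/2021"),
  Bird.ID = c("FC10682", "FC10698", "FC20957", "FC10708", "FC10682")
)

month <- format(as.Date(df$Death.Date, "%d/%m/%Y"), "%m/%Y")
keep <- !duplicated(df$Bird.ID)  # TRUE for the first record of each bird
counts <- as.data.frame(table(month[keep]))
counts
```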