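I have the following sample dataset from a biology project. I want to compute the growth rate of Number between January 2 and January 4, i.e. rate = (number_at_0104 - number_at_0102) / number_at_0102 (stored somewhere else if possible).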
a <- c("Date", "Specie", "Number")
b <- c("2020-01-01", "Dog", "3")
c <- c("2020-01-02", "Dog", "4")
d <- c("2020-01-03", "Dog", "5")
e <- c("2020-01-04", "Dog", "6")
f <- c("2020-01-01", "Cat", "3")
g <- c("2020-01-02", "Cat", "7")
h <- c("2020-01-03", "Cat", "8")
i <- c("2020-01-04", "Cat", "10")
df <- as.data.frame(rbind(b, c, d, e, f, g, h, i))
names(df) <- a
df$Date <- as.Date(df$Date)
df$Number <- as.integer(df$Number)
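I want to calculate a growth rate. I know this has already been covered, but I'm not sure I can apply those answers here. Usually we use the lag() function, but I have a few questions.
- Can we tell the lag function which lag to use (for example, not the previous period but 4 periods back)?
- My dataset is much larger, and for some species (e.g. cats) I want to calculate the growth rate between February 20 and March 3. For others (e.g. dogs), I want to calculate it between April 4 and May 5. How can I do that?
Thanks in advance,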
Using dplyr
library(dplyr)
start <- as.Date("2020-01-02")
end <- as.Date("2020-01-04")
df %>%
  # dplyr::between() is inclusive on both ends (%between% would need data.table)
  filter(between(Date, start, end)) %>%
  arrange(Date, Species) %>%
  group_by(Species) %>%
  summarise(Growth = (last(Number) - first(Number)) / first(Number))
Output
Species Growth
<fct> <dbl>
1 Cat 0.25
2 Dog 0.5
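Data. Note that in my test data, Date is already a Date and Number is already numeric, and the column is named Species rather than Specie.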
df <- data.frame(
Date = rep(seq.Date(as.Date("2020-01-01"), as.Date("2020-01-04"), "days"), 2),
Species = c(rep("Dog", 4), rep("Cat", 4)),
Number = 3:10
)
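If you want a different lookup for each species, you can do the following. Define your lookups, and the output will pair each species with its growth and the period it was computed over.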
lookups <- list(
  c("Species" = "Dog", "start" = "2020-01-01", "end" = "2020-01-04"),
  c("Species" = "Cat", "start" = "2020-01-02", "end" = "2020-01-04")
)
# Compute the growth per lookup entry, then stack the one-row results
bind_rows(lapply(lookups, function(species) {
  df %>%
    filter(Species == species[["Species"]] &
             between(Date, as.Date(species[["start"]]), as.Date(species[["end"]]))) %>%
    arrange(Date, Species) %>%
    group_by(Species) %>%
    summarise(
      Growth = (last(Number) - first(Number)) / first(Number),
      Start = species[["start"]],
      End = species[["end"]]
    )
}))
# # A tibble: 2 x 4
# Species Growth Start End
# <chr> <dbl> <chr> <chr>
# 1 Dog 1 2020-01-01 2020-01-04
# 2 Cat 0.25 2020-01-02 2020-01-04
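The per-species lookup can also be written as a join instead of a loop, which may scale more comfortably when there are many species. This is only a sketch under assumptions: the `periods` table and its `Start`/`End` column names are made up for illustration, and the data is the same test data as above.

```r
library(dplyr)

# Same test data as in this answer
df <- data.frame(
  Date = rep(seq.Date(as.Date("2020-01-01"), as.Date("2020-01-04"), "days"), 2),
  Species = c(rep("Dog", 4), rep("Cat", 4)),
  Number = 3:10
)

# Hypothetical lookup table: one row per species with its own period
periods <- data.frame(
  Species = c("Dog", "Cat"),
  Start = as.Date(c("2020-01-01", "2020-01-02")),
  End = as.Date(c("2020-01-04", "2020-01-04"))
)

growth <- df %>%
  inner_join(periods, by = "Species") %>%   # attach each species' period
  filter(Date >= Start, Date <= End) %>%    # keep rows inside that period
  arrange(Species, Date) %>%
  group_by(Species, Start, End) %>%
  summarise(
    Growth = (last(Number) - first(Number)) / first(Number),
    .groups = "drop"
  )

print(growth)
```

Species missing from `periods` are dropped by `inner_join()`, so only species with a defined period appear in the result.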
You can use:
library(dplyr)
start_date <- as.Date("2020-01-02")
end_date <- as.Date("2020-01-04")
df %>%
group_by(Specie) %>%
summarise(growth_rate = (Number[match(end_date, Date)] -
Number[match(start_date, Date)])/
Number[match(start_date, Date)])
# Specie growth_rate
# <chr> <dbl>
#1 Cat 0.429
#2 Dog 0.5
You can replace start_date and end_date with the dates of your choice.
Or, a slightly more verbose but perhaps clearer version:
df %>%
group_by(Specie) %>%
summarise(num_end = Number[match(end_date, Date)],
num_start = Number[match(start_date, Date)],
growth_rate = (num_end - num_start)/num_start)
# Specie num_end num_start growth_rate
# <chr> <int> <int> <dbl>
#1 Cat 10 7 0.429
#2 Dog 6 4 0.5
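On the first question (choosing which lag to use): `dplyr::lag()` takes an `n` argument giving the number of positions to look back. The sketch below rebuilds the question's data compactly and assumes exactly one row per species per day with no gaps, so that `n = 2` corresponds to two days earlier; with irregular dates the `match()` approach above is likely the safer choice. The names `lagged` and `prev_number` are only illustrative.

```r
library(dplyr)

# The question's data, rebuilt compactly (column is named "Specie" there)
df <- data.frame(
  Date   = rep(seq(as.Date("2020-01-01"), as.Date("2020-01-04"), by = "day"), 2),
  Specie = rep(c("Dog", "Cat"), each = 4),
  Number = c(3L, 4L, 5L, 6L, 3L, 7L, 8L, 10L)
)

lagged <- df %>%
  group_by(Specie) %>%
  arrange(Date, .by_group = TRUE) %>%
  mutate(
    # value two rows (here: two days) earlier within the same species
    prev_number = dplyr::lag(Number, n = 2),
    growth_rate = (Number - prev_number) / prev_number
  ) %>%
  ungroup()

# Rows for 2020-01-04 hold the growth rate relative to 2020-01-02
print(filter(lagged, Date == as.Date("2020-01-04")))
```

The first two rows of each species have no value two rows back, so `prev_number` and `growth_rate` are `NA` there.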