我有这个表格,我想计算每个城市每个月的通货膨胀率
我尝试了滞后和领先功能,但CPI与下一个城市重叠,例如:马尔默的移动CPI是斯德哥尔摩的05-01-2005 CPI
您需要group_by
来避免这个问题。
# Read in data
inflation <- read.table(text = "CPI Date City
112 2005-01-01 Stockholm
113.5 2005-02-01 Stockholm
115 2005-03-01 Stockholm
115.6 2005-04-01 Stockholm
115.8 2005-05-01 Stockholm
106 2005-01-01 Malmo
107.5 2005--02-01 Malmo
110 2005-03-01 Malmo
113 2005-04-01 Malmo
117 2005-05-01 Malmo", h = T)
# Perform calculation
library(dplyr)
inflation |>
group_by(City) |>
mutate(
cpi_change = lead(CPI) - CPI,
cpi_change_percent = cpi_change / CPI * 100
)
输出:
# A tibble: 10 x 5
# # Groups: City [2]
# CPI Date City cpi_change cpi_change_percent
# <dbl> <chr> <chr> <dbl> <dbl>
# 1 112 2005-01-01 Stockholm 1.5 1.34
# 2 114. 2005-02-01 Stockholm 1.5 1.32
# 3 115 2005-03-01 Stockholm 0.600 0.522
# 4 116. 2005-04-01 Stockholm 0.200 0.173
# 5 116. 2005-05-01 Stockholm NA NA
# 6 106 2005-01-01 Malmo 1.5 1.42
# 7 108. 2005--02-01 Malmo 2.5 2.33
# 8 110 2005-03-01 Malmo 3 2.73
# 9 113 2005-04-01 Malmo 4 3.54
# 10 117 2005-05-01 Malmo NA NA
您将得到上个月的NAs,因为我们不知道下一年的费率。或者,如果您想计算出与以前相比的变化,您可以使用lag
而不是lead
,但这样您将获得第一个月的NAs。
另一种解决问题的方法:
使用data.table
包
library(data.table)
setDT(df)[, CPI_change_perc := (CPI/shift(CPI) - 1) * 100, by=City]
CPI Date City CPI_change_perc
1: 112.0 2005-01-01 Stockholm NA
2: 113.5 2005-02-01 Stockholm 1.3392857
3: 115.0 2005-03-01 Stockholm 1.3215859
4: 115.6 2005-04-01 Stockholm 0.5217391
5: 115.8 2005-05-01 Stockholm 0.1730104
6: 106.0 2005-01-01 Malmo NA
7: 107.5 2005--02-01 Malmo 1.4150943
8: 110.0 2005-03-01 Malmo 2.3255814
9: 113.0 2005-04-01 Malmo 2.7272727
10: 117.0 2005-05-01 Malmo 3.5398230
使用dplyr
包
library(dplyr)
df |>
group_by(City) |>
mutate(CPI_change_perc = (CPI/shift(CPI) - 1) * 100)
# A tibble: 10 × 4
# Groups: City [2]
CPI Date City CPI_change_perc
<dbl> <chr> <chr> <dbl>
1 112 2005-01-01 Stockholm NA
2 114. 2005-02-01 Stockholm 1.34
3 115 2005-03-01 Stockholm 1.32
4 116. 2005-04-01 Stockholm 0.522
5 116. 2005-05-01 Stockholm 0.173
6 106 2005-01-01 Malmo NA
7 108. 2005--02-01 Malmo 1.42
8 110 2005-03-01 Malmo 2.33
9 113 2005-04-01 Malmo 2.73
10 117 2005-05-01 Malmo 3.54
df = structure(list(CPI = c(112, 113.5, 115, 115.6, 115.8, 106, 107.5,
110, 113, 117), Date = c("2005-01-01", "2005-02-01", "2005-03-01",
"2005-04-01", "2005-05-01", "2005-01-01", "2005--02-01", "2005-03-01",
"2005-04-01", "2005-05-01"), City = c("Stockholm", "Stockholm",
"Stockholm", "Stockholm", "Stockholm", "Malmo", "Malmo", "Malmo",
"Malmo", "Malmo")), row.names = c(NA, -10L), class = "data.frame")
df = structure(list(CPI = c(112, 113.5, 115, 115.6, 115.8, 106, 107.5,
110, 113, 117), Date = c("2005-01-01", "2005-02-01", "2005-03-01",
"2005-04-01", "2005-05-01", "2005-01-01", "2005--02-01", "2005-03-01",
"2005-04-01", "2005-05-01"), City = c("Stockholm", "Stockholm",
"Stockholm", "Stockholm", "Stockholm", "Malmo", "Malmo", "Malmo",
"Malmo", "Malmo")), row.names = c(NA, -10L), class = "data.frame")