r语言 - 如何创建一个新的列,其中的行是由前一行(计算)确定?



我有这个表格,我想计算每个城市每个月的通货膨胀率

我尝试了滞后和领先功能,但CPI与下一个城市重叠,例如:马尔默的移动CPI是斯德哥尔摩的05-01-2005 CPI

您需要group_by来避免这个问题。

# Read in data
inflation  <- read.table(text = "CPI    Date    City
112 2005-01-01  Stockholm
113.5   2005-02-01  Stockholm
115 2005-03-01  Stockholm
115.6   2005-04-01  Stockholm
115.8   2005-05-01  Stockholm
106 2005-01-01  Malmo
107.5   2005--02-01 Malmo
110 2005-03-01  Malmo
113 2005-04-01  Malmo
117 2005-05-01  Malmo", h = T)
# Perform calculation
library(dplyr)
inflation  |>
group_by(City)  |>
mutate(
cpi_change = lead(CPI) - CPI,
cpi_change_percent = cpi_change / CPI * 100
)

输出:

# A tibble: 10 x 5
# # Groups:   City [2]
#      CPI Date        City      cpi_change cpi_change_percent
#    <dbl> <chr>       <chr>          <dbl>              <dbl>
#  1  112  2005-01-01  Stockholm      1.5                1.34
#  2  114. 2005-02-01  Stockholm      1.5                1.32
#  3  115  2005-03-01  Stockholm      0.600              0.522
#  4  116. 2005-04-01  Stockholm      0.200              0.173
#  5  116. 2005-05-01  Stockholm     NA                 NA
#  6  106  2005-01-01  Malmo          1.5                1.42
#  7  108. 2005--02-01 Malmo          2.5                2.33
#  8  110  2005-03-01  Malmo          3                  2.73
#  9  113  2005-04-01  Malmo          4                  3.54
# 10  117  2005-05-01  Malmo         NA                 NA

您将得到上个月的NAs,因为我们不知道下一年的费率。或者,如果您想计算出与以前相比的变化,您可以使用lag而不是lead,但这样您将获得第一个月的NAs。

另一种解决问题的方法:

使用data.table

library(data.table)
setDT(df)[, CPI_change_perc := (CPI/shift(CPI) - 1) * 100, by=City]
CPI        Date      City CPI_change_perc
1: 112.0  2005-01-01 Stockholm              NA
2: 113.5  2005-02-01 Stockholm       1.3392857
3: 115.0  2005-03-01 Stockholm       1.3215859
4: 115.6  2005-04-01 Stockholm       0.5217391
5: 115.8  2005-05-01 Stockholm       0.1730104
6: 106.0  2005-01-01     Malmo              NA
7: 107.5 2005--02-01     Malmo       1.4150943
8: 110.0  2005-03-01     Malmo       2.3255814
9: 113.0  2005-04-01     Malmo       2.7272727
10: 117.0  2005-05-01     Malmo       3.5398230

使用dplyr

library(dplyr)
df |> 
group_by(City) |> 
mutate(CPI_change_perc = (CPI/shift(CPI) - 1) * 100)
# A tibble: 10 × 4
# Groups:   City [2]
CPI Date        City      CPI_change_perc
<dbl> <chr>       <chr>               <dbl>
1  112  2005-01-01  Stockholm          NA    
2  114. 2005-02-01  Stockholm           1.34 
3  115  2005-03-01  Stockholm           1.32 
4  116. 2005-04-01  Stockholm           0.522
5  116. 2005-05-01  Stockholm           0.173
6  106  2005-01-01  Malmo              NA    
7  108. 2005--02-01 Malmo               1.42 
8  110  2005-03-01  Malmo               2.33 
9  113  2005-04-01  Malmo               2.73 
10  117  2005-05-01  Malmo               3.54 

df = structure(list(CPI = c(112, 113.5, 115, 115.6, 115.8, 106, 107.5, 
110, 113, 117), Date = c("2005-01-01", "2005-02-01", "2005-03-01", 
"2005-04-01", "2005-05-01", "2005-01-01", "2005--02-01", "2005-03-01", 
"2005-04-01", "2005-05-01"), City = c("Stockholm", "Stockholm", 
                     "Stockholm", "Stockholm", "Stockholm", "Malmo", "Malmo", "Malmo", 
                     "Malmo", "Malmo")), row.names = c(NA, -10L), class = "data.frame")

相关内容

  • 没有找到相关文章

最新更新