Add rows to fill "missing" years based on the previous year's values, by group



I have a data frame in the following format:

df <- data.frame(year = c(2000, 2000, 2000, 2000, 2000, 2004, 2004, 2004, 2004, 2004,
2010, 2010, 2010, 2010, 2010),
city = c("City A", "City B", "City C", "City D", "City E", 
"City A", "City B", "City C", "City D", "City E",
"City A", "City B", "City C", "City D", "City E"),
constant_y = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15))
df
   year   city constant_y
1  2000 City A          1
2  2000 City B          2
3  2000 City C          3
4  2000 City D          4
5  2000 City E          5
6  2004 City A          6
7  2004 City B          7
8  2004 City C          8
9  2004 City D          9
10 2004 City E         10
11 2010 City A         11
12 2010 City B         12
13 2010 City C         13
14 2010 City D         14
15 2010 City E         15

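I want to fill in/add the missing years for each city, using that city's data from the most recent earlier year. In effect, rows are duplicated within each city while the value in the year column changes. Here is the output I'm trying to get for each city (using City A as an example):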

   year   city constant_y
1  2000 City A          1
2  2001 City A          1
3  2002 City A          1
4  2003 City A          1
5  2004 City A          6
6  2005 City A          6
7  2006 City A          6
8  2007 City A          6
9  2008 City A          6
10 2009 City A          6  
11 2010 City A         11
12 2011 City A         11
13 2012 City A         11
14 2013 City A         11
15 2014 City A         11  
16 2015 City A         11
17 2016 City A         11
18 2017 City A         11
19 2018 City A         11  
20 2019 City A         11      

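The same goes for City B, C, D, and so on (using the constant_y value from the earlier year). For example, City B would have 2 through 2003, 7 from 2004 to 2009, and 12 from 2010 to 2019.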

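So yes, I just want to add rows that repeat each city's constant_y until the next observed year. My data stops at a certain year (2010), but I'd like to extend it a few more years using the 2010 values, for example to 2019 as in the example above. I hope I haven't overcomplicated this, but I'm not sure how to approach it.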

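Here's an approach that first finds all possible city/year combinations, joins them to the original data, and then fills constant_y within each city (using the last-observation-carried-forward technique).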
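To make "last observation carried forward" concrete, here is a minimal base R sketch. The helper name `locf` is hypothetical (it is not part of any package used below); it only illustrates what `fill()`, `zoo::na.locf()`, and `nafill(type = "locf")` do to a single vector.

```r
# Hypothetical helper: replace each NA with the most recent non-NA value.
locf <- function(x) {
  idx <- cumsum(!is.na(x))        # position of the latest non-NA value; 0 before the first one
  c(NA, x[!is.na(x)])[idx + 1]    # leading NAs have nothing to carry forward, so they stay NA
}

locf(c(1, NA, NA, 6, NA, 11))
# [1]  1  1  1  6  6 11
```

Applied per city after the join, this is what turns the NA rows for the added years into copies of the previous observed value.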

dplyr

library(dplyr)
library(tidyr) # expand, fill
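df %>%
  expand(city, year = do.call(seq, as.list(range(year)))) %>%
  full_join(df, by = c("city", "year")) %>%
  arrange(city, year) %>%
  group_by(city) %>%    # fill within each city so values never carry over from another city
  fill(constant_y) %>%
  ungroup()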
# # A tibble: 55 x 3
#    city    year constant_y
#    <chr>  <dbl>      <dbl>
#  1 City A  2000          1
#  2 City A  2001          1
#  3 City A  2002          1
#  4 City A  2003          1
#  5 City A  2004          6
#  6 City A  2005          6
#  7 City A  2006          6
#  8 City A  2007          6
#  9 City A  2008          6
# 10 City A  2009          6
# # ... with 45 more rows

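Of course, this only goes up to 2010, because that's all there is in your original data. If you need it to extend beyond the original data, change the expand() call to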

df %>%
expand(city, year = do.call(seq, as.list(range(c(year, 2019))))) %>%
...                                          # ^^^^^^^^^^^^^ different
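As an alternative sketch of the same idea (not the code above), `tidyr::complete()` combines the expand and join steps, and `tidyr::full_seq()` builds the year range. The end year 2019 is an assumption taken from the question's example; the data frame is rebuilt here so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt compactly
df <- data.frame(
  year = rep(c(2000, 2004, 2010), each = 5),
  city = rep(paste("City", LETTERS[1:5]), times = 3),
  constant_y = as.numeric(1:15)
)

out <- df %>%
  # add every city/year combination from the first observed year through 2019 (assumed end year)
  complete(city, year = full_seq(c(year, 2019), period = 1)) %>%
  arrange(city, year) %>%
  group_by(city) %>%      # carry values forward within each city only
  fill(constant_y) %>%
  ungroup()

filter(out, city == "City B", year %in% c(2003, 2009, 2019))
```

For City B this should give 2, 7, and 12 for those three years, matching the values described in the question.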

base R

# library(zoo) # na.locf
df2 <- merge(
df,
expand.grid(city = unique(df$city), year = do.call(seq, as.list(range(df$year)))),
by = c("city", "year"), all = TRUE)
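# ave() treats every argument in `...` as a grouping variable, so na.rm must be set inside FUN
df2$constant_y <- ave(df2$constant_y, df2$city, FUN = function(z) zoo::na.locf(z, na.rm = FALSE))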
subset(df2, city == "City A")
#      city year constant_y
# 1  City A 2000          1
# 2  City A 2001          1
# 3  City A 2002          1
# 4  City A 2003          1
# 5  City A 2004          6
# 6  City A 2005          6
# 7  City A 2006          6
# 8  City A 2007          6
# 9  City A 2008          6
# 10 City A 2009          6
# 11 City A 2010         11

(The same change applies here to extend from 2010 to 2019.)
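For completeness, here is a sketch of the base R route extended to 2019 and without the zoo dependency. The names `last_year` and `locf` are hypothetical, and 2019 is an assumption taken from the question's example. The data frame is rebuilt so the snippet is self-contained.

```r
# Same data as in the question, rebuilt compactly
df <- data.frame(
  year = rep(c(2000, 2004, 2010), each = 5),
  city = rep(paste("City", LETTERS[1:5]), times = 3),
  constant_y = as.numeric(1:15)
)

last_year <- 2019  # assumed target year

# Hypothetical helper: last observation carried forward, leading NAs kept
locf <- function(x) {
  idx <- cumsum(!is.na(x))
  c(NA, x[!is.na(x)])[idx + 1]
}

# All city/year combinations from the first observed year through last_year
grid <- expand.grid(city = unique(df$city),
                    year = seq(min(df$year), last_year),
                    stringsAsFactors = FALSE)

df2 <- merge(df, grid, by = c("city", "year"), all = TRUE)
df2 <- df2[order(df2$city, df2$year), ]   # LOCF depends on rows being in year order per city
df2$constant_y <- ave(df2$constant_y, df2$city, FUN = locf)

subset(df2, city == "City B" & year %in% c(2003, 2009, 2019))
```

Sorting explicitly before filling is a defensive choice: `merge()` already sorts on the `by` columns by default, but the fill is only correct if each city's rows are in ascending year order.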

data.table

library(data.table)
DT <- as.data.table(df) # canonical would be `setDT(df)` instead
DT <- DT[, CJ(city = unique(city), year = do.call(seq, as.list(range(year))))
][DT, constant_y := i.constant_y, on = .(city, year)
][, constant_y := nafill(constant_y, type = "locf"), by = .(city)]
DT
#       city  year constant_y
#     <char> <int>      <num>
#  1: City A  2000          1
#  2: City A  2001          1
#  3: City A  2002          1
#  4: City A  2003          1
#  5: City A  2004          6
#  6: City A  2005          6
#  7: City A  2006          6
#  8: City A  2007          6
#  9: City A  2008          6
# 10: City A  2009          6
# ---                        
# 46: City E  2001          5
# 47: City E  2002          5
# 48: City E  2003          5
# 49: City E  2004         10
# 50: City E  2005         10
# 51: City E  2006         10
# 52: City E  2007         10
# 53: City E  2008         10
# 54: City E  2009         10
# 55: City E  2010         15
