Interpolating missing data in a data frame with R



I have a data frame similar to the one below:

Country Ccode Year Happiness Power   
1  France    FR 2000      1000  1000  
2  France    FR 2001        NA    NA
3  France    FR 2002        NA    NA
4  France    FR 2003      1600  2200
5  France    FR 2004        NA    NA
6      UK    UK 2000      1000  1000  
7      UK    UK 2001        NA    NA
8      UK    UK 2002      1000  1000  
9      UK    UK 2003      1000  1000
10     UK    UK 2004      1000  1000 

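I previously used the following code to compute the differences:

df <- df %>%
  arrange(Country, Year) %>%  # sort data by country, then year
  group_by(Country) %>%
  mutate_if(is.numeric, funs(d = . - lag(.)))  # difference from the previous row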

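I would like to extend this code so that it takes the difference between consecutive data points of Happiness and Power, divides it by the number of years between those data points, and uses the result to compute the values that replace the NAs, producing the following output.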

Country Ccode Year Happiness Power   
1  France    FR 2000      1000  1000  
2  France    FR 2001      1200  1400    
3  France    FR 2002      1400  1800
4  France    FR 2003      1600  2200
5  France    FR 2004        NA    NA
6      UK    UK 2000      1000  1000  
7      UK    UK 2001        0      0
8      UK    UK 2002      1000  1000  
9      UK    UK 2003      1000  1000
10     UK    UK 2004      1000  1000  

What is an efficient way to do this?

Edit: note that France 2004 is also NA. The "extend" option does seem to handle this case correctly.

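Edit 2: adding group_by(country) seems to break things for a reason I cannot identify: the code appears to be trying to convert character to numeric, although I don't understand why. When I convert the column to character, the error changes to an evaluation error. Any suggestions?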

> TRcomplete<-TRcomplete%>%
+     group_by(country) %>%
+     mutate_at(70:73,~na.fill(.x,"extend"))
Error in mutate_impl(.data, dots) : 
Column `F116.s` can't be converted from character to numeric
> TRcomplete$F116.s <- as.numeric(TRcomplete$F116.s)
> TRcomplete<-TRcomplete%>%
+     group_by(country) %>%
+     mutate_at(70:73,~na.fill(.x,"extend"))
Error in mutate_impl(.data, dots) : 
Column `F116.s` can't be converted from character to numeric
> TRcomplete$F116.s <- as.numeric(as.character(TRcomplete$F116.s))
> TRcomplete<-TRcomplete%>%
+     group_by(country) %>%
+     mutate_at(70:73,~na.fill(.x,"extend"))
Error in mutate_impl(.data, dots) : 
Column `F116.s` can't be converted from character to numeric
> TRcomplete$F116.s <- as.character(TRcomplete$F116.s))
Error: unexpected ')' in "TRcomplete$F116.s <- as.character(TRcomplete$F116.s))"
> TRcomplete$F116.s <- as.character(TRcomplete$F116.s)
> str(TRcomplete$F116.s)
chr [1:6984] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA ...
> TRcomplete<-TRcomplete%>%
+     group_by(country) %>%
+     mutate_at(70:73,~na.fill(.x,"extend"))
Error in mutate_impl(.data, dots) : 
Evaluation error: need at least two non-NA values to interpolate.

You can use na.fill from the zoo library with fill="extend":

rapply(df, zoo::na.fill,"integer",fill="extend",how="replace")
Country Ccode Year Happiness Power
1  France    FR 2000      1000  1000
2  France    FR 2001      1200  1400
3  France    FR 2003      1400  1800
4  France    FR 2004      1600  2200
5      UK    UK 2000      1000  1000
6      UK    UK 2001      1000  1000
7      UK    UK 2003      1000  1000
8      UK    UK 2004      1000  1000
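To make the behaviour of "extend" concrete, here is a minimal sketch on a single vector, using the France Happiness values from the question. The variable names (`happiness`, `year`, `filled`, `approximated`) are only illustrative. It also shows `zoo::na.approx()` with `na.rm = FALSE` as a possible alternative when trailing NAs (such as France 2004) should stay NA, and when the interpolation should follow the actual year gaps via the `x` argument.

```r
library(zoo)

# Happiness values for France, 2000-2004, taken from the question
happiness <- c(1000, NA, NA, 1600, NA)
year <- 2000:2004

# "extend" interpolates interior NAs linearly and repeats the nearest
# observed value for leading/trailing NAs
filled <- na.fill(happiness, "extend")

# na.approx() interpolates against `year`; na.rm = FALSE keeps
# leading/trailing NAs instead of dropping them
approximated <- na.approx(happiness, x = year, na.rm = FALSE)

print(filled)
print(approximated)
```

Here the step is (1600 - 1000) / (2003 - 2000) = 200 per year, so both calls should fill 2001 and 2002 with 1200 and 1400. They differ only in the last element: `na.fill` repeats 1600, while `na.approx` leaves it NA.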

Edit:

library(tidyverse)
library(zoo)
df %>%
  group_by(Country) %>%
  mutate_at(4:5, ~na.fill(.x, "extend"))  # fill columns 4 and 5 within each country
Country Ccode Year Happiness Power
1  France    FR 2000      1000  1000
2  France    FR 2001      1200  1400
3  France    FR 2003      1400  1800
4  France    FR 2004      1600  2200
5      UK    UK 2000      1000  1000
6      UK    UK 2001      1000  1000
7      UK    UK 2003      1000  1000
8      UK    UK 2004      1000  1000
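If positional indices are awkward (for example, when it is unclear which columns 70:73 refer to once the data is grouped), one option is to select the columns by name. The following is a sketch under assumptions, not a diagnosis of the error in Edit 2: it rebuilds the example data from the question and uses `across()`, which requires dplyr 1.0.0 or later.

```r
library(dplyr)
library(zoo)

# Example data copied from the question
df <- data.frame(
  Country   = rep(c("France", "UK"), each = 5),
  Ccode     = rep(c("FR", "UK"), each = 5),
  Year      = rep(2000:2004, times = 2),
  Happiness = c(1000, NA, NA, 1600, NA, 1000, NA, 1000, 1000, 1000),
  Power     = c(1000, NA, NA, 2200, NA, 1000, NA, 1000, 1000, 1000)
)

result <- df %>%
  arrange(Country, Year) %>%
  group_by(Country) %>%
  # Select the columns by name rather than by position
  mutate(across(c(Happiness, Power), ~ na.fill(.x, "extend"))) %>%
  ungroup()

print(result)
```

Naming the columns also makes it harder to accidentally include a character column in the selection.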

If all elements in a group are NA, then:

df %>%
  group_by(Country) %>%
  mutate_if(is.numeric, ~if(all(is.na(.x))) NA else na.fill(.x, "extend"))
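The error message in the question ("need at least two non-NA values to interpolate") suggests that a group with a single observed value may be a problem too, not only a group that is entirely NA. A slightly more defensive variant is sketched below. `fill_if_possible` is a hypothetical helper name, not part of zoo or dplyr, and whether the guard is needed may depend on the installed zoo version.

```r
library(zoo)

# Hypothetical helper: interpolate only when at least two observed values
# exist; otherwise return the input unchanged (which also keeps its type)
fill_if_possible <- function(x) {
  if (sum(!is.na(x)) < 2) {
    return(x)
  }
  na.fill(x, "extend")
}

all_missing  <- fill_if_possible(c(NA_real_, NA_real_, NA_real_))
one_observed <- fill_if_possible(c(NA, 5, NA))
interpolated <- fill_if_possible(c(1, NA, 3))

print(all_missing)
print(one_observed)
print(interpolated)
```

In a pipeline this could be used as `mutate_if(is.numeric, fill_if_possible)` after `group_by(Country)`.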
