r - Interpolate NA values row by row, but only for one or two NAs between numeric values



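I am trying to interpolate the NA values in each row, but only where there are two or fewer consecutive NAs. For example, row 3 has three consecutive NAs, so I do not want to interpolate there, while rows 1 and 2 have runs of two or fewer NAs, so my goal is to interpolate them linearly. Is there an efficient way to do this?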
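The rule described above (fill a run of NAs only if it is at most two long) can be sketched in base R. This is a minimal sketch assuming plain straight-line interpolation with stats::approx() and no extrapolation past the first or last known value; fill_short_gaps is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper (not from any package): linearly interpolate runs of
# consecutive NAs that are at most `maxgap` long; longer runs stay NA.
fill_short_gaps <- function(x, maxgap = 2) {
  known <- !is.na(x)
  if (sum(known) < 2) return(x)  # approx() needs at least two known points
  idx <- seq_along(x)
  # Straight-line values at every position; rule = 1 returns NA outside the
  # first/last known point, so leading and trailing NAs are not extrapolated
  linear <- approx(idx[known], x[known], xout = idx, rule = 1)$y
  # Length of the NA (or non-NA) run that each position belongs to
  runs <- rle(!known)
  run_len <- rep(runs$lengths, runs$lengths)
  # Only overwrite NAs that sit in a short enough run
  fill <- !known & run_len <= maxgap
  x[fill] <- linear[fill]
  x
}

fill_short_gaps(c(5, 5, NA, NA, 1))    # row a2: 5, 5, 3.67, 2.33, 1 (rounded)
fill_short_gaps(c(11, NA, NA, NA, 3))  # row a3: run of 3, returned unchanged
```

To use it on the table, the function can be applied to each row of the numeric year columns, e.g. t(apply(m, 1, fill_short_gaps)) for a numeric matrix m.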

I have a dataset that looks like this:

df1:
ID string1 2018 2019 2020 2021 2022 string2
1: a1      x2    3    3   NA    4    4      si
2: a2      g3    5    5   NA   NA    1      q2
3: a3      n2   11   NA   NA   NA    3      oq
4: a4      m3    3   NA    9    8    8      mx
5: a5      2w    9    1   NA    5   NA      ix
6: a6     ps2    2   NA    7    4    4      p2
7: a7     kg2    6   NA   NA   NA    6      2q

For reproducibility:

library(data.table)

df1 = data.table(
ID = c("a1", "a2", "a3", "a4", "a5", "a6", "a7"),
"string1" = c("x2", "g3", "n2", "m3", "2w", "ps2", "kg2"),
"2018" = c(3,5,11,3,9,2,6),
"2019" = c(3,5,NA,NA,1,NA,NA),
"2020" = c(NA,NA,NA,9,NA,7,NA),
"2021" = c(4,NA,NA,8,5,4,NA),
"2022" = c(4,1,3,8,NA,4,6),
"string2" = c("si", "q2", "oq", "mx", "ix", "p2", "2q"))

The data.table I am trying to get looks like this:

ID string1 2018 2019 2020 2021 2022 string2
1: a1      x2    3 3.00  3.5    4    4      si
2: a2      g3    5 5.00  4.3    3    1      q2
3: a3      n2   11   NA   NA   NA    3      oq
4: a4      m3    3 8.25  9.0    8    8      mx
5: a5      2w    9 1.00 -0.3    5   17      ix
6: a6     ps2    2 8.00  7.0    4    4      p2
7: a7     kg2    6   NA   NA   NA    6      2q

Thanks for any suggestions!

Please find below one solution using the data.table and imputeTS libraries (see the reprex below).

Reprex

  • Code
library(data.table)
library(imputeTS)
library(magrittr) # for the pipes!
results <- df1 %>% 
transpose(., keep.names = 'rn') %>% 
{.[3:(ncol(df1) - 1), lapply(.SD, as.numeric), # keep only the year rows (2018-2022)
][, lapply(.SD, na_interpolation, option = "spline", maxgap = 2)]} %>% 
round(., 2) %>%  
transpose(., make.names = 'rn') %>% 
cbind(.,df1[,c("ID", "string1", "string2")]) %>% 
setcolorder(., names(df1))
  • Output
results
#>        ID string1  2018  2019  2020  2021  2022 string2
#>    <char>  <char> <num> <num> <num> <num> <num>  <char>
#> 1:     a1      x2     3  3.00  3.50     4     4      si
#> 2:     a2      g3     5  5.00  4.33     3     1      q2
#> 3:     a3      n2    11    NA    NA    NA     3      oq
#> 4:     a4      m3     3  8.25  9.00     8     8      mx
#> 5:     a5      2w     9  1.00 -0.50     5     5      ix
#> 6:     a6     ps2     2  8.00  7.00     4     4      p2
#> 7:     a7     kg2     6    NA    NA    NA     6      2q

Created on 2021-12-02 by the reprex package (v2.0.1)
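The maxgap = 2 argument is what leaves a3 and a7 untouched: per the imputeTS documentation, runs of consecutive NAs longer than maxgap are left as NA, and both rows contain a run of three. A small base R check makes the run lengths visible; longest_na_run is a hypothetical helper, not part of imputeTS, and the matrix simply copies the year columns of df1.

```r
# Year columns of df1 (2018-2022) as a numeric matrix, one row per ID
m <- rbind(
  a1 = c(3, 3, NA, 4, 4),
  a2 = c(5, 5, NA, NA, 1),
  a3 = c(11, NA, NA, NA, 3),
  a4 = c(3, NA, 9, 8, 8),
  a5 = c(9, 1, NA, 5, NA),
  a6 = c(2, NA, 7, 4, 4),
  a7 = c(6, NA, NA, NA, 6)
)

# Hypothetical helper: length of the longest run of consecutive NAs (0 if none)
longest_na_run <- function(v) {
  r <- rle(is.na(v))
  if (any(r$values)) max(r$lengths[r$values]) else 0L
}

gap <- apply(m, 1, longest_na_run)
gap                  # a1..a7: 1 2 3 1 1 1 3
names(gap)[gap > 2]  # rows that maxgap = 2 leaves alone: "a3" "a7"
```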


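A better solution may be to use the data.table and zoo libraries (see the reprex below). This solution gives the result you want (i.e. forget my comment under your question!).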

Reprex

  • Code
library(data.table)
library(zoo)
library(magrittr) # for the pipes! 
results <- df1 %>% 
transpose(., keep.names = 'rn') %>% 
{.[3:(ncol(df1) - 1), lapply(.SD, as.numeric), # keep only the year rows (2018-2022)
][, lapply(.SD, na.spline, maxgap = 2)]} %>% 
round(., 2) %>%  
transpose(., make.names = 'rn') %>% 
cbind(.,df1[,c("ID", "string1", "string2")]) %>% 
setcolorder(., names(df1))
  • Output
results
#>        ID string1  2018  2019  2020  2021  2022 string2
#>    <char>  <char> <num> <num> <num> <num> <num>  <char>
#> 1:     a1      x2     3  3.00  3.50     4     4      si
#> 2:     a2      g3     5  5.00  4.33     3     1      q2
#> 3:     a3      n2    11    NA    NA    NA     3      oq
#> 4:     a4      m3     3  8.25  9.00     8     8      mx
#> 5:     a5      2w     9  1.00 -0.33     5    17      ix
#> 6:     a6     ps2     2  8.00  7.00     4     4      p2
#> 7:     a7     kg2     6    NA    NA    NA     6      2q

Created on 2021-12-03 by the reprex package (v2.0.1)
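The values above are spline values rather than straight-line ones: for a2, linear interpolation between 5 (2019) and 1 (2022) would give 3.67 and 2.33, whereas the output has 4.33 and 3. A minimal base R sketch, assuming stats::spline() with its default "fmm" method combined with the same maxgap rule, reproduces the rounded numbers in that output, including the extrapolated 17 for a5 in 2022. spline_short_gaps is a hypothetical helper, not part of zoo.

```r
# Year columns of df1 (2018-2022) as a numeric matrix, one row per ID
m <- rbind(
  a1 = c(3, 3, NA, 4, 4),
  a2 = c(5, 5, NA, NA, 1),
  a3 = c(11, NA, NA, NA, 3),
  a4 = c(3, NA, 9, 8, 8),
  a5 = c(9, 1, NA, 5, NA),
  a6 = c(2, NA, 7, 4, 4),
  a7 = c(6, NA, NA, NA, 6)
)
colnames(m) <- 2018:2022

# Hypothetical helper: fill NA runs of length <= maxgap with stats::spline()
# (default method "fmm"). Unlike approx(rule = 1), spline() also extrapolates,
# which is why a trailing NA such as a5/2022 receives a value.
spline_short_gaps <- function(x, maxgap = 2) {
  known <- !is.na(x)
  runs <- rle(!known)
  run_len <- rep(runs$lengths, runs$lengths)
  fill <- !known & run_len <= maxgap
  if (sum(known) >= 2 && any(fill)) {
    idx <- seq_along(x)
    x[fill] <- spline(idx[known], x[known], xout = idx[fill])$y
  }
  x
}

# apply() returns one column per input row, so t() restores the row layout
res <- round(t(apply(m, 1, spline_short_gaps)), 2)
res
```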
