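I'm trying to interpolate the NA values in each row, but only where a run of adjacent NAs is two or fewer long. For example, row 3 has three adjacent NAs, so I don't want to interpolate it, whereas rows 1 and 2 have runs of two or fewer NAs, so there I want linear interpolation. Is there an efficient way to do this?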
I have a dataset that looks like this:
df1:
ID string1 2018 2019 2020 2021 2022 string2
1: a1 x2 3 3 NA 4 4 si
2: a2 g3 5 5 NA NA 1 q2
3: a3 n2 11 NA NA NA 3 oq
4: a4 m3 3 NA 9 8 8 mx
5: a5 2w 9 1 NA 5 NA ix
6: a6 ps2 2 NA 7 4 4 p2
7: a7 kg2 6 NA NA NA 6 2q
Reproducible data:
library(data.table)
df1 = data.table(
  ID = c("a1", "a2", "a3", "a4", "a5", "a6", "a7"),
  "string1" = c("x2", "g3", "n2", "m3", "2w", "ps2", "kg2"),
  "2018" = c(3,5,11,3,9,2,6),
  "2019" = c(3,5,NA,NA,1,NA,NA),
  "2020" = c(NA,NA,NA,9,NA,7,NA),
  "2021" = c(4,NA,NA,8,5,4,NA),
  "2022" = c(4,1,3,8,NA,4,6),
  "string2" = c("si", "q2", "oq", "mx", "ix", "p2", "2q"))
I'm trying to get a data.table that looks like this:
ID string1 2018 2019 2020 2021 2022 string2
1: a1 x2 3 3.00 3.5 4 4 si
2: a2 g3 5 5.00 4.3 3 1 q2
3: a3 n2 11 NA NA NA 3 oq
4: a4 m3 3 8.25 9.0 8 8 mx
5: a5 2w 9 1.00 -0.3 5 17 ix
6: a6 ps2 2 8.00 7.0 4 4 p2
7: a7 kg2 6 NA NA NA 6 2q
Thanks for any suggestions!
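For reference, the rule described in the question (fill a run of NAs only when it is two or fewer long) can be sketched in base R with `rle()` and `approx()`. This is only an illustrative sketch: the helper name `interp_short_gaps` is hypothetical and not part of any package, it interpolates linearly, and it leaves leading or trailing NAs untouched rather than extrapolating them.

```r
# Hypothetical helper (not from the original posts): linearly interpolate
# interior NA runs of length <= maxgap; longer runs and edge NAs stay NA.
interp_short_gaps <- function(x, maxgap = 2) {
  ok <- !is.na(x)
  if (sum(ok) < 2) return(x)            # approx() needs two known points
  runs <- rle(is.na(x))
  # TRUE for every position that belongs to an NA run longer than maxgap
  too_long <- rep(runs$values & runs$lengths > maxgap, runs$lengths)
  idx <- seq_along(x)
  # rule = 1: positions outside the known range are returned as NA
  filled <- approx(idx[ok], x[ok], xout = idx, rule = 1)$y
  filled[too_long] <- NA
  filled
}

interp_short_gaps(c(3, 3, NA, 4, 4))     # row 1: the single NA is filled
interp_short_gaps(c(11, NA, NA, NA, 3))  # row 3: the run of three stays NA
```

Applied to one row at a time, this reproduces the "skip long gaps" behaviour; the spline-based values in the desired output above need a spline interpolator instead.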
Please find below a solution using the data.table and imputeTS libraries (see the reprex below).
Reprex
- Code
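library(data.table)
library(imputeTS)
library(magrittr) # for the pipes!
# The year columns sit between the two leading and the one trailing string column
year_rows <- 3:(ncol(df1) - 1)
results <- df1 %>%
  transpose(., keep.names = 'rn') %>%    # one row per original column
  {.[year_rows, lapply(.SD, as.numeric), # keep only the year rows
  ][, lapply(.SD, na_interpolation, "spline", 2)]} %>% # spline fill, maxgap = 2
  round(., 2) %>%
  transpose(., make.names = 'rn') %>%    # back to one row per ID
  cbind(., df1[, c("ID", "string1", "string2")]) %>%
  setcolorder(., names(df1))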
- Output
results
#> ID string1 2018 2019 2020 2021 2022 string2
#> <char> <char> <num> <num> <num> <num> <num> <char>
#> 1: a1 x2 3 3.00 3.50 4 4 si
#> 2: a2 g3 5 5.00 4.33 3 1 q2
#> 3: a3 n2 11 NA NA NA 3 oq
#> 4: a4 m3 3 8.25 9.00 8 8 mx
#> 5: a5 2w 9 1.00 -0.50 5 5 ix
#> 6: a6 ps2 2 8.00 7.00 4 4 p2
#> 7: a7 kg2 6 NA NA NA 6 2q
Created on 2021-12-02 by the reprex package (v2.0.1)
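Using the data.table and zoo libraries is probably a better solution (see the reprex below). This solution gives the result you want (i.e. forget my comment under your question!).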
Reprex
- Code
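library(data.table)
library(zoo)
library(magrittr) # for the pipes!
# The year columns sit between the two leading and the one trailing string column
year_rows <- 3:(ncol(df1) - 1)
results <- df1 %>%
  transpose(., keep.names = 'rn') %>%    # one row per original column
  {.[year_rows, lapply(.SD, as.numeric), # keep only the year rows
  ][, lapply(.SD, na.spline, maxgap = 2)]} %>% # spline fill, skip runs of more than 2 NAs
  round(., 2) %>%
  transpose(., make.names = 'rn') %>%    # back to one row per ID
  cbind(., df1[, c("ID", "string1", "string2")]) %>%
  setcolorder(., names(df1))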
- Output
results
#> ID string1 2018 2019 2020 2021 2022 string2
#> <char> <char> <num> <num> <num> <num> <num> <char>
#> 1: a1 x2 3 3.00 3.50 4 4 si
#> 2: a2 g3 5 5.00 4.33 3 1 q2
#> 3: a3 n2 11 NA NA NA 3 oq
#> 4: a4 m3 3 8.25 9.00 8 8 mx
#> 5: a5 2w 9 1.00 -0.33 5 17 ix
#> 6: a6 ps2 2 8.00 7.00 4 4 p2
#> 7: a7 kg2 6 NA NA NA 6 2q
Created on 2021-12-03 by the reprex package (v2.0.1)
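The question mentions linear interpolation, while the reprex above uses a spline. If a strictly linear fill is preferred, a possible variant is `zoo::na.approx()`, which also accepts `maxgap`. The sketch below is an assumption-laden illustration, not the answer's method: it uses a small stand-in table `dt` (rows a1, a3 and a5 of df1, year columns only) and works row by row through a matrix instead of transposing.

```r
library(data.table)
library(zoo)

# Stand-in data (assumption: same values as rows a1, a3 and a5 of df1)
dt <- data.table(
  ID     = c("a1", "a3", "a5"),
  "2018" = c(3, 11, 9),
  "2019" = c(3, NA, 1),
  "2020" = rep(NA_real_, 3),
  "2021" = c(4, NA, 5),
  "2022" = c(4, 3, NA)
)
year_cols <- c("2018", "2019", "2020", "2021", "2022")

# Row-wise linear interpolation. na.rm = FALSE keeps every row at full
# length, so runs longer than maxgap and trailing NAs remain NA.
filled <- t(apply(as.matrix(dt[, ..year_cols]), 1,
                  function(v) na.approx(v, maxgap = 2, na.rm = FALSE)))
dt[, (year_cols) := as.data.table(filled)]
dt
```

Unlike `na.spline()`, this does not extrapolate, so a trailing NA such as the 2022 value of a5 is left as NA.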