I have a data frame with two columns:
names duration
1 J 97
2 G NA
3 H 53
4 A 23
5 E NA
6 D NA
7 C 73
8 F NA
9 B 37
10 I 67
I want to replace every NA value in the duration column with the value from the names column in the same row. How can I do that?
Data
zz <- "names duration
1 J 97
2 G NA
3 H 53
4 A 23
5 E NA
6 D NA
7 C 73
8 F NA
9 B 37
10 I 67"
df <- read.table(text = zz, header = TRUE)
dplyr
library(dplyr)
df_new <- df %>%
mutate(duration = ifelse(is.na(duration), as.character(names), duration))
Output of df_new
# names duration
# 1 J 97
# 2 G G
# 3 H 53
# 4 A 23
# 5 E E
# 6 D D
# 7 C 73
# 8 F F
# 9 B 37
# 10 I 67
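As a variation on the ifelse() call above, dplyr's coalesce() can express the same "take the first non-missing value" logic. This is only a sketch, not part of the original answer: it rebuilds the question's data with data.frame() and converts both columns to character so that their types match.

```r
library(dplyr)

# Same example data as in the question, built directly as a data frame.
df <- data.frame(
  names = c("J", "G", "H", "A", "E", "D", "C", "F", "B", "I"),
  duration = c(97, NA, 53, 23, NA, NA, 73, NA, 37, 67),
  stringsAsFactors = FALSE
)

# coalesce() returns the first non-missing value at each position.
# Both inputs are converted to character so that their types match.
df_new <- df %>%
  mutate(duration = coalesce(as.character(duration), as.character(names)))
```

As with the ifelse() version, the resulting duration column is character, not numeric.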
We can use is.na to create a logical index ('i1'), and then subset 'names' with 'i1' to replace the values of 'duration' in the same rows.
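# logical index of the rows where 'duration' is missing
i1 <- is.na(df$duration)
# as.character() guards against 'names' being a factor (the read.table default before R 4.0)
df$duration[i1] <- as.character(df$names[i1])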
df
# names duration
#1 J 97
#2 G G
#3 H 53
#4 A 23
#5 E E
#6 D D
#7 C 73
#8 F F
#9 B 37
#10 I 67
Note: this will change the class of 'duration' from numeric to character.
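The class change mentioned in the note can be checked directly. The following is a minimal sketch in base R on a shortened copy of the data; the variable names class_before, class_after and numeric_again are only illustrative.

```r
# Shortened copy of the example data (first five rows).
df <- data.frame(
  names = c("J", "G", "H", "A", "E"),
  duration = c(97L, NA, 53L, 23L, NA),
  stringsAsFactors = FALSE
)

class_before <- class(df$duration)   # integer column before the replacement

i1 <- is.na(df$duration)
df$duration[i1] <- df$names[i1]      # assigning strings coerces the whole column

class_after <- class(df$duration)    # character column after the replacement

# The numeric entries can still be recovered; the replaced rows become NA again.
# suppressWarnings() hides the "NAs introduced by coercion" warning.
numeric_again <- suppressWarnings(as.numeric(df$duration))
```

Because a column holds a single type, any later arithmetic on duration needs a conversion like the last line.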
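Alternatively, this can be done with data.table, which should be faster. Convert the 'data.frame' to a 'data.table' (setDT(df)), change the class of 'duration' to character, and then, by specifying the condition in 'i' (is.na(duration)), assign (:=) the values of 'names' that correspond to the 'i' condition to 'duration'. Because the assignment happens in place, it is very efficient.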
library(data.table)
setDT(df)[, duration:= as.character(duration)][is.na(duration), duration:= names]
Data
df <- structure(list(names = c("J", "G", "H", "A", "E", "D", "C", "F",
"B", "I"), duration = c(97L, NA, 53L, 23L, NA, NA, 73L, NA, 37L,
67L)), .Names = c("names", "duration"), row.names = c("1", "2",
"3", "4", "5", "6", "7", "8", "9", "10"), class = "data.frame")
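For reference, the data.table steps described above can be run end to end as separate statements. This is a sketch that assumes the data.table package is installed and rebuilds the same data with data.frame() instead of structure(). The type conversion comes first because a sub-assignment with := does not change the type of an existing column.

```r
library(data.table)

df <- data.frame(
  names = c("J", "G", "H", "A", "E", "D", "C", "F", "B", "I"),
  duration = c(97L, NA, 53L, 23L, NA, NA, 73L, NA, 37L, 67L),
  stringsAsFactors = FALSE
)

setDT(df)                                   # converts df to a data.table by reference
df[, duration := as.character(duration)]    # change the column type first
df[is.na(duration), duration := names]      # then fill the NA rows from 'names'
```

After these calls df itself has been modified, so no reassignment such as df <- ... is needed.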