I have a data frame that looks like this:
  id       V2       V3      V4       V5
1  1   0.0000   1.0000   2.000   3.0000
2  2       NA   0.0000   0.000       NA
3  3   0.0000   0.0000      NA       NA
4  4 125.0605 120.8402 125.095 124.8971
5  5   0.0000   0.0000      NA 163.4609
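I want to create a ragged-array-style data frame like the one below, where one column, `w`, holds the numbers (skipping the `NA`s) and another column, `ind`, indicates which id each number came from (each id corresponds to one row):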
       w ind
  0.0000   1
  1.0000   1
   2.000   1
  3.0000   1
  0.0000   2
   0.000   2
  0.0000   3
  0.0000   3
125.0605   4
120.8402   4
 125.095   4
124.8971   4
  0.0000   5
  0.0000   5
163.4609   5
The data, as `dput` output:
df <- structure(list(id = structure(1:5, .Label = c(1, 2, 3, 4, 5)),
                     V2 = c(0, NA, 0, 125.0605, 0),
                     V3 = c(1, 0, 0, 120.8402, 0),
                     V4 = c(2, 0, NA, 125.095, NA),
                     V5 = c(3, NA, NA, 124.8971, 163.4609)),
                class = "data.frame",
                row.names = c("1", "2", "3", "4", "5"))
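You can try `stack` on the transposed data (with the `id` row dropped), then remove the `NA` rows with `na.omit`. Note that `ind` is labelled `X1`...`X5` here rather than with the ids themselves: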
na.omit(stack(data.frame(t(df)[-1,], stringsAsFactors = FALSE)))
# values ind
#1 0.0000 X1
#2 1.0000 X1
#3    2.0000   X1
#4    3.0000   X1
#6    0.0000   X2
#7    0.0000   X2
#9    0.0000   X3
#10   0.0000   X3
#13 125.0605   X4
#14 120.8402   X4
#15 125.0950   X4
#16 124.8971 X4
#17 0.0000 X5
#18 0.0000 X5
#20 163.4609 X5
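If you want `ind` to carry the actual ids instead of the `X1`...`X5` labels, one option is to map the factor back through `df$id`. This is a minimal sketch, assuming the rows of `df` are in the same order as the stacked groups (which holds here, because `stack` creates the `ind` levels in column order). The data is rebuilt with `data.frame()` and a plain integer `id` so the block runs on its own; that rebuild is an assumption, not the posted `dput`.

```r
# Example data from the question, rebuilt with data.frame() (plain integer id)
df <- data.frame(
  id = 1:5,
  V2 = c(0, NA, 0, 125.0605, 0),
  V3 = c(1, 0, 0, 120.8402, 0),
  V4 = c(2, 0, NA, 125.095, NA),
  V5 = c(3, NA, NA, 124.8971, 163.4609)
)

# Same approach as above: transpose, drop the id row, stack, drop NA rows
out <- na.omit(stack(data.frame(t(df)[-1, ])))

# stack() labels each group X1..X5 with levels in column order, so the
# integer code of each level is the row position of that id in df
out$ind <- df$id[as.integer(out$ind)]

# Rename the stacked value column to match the desired output
names(out)[names(out) == "values"] <- "w"
out
```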
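We can transpose (`t`) the dataset without its first column and flatten it with `c`, build a data.frame by pairing those values with the first column repeated once per value, and then drop the `NA` rows with `na.omit`: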
na.omit(data.frame(w = c(t(df[-1])), ind = rep(df$id, each = ncol(df) - 1)))
# w ind
#1 0.0000 1
#2 1.0000 1
#3 2.0000 1
#4 3.0000 1
#6 0.0000 2
#7 0.0000 2
#9 0.0000 3
#10 0.0000 3
#13 125.0605 4
#14 120.8402 4
#15 125.0950 4
#16 124.8971 4
#17 0.0000 5
#18 0.0000 5
#20 163.4609 5
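The transpose matters because R stores matrices column by column, so `c()` on the untransposed values would interleave the rows. The small sketch below uses a hypothetical 2 x 3 matrix `m` (not from the question) to show that `c(t(m))` reads the original rows in order, which is what lets `rep(..., each = )` line the ids up with their values.

```r
# A 2 x 3 matrix filled row by row: first row 1 2 3, second row 4 5 6
m <- matrix(1:6, nrow = 2, byrow = TRUE)

c(m)      # column-major order: 1 4 2 5 3 6
c(t(m))   # transposing first reads the original rows in order: 1 2 3 4 5 6

# Each id is repeated once per value in its row, matching c(t(m))
rep(c(10, 20), each = ncol(m))  # 10 10 10 20 20 20
```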
Or with `pivot_longer` from `tidyr`:
library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = -id, values_to = 'w') %>%
  filter(!is.na(w)) %>%
  select(w, ind = id)
# A tibble: 15 x 2
# w ind
# <dbl> <int>
# 1 0 1
# 2 1 1
# 3 2 1
# 4 3 1
# 5 0 2
# 6 0 2
# 7 0 3
# 8 0 3
# 9 125. 4
#10 121. 4
#11 125. 4
#12 125. 4
#13 0 5
#14 0 5
#15 163. 5
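As a variant of the `tidyr` answer, `pivot_longer` also accepts `values_drop_na = TRUE`, which drops the `NA` values during the reshape so the separate `filter` step is not needed. A minimal sketch, assuming tidyr 1.0 or later and the same example data rebuilt with `data.frame()`:

```r
library(dplyr)
library(tidyr)

# Example data from the question, rebuilt with data.frame() (plain integer id)
df <- data.frame(
  id = 1:5,
  V2 = c(0, NA, 0, 125.0605, 0),
  V3 = c(1, 0, 0, 120.8402, 0),
  V4 = c(2, 0, NA, 125.095, NA),
  V5 = c(3, NA, NA, 124.8971, 163.4609)
)

# values_drop_na = TRUE removes rows whose w is NA while pivoting
res <- df %>%
  pivot_longer(cols = -id, values_to = "w", values_drop_na = TRUE) %>%
  select(w, ind = id)

# Printing as a plain data.frame shows all decimals instead of tibble rounding
print(as.data.frame(res))
```

Values such as `125.` in the tibble printout above are only display rounding; the stored doubles keep their full precision.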