I have a dataset with three columns. The third column holds date values mixed with some strings.
ID Col1 Value
123 Start.Date 2011-06-18
123 Stem A1
123 Stem_1 A6
123 Stem_2 NA
321 Start.Date 2014-08-05
321 Stem C1
321 Stem_1 C4
321 Stem_2 NA
677 Start.Date NA
677 Stem NA
677 Stem_1 NA
677 Stem_2 NA
How can I pull the dates out and store them in a separate column, like this?
ID Col1 Value Start.Date
123 Stem A1 2011-06-18
123 Stem_1 A6 2011-06-18
123 Stem_2 NA 2011-06-18
321 Stem C1 2014-08-05
321 Stem_1 C4 2014-08-05
321 Stem_2 NA 2014-08-05
677 Stem NA NA
677 Stem_1 NA NA
677 Stem_2 NA NA
Thanks.
An alternative solution based only on tidyr:
library(tidyr)
df %>%
  pivot_wider(id_cols = ID, names_from = Col1, values_from = Value) %>%
  pivot_longer(c("Stem", "Stem_1", "Stem_2"), names_to = "Col1", values_to = "Value")
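A possible variation on the wide-then-long approach above, offered as a sketch rather than a definitive version: it selects the Stem columns with starts_with() instead of listing them by name, converts Start.Date to a Date, and reorders the columns to match the layout asked for in the question. It assumes dplyr is also loaded (for mutate() and select()) and that every non-date column name begins with "Stem". The name res is only illustrative, and the sample data is repeated so the block runs on its own.

```r
library(tidyr)
library(dplyr)

# Same sample data as in the question, repeated so this block is self-contained
df <- data.frame(
  ID = rep(c(123L, 321L, 677L), each = 4),
  Col1 = rep(c("Start.Date", "Stem", "Stem_1", "Stem_2"), times = 3),
  Value = c("2011-06-18", "A1", "A6", NA,
            "2014-08-05", "C1", "C4", NA,
            NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

res <- df %>%
  # one row per ID, with Start.Date, Stem, Stem_1, Stem_2 as columns
  pivot_wider(id_cols = ID, names_from = Col1, values_from = Value) %>%
  # back to long form for the Stem* columns only; Start.Date stays a column
  # (assumption: all non-date keys start with "Stem")
  pivot_longer(starts_with("Stem"), names_to = "Col1", values_to = "Value") %>%
  # Start.Date is still character at this point, so convert it
  mutate(Start.Date = as.Date(Start.Date)) %>%
  # match the column order shown in the question
  select(ID, Col1, Value, Start.Date)

res
```

Selecting by prefix avoids editing the call when more Stem_* keys appear, at the cost of silently picking up any other column whose name starts with "Stem".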
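Create a new column that keeps the entries of Value where Col1 == 'Start.Date' and is NA everywhere else. Then, within each ID, fill the NA values with the preceding date and drop the 'Start.Date' rows.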
library(dplyr)
library(tidyr)
df %>%
mutate(Start.Date = as.Date(replace(Value, Col1 != 'Start.Date', NA))) %>%
group_by(ID) %>%
fill(Start.Date) %>%
ungroup() %>%
filter(Col1 != 'Start.Date')
# ID Col1 Value Start.Date
# <int> <chr> <chr> <date>
#1 123 Stem A1 2011-06-18
#2 123 Stem_1 A6 2011-06-18
#3 123 Stem_2 NA 2011-06-18
#4 321 Stem C1 2014-08-05
#5 321 Stem_1 C4 2014-08-05
#6 321 Stem_2 NA 2014-08-05
#7 677 Stem NA NA
#8 677 Stem_1 NA NA
#9 677 Stem_2 NA NA
df <- structure(list(ID = c(123L, 123L, 123L, 123L, 321L, 321L, 321L,
321L, 677L, 677L, 677L, 677L), Col1 = c("Start.Date", "Stem",
"Stem_1", "Stem_2", "Start.Date", "Stem", "Stem_1", "Stem_2",
"Start.Date", "Stem", "Stem_1", "Stem_2"), Value = c("2011-06-18",
"A1", "A6", NA, "2014-08-05", "C1", "C4", NA, NA, NA, NA, NA)),
class = "data.frame", row.names = c(NA, -12L))
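For comparison, the same result can be approached in base R as a join: split off the 'Start.Date' rows into a lookup table keyed by ID and merge it back onto the remaining rows. This is only a sketch under the assumption that each ID has at most one 'Start.Date' row; the names dates, rest and out are illustrative, and the sample data is repeated so the block runs on its own.

```r
# Same sample data as above, repeated so this block is self-contained
df <- data.frame(
  ID = rep(c(123L, 321L, 677L), each = 4),
  Col1 = rep(c("Start.Date", "Stem", "Stem_1", "Stem_2"), times = 3),
  Value = c("2011-06-18", "A1", "A6", NA,
            "2014-08-05", "C1", "C4", NA,
            NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Lookup table: one row per ID holding its start date
# (assumption: at most one 'Start.Date' row per ID)
dates <- df[df$Col1 == "Start.Date", c("ID", "Value")]
names(dates)[2] <- "Start.Date"
dates$Start.Date <- as.Date(dates$Start.Date)

# Everything that is not a start-date row
rest <- df[df$Col1 != "Start.Date", ]

# Left join on ID so rows without a date keep NA in Start.Date
out <- merge(rest, dates, by = "ID", all.x = TRUE)
out
```

If an ID had several 'Start.Date' rows, the merge would duplicate its Stem rows, so that case would need to be resolved before joining.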