例如,假设有一个数据集看起来像这个
编辑:为额外上下文添加日期和编号列
ID|Date |Col1|Col2|Col3|Num
1 10-10 Y 5
1 10-10 Y Y 5
1 10-10 Y 5
2 09-17 Y 6
2 09-17 Y 6
3 12-14 Y 7
3 12-14 Y 7
4 06-06 Y 8
4 06-06
有没有一种方法可以得到这样的输出?
ID|Date |Col1|Col2|Col3|Num
1 10-10 Y Y Y 5
2 09-17 Y Y 6
3 12-14 Y Y 7
4 06-06 Y 8
structure(list(ID = c("000001", "000001", "000001", "000001",
"000001", "000001", "000001", "000001", "000001", "000001", "000002",
"000002", "000002", "000003", "000003", "000003", "000003", "000003",
"000003", "000003"), Date = structure(c(1570492800, 1570492800,
1570492800, 1570492800, 1570492800, 1570492800, 1570492800, 1570492800,
1570492800, 1570492800, 1570665600, 1570665600, 1570665600, 1570838400,
1570838400, 1570838400, 1570838400, 1570838400, 1570838400, 1570838400
), tzone = "UTC", class = c("POSIXct", "POSIXt")), Col1 = c(NA,
NA, NA, NA, "Y", NA, NA, "Y", NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA), Col2 = c("Y", "Y", "Y", "Y", "Y", "Y", "Y",
"Y", "Y", "Y", "Y", "Y", "Y", NA, NA, NA, NA, "Y", "Y", "Y"),
Col3 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, "Y", NA, NA, "Y", "Y", "Y"), Num1 = c(1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 861, 861, 861, 497, 497, 497, 497, 497,
497, 497)), row.names = c(NA, -20L), class = c("tbl_df",
"tbl", "data.frame"))
我们可以通过循环across
来按"ID"、"Date"和summarise
对其余列(everything()
(进行分组,并提取第一个非NA元素
library(dplyr)
df2 %>%
group_by(ID, Date) %>%
summarise(across(everything(), ~.x[!is.na(.x)][1]),
.groups = 'drop')
-输出
# A tibble: 3 × 6
ID Date Col1 Col2 Col3 Num1
<chr> <dttm> <chr> <chr> <chr> <dbl>
1 000001 2019-10-08 00:00:00 Y Y <NA> 1
2 000002 2019-10-10 00:00:00 <NA> Y <NA> 861
3 000003 2019-10-12 00:00:00 <NA> Y Y 497
带有aggregate
的基本R选项
ID Date Col1 Col2 Col3 Num1
1 000001 2019-10-08 Y Y 1
2 000002 2019-10-10 Y 861
3 000003 2019-10-12 Y Y 497