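I have a dataframe, shown below. I want to merge the duplicates in the 'activity' column (except the duplicates named 'selection') and sum their values in the 'duration' column. I am doing this in R. I tried using aggregate(), but I couldn't find a way to avoid aggregating the 'selection' rows.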
# df - I used dput so you can have my df
test <- structure(list(activity = c("selection", "selection", "selection",
"other", "inspection", "assignment", "inspection", "inspection",
"inspection", "inspection"), workers = c("worker 1", "worker 1",
"worker 1", "worker 34", "worker 6", "worker 5", "worker 2",
"worker 2", "worker 2", "worker 2"), start_time = structure(c(1645396200,
1645396200, 1645396200, 1645394352, 1645394155, 1645394100, 1645390080,
1645476480, 1645562880, 1645649280), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), status = c("passed", "passed", "passed", "passed",
"passed", "passed", "passed", "passed", "passed", "passed"),
duration = c(8.98333333333333, 9.69027777777778, 9.20555555555556,
0.557222222222222, 2.24527777777778, 1.61666666666667, 2.12166666666667,
1.32638888888889, 2.59861111111111, 0.765555555555556)), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
test
# A tibble: 10 x 5
   activity   workers   start_time          status duration
   <chr>      <chr>     <dttm>              <chr>     <dbl>
 1 selection  worker 1  2022-02-20 22:30:00 passed    8.98
 2 selection  worker 1  2022-02-20 22:30:00 passed    9.69
 3 selection  worker 1  2022-02-20 22:30:00 passed    9.21
 4 other      worker 34 2022-02-20 21:59:12 passed    0.557
 5 inspection worker 6  2022-02-20 21:55:55 passed    2.25
 6 assignment worker 5  2022-02-20 21:55:00 passed    1.62
 7 inspection worker 2  2022-02-20 20:48:00 passed    2.12
 8 inspection worker 2  2022-02-21 20:48:00 passed    1.33
 9 inspection worker 2  2022-02-22 20:48:00 passed    2.60
10 inspection worker 2  2022-02-23 20:48:00 passed    0.766
I'm not sure I fully understand what you're looking for, but I'll give it a try!
Using the dplyr library, you can do the following:
Reprex
- Code
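library(dplyr)

test %>%
  # Set the 'selection' rows aside so they are not aggregated
  filter(activity != "selection") %>%
  group_by(activity) %>%
  # Keep the first worker, start time and status of each group; sum the durations
  summarise(workers = workers[1],
            start_time = start_time[1],
            status = status[1],
            duration = sum(duration)) %>%
  # Append the untouched 'selection' rows
  bind_rows(test %>% filter(activity == "selection"))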
- Output
#> # A tibble: 6 x 5
#>   activity   workers   start_time          status duration
#>   <chr>      <chr>     <dttm>              <chr>     <dbl>
#> 1 assignment worker 5  2022-02-20 21:55:00 passed    1.62
#> 2 inspection worker 6  2022-02-20 21:55:55 passed    9.06
#> 3 other      worker 34 2022-02-20 21:59:12 passed    0.557
#> 4 selection  worker 1  2022-02-20 22:30:00 passed    8.98
#> 5 selection  worker 1  2022-02-20 22:30:00 passed    9.69
#> 6 selection  worker 1  2022-02-20 22:30:00 passed    9.21
Created on 2022-02-25 by the reprex package (v2.0.1)
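If you would rather not split the data and bind it back together, a possible alternative is to group on a helper key that is unique for every 'selection' row and constant for everything else. The sketch below is only an illustration under a few assumptions: `demo` is a cut-down stand-in for `test` with rounded values, `grp` is a hypothetical column name, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Cut-down stand-in for `test` (assumption: rounded values, two columns only)
demo <- tibble(
  activity = c("selection", "selection", "other", "inspection", "inspection"),
  duration = c(8.98, 9.69, 0.557, 2.25, 2.12)
)

result <- demo %>%
  # Hypothetical helper key: each 'selection' row gets its own row number,
  # so it forms a group of one; every other row shares the key 0
  mutate(grp = if_else(activity == "selection", row_number(), 0L)) %>%
  group_by(activity, grp) %>%
  summarise(duration = sum(duration), .groups = "drop") %>%
  # The helper key is no longer needed once the sums are computed
  select(-grp)

result
```

With the full `test` data, the other columns could presumably be carried along inside summarise() the same way as in the code above (for example workers = workers[1]).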