R: merge duplicates, but not all of them


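I have a data frame, shown below. I want to merge the duplicates in the 'activity' column (except the ones named 'selection') and sum their values in the 'duration' column. I am doing this in R. I tried aggregate(), but I could not find a way to leave the 'selection' rows unaggregated.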

# df - I used dput so you can have my df
test <- structure(list(activity = c("selection", "selection", "selection", 
"other", "inspection", "assignment", "inspection", "inspection", 
"inspection", "inspection"), workers = c("worker 1", "worker 1", 
"worker 1", "worker 34", "worker 6", "worker 5", "worker 2", 
"worker 2", "worker 2", "worker 2"), start_time = structure(c(1645396200, 
1645396200, 1645396200, 1645394352, 1645394155, 1645394100, 1645390080, 
1645476480, 1645562880, 1645649280), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), status = c("passed", "passed", "passed", "passed", 
"passed", "passed", "passed", "passed", "passed", "passed"), 
duration = c(8.98333333333333, 9.69027777777778, 9.20555555555556, 
0.557222222222222, 2.24527777777778, 1.61666666666667, 2.12166666666667, 
1.32638888888889, 2.59861111111111, 0.765555555555556)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

test 
# A tibble: 10 x 5
   activity   workers   start_time          status duration
   <chr>      <chr>     <dttm>              <chr>     <dbl>
 1 selection  worker 1  2022-02-20 22:30:00 passed    8.98 
 2 selection  worker 1  2022-02-20 22:30:00 passed    9.69 
 3 selection  worker 1  2022-02-20 22:30:00 passed    9.21 
 4 other      worker 34 2022-02-20 21:59:12 passed    0.557
 5 inspection worker 6  2022-02-20 21:55:55 passed    2.25 
 6 assignment worker 5  2022-02-20 21:55:00 passed    1.62 
 7 inspection worker 2  2022-02-20 20:48:00 passed    2.12 
 8 inspection worker 2  2022-02-21 20:48:00 passed    1.33 
 9 inspection worker 2  2022-02-22 20:48:00 passed    2.60 
10 inspection worker 2  2022-02-23 20:48:00 passed    0.766

I'm not sure I fully understand what you are looking for, but I'll give it a try!

Using the dplyr library, you can do the following:

Reprex

  • Code
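library(dplyr)

test %>% 
  # collapse every activity except "selection"
  filter(activity != "selection") %>% 
  group_by(activity) %>% 
  # keep the first value of the other columns and sum the durations
  summarise(workers = workers[1],
            start_time = start_time[1],
            status = status[1],
            duration = sum(duration)) %>% 
  # append the untouched "selection" rows
  bind_rows(test %>% filter(activity == "selection"))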
  • Output
#> # A tibble: 6 x 5
#>   activity   workers   start_time          status duration
#>   <chr>      <chr>     <dttm>              <chr>     <dbl>
#> 1 assignment worker 5  2022-02-20 21:55:00 passed    1.62 
#> 2 inspection worker 6  2022-02-20 21:55:55 passed    9.06 
#> 3 other      worker 34 2022-02-20 21:59:12 passed    0.557
#> 4 selection  worker 1  2022-02-20 22:30:00 passed    8.98 
#> 5 selection  worker 1  2022-02-20 22:30:00 passed    9.69 
#> 6 selection  worker 1  2022-02-20 22:30:00 passed    9.21

Created on 2022-02-25 by the reprex package (v2.0.1)
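
Since the question mentions aggregate(), here is a rough base R sketch of the same idea: aggregate only the non-'selection' rows, then bind the untouched 'selection' rows back on. The data frame `df` and its values are made up for illustration (a cut-down stand-in for `test`), and taking the first `workers` value per group is an assumption that mirrors the dplyr code above.

```r
# Toy data with some of the columns of the question's `test` (values are invented).
df <- data.frame(
  activity = c("selection", "selection", "inspection", "inspection", "other"),
  workers  = c("worker 1", "worker 1", "worker 6", "worker 2", "worker 34"),
  duration = c(9, 10, 2, 3, 0.5),
  stringsAsFactors = FALSE
)

keep <- df$activity == "selection"   # rows that must stay as they are

# Sum `duration` per activity, using only the rows that may be collapsed.
summed <- aggregate(duration ~ activity, data = df[!keep, ], FUN = sum)

# Carry over the first `workers` value of each collapsed activity.
first_rows <- df[!keep & !duplicated(df$activity), ]
summed$workers <- first_rows$workers[match(summed$activity, first_rows$activity)]

# Put the collapsed rows and the untouched "selection" rows back together.
result <- rbind(summed[, names(df)], df[keep, ])
rownames(result) <- NULL
print(result)
```

With the real data, the `start_time` and `status` columns would need the same "first value per group" treatment as `workers`, or they could be added as extra grouping variables on the right-hand side of the formula if duplicates should be defined by more than `activity`.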
