The head of my data frame looks like this:
          ranges  n
1      (0,1e+04] 13
2  (1e+04,2e+04] 11
3  (2e+04,3e+04] 21
4  (3e+04,4e+04] 14
5  (4e+04,5e+04]  9
6  (5e+04,6e+04]  8
7  (6e+04,7e+04] 13
8  (7e+04,8e+04] 11
9  (8e+04,9e+04] 16
10 (9e+04,1e+05] 16
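It shows the ranges of a particular variable and the count of observations in each group.
I want to create a new column between the two existing columns. It should hold the midpoint of each group's range (i.e. 5000 for the first group, 15,000 for the second, and so on).
So far I have managed to add a new column with this command:
add_column(Position = "Value", .after = "ranges")
The result is as follows: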
          ranges Position  n
1      (0,1e+04]    Value 13
2  (1e+04,2e+04]    Value 11
3  (2e+04,3e+04]    Value 21
4  (3e+04,4e+04]    Value 14
5  (4e+04,5e+04]    Value  9
6  (5e+04,6e+04]    Value  8
7  (6e+04,7e+04]    Value 13
8  (7e+04,8e+04]    Value 11
9  (8e+04,9e+04]    Value 16
10 (9e+04,1e+05]    Value 16
I am still not sure how to fill the column with the midpoint of each group's range instead of just "Value".
Any suggestions?
mydf <- data.frame(
  ranges = c("(0,1e+04]",
             "(1e+04,2e+04]",
             "(2e+04,3e+04]",
             "(3e+04,4e+04]",
             "(4e+04,5e+04]",
             "(5e+04,6e+04]",
             "(6e+04,7e+04]",
             "(7e+04,8e+04]",
             "(8e+04,9e+04]",
             "(9e+04,1e+05]")
)
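library(dplyr)
library(stringr)
library(tidyr)
library(readr)

mydf |>
  # Split "(min,max]" into two character columns at the comma
  separate(ranges, into = c("min", "max"), sep = ",") |>
  # Drop the brackets (the backslash must be doubled inside an R string),
  # then convert both bounds to numbers
  mutate(min = str_remove(min, "\\("),
         max = str_remove(max, "\\]"),
         across(c(min, max), parse_number)) |>
  # Compute the midpoint row by row
  rowwise() |>
  mutate(value = mean(c_across(c(min, max)))) |>
  ungroup()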
# A tibble: 10 × 3
     min    max value
   <dbl>  <dbl> <dbl>
 1     0  10000  5000
 2 10000  20000 15000
 3 20000  30000 25000
 4 30000  40000 35000
 5 40000  50000 45000
 6 50000  60000 55000
 7 60000  70000 65000
 8 70000  80000 75000
 9 80000  90000 85000
10 90000 100000 95000
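Note that the pipeline above replaces `ranges` with `min` and `max`, and the sample `mydf` has no `n` column. If the goal is to keep `ranges` and `n` and put the midpoint between them, as the question asks, something along these lines might work. This is only a sketch in base R: `range_midpoint()` is a hypothetical helper written for this example (not part of any package), the `n` values are copied from the first three rows of the question, and it assumes every label has the form "(lower,upper]".

```r
# Hypothetical helper: midpoint of interval labels such as "(1e+04,2e+04]".
# as.character() is there in case `ranges` is a factor (e.g. produced by cut()).
range_midpoint <- function(x) {
  # Remove the leading "(" or "[" and the trailing "]" or ")"
  stripped <- gsub("^[[(]|[])]$", "", as.character(x))
  # Split each label at the comma into lower and upper bound
  bounds <- strsplit(stripped, ",", fixed = TRUE)
  # Average the two numeric bounds of every interval
  vapply(bounds, function(b) mean(as.numeric(b)), numeric(1))
}

# First three rows of the data from the question
mydf <- data.frame(
  ranges = c("(0,1e+04]", "(1e+04,2e+04]", "(2e+04,3e+04]"),
  n = c(13, 11, 21)
)

# Add the midpoint column, then reorder so it sits between ranges and n
mydf$Position <- range_midpoint(mydf$ranges)
mydf <- mydf[, c("ranges", "Position", "n")]
print(mydf)
```

With dplyr 1.0.0 or later, the last two steps could instead be written as `mutate(mydf, Position = range_midpoint(ranges), .after = ranges)`, which creates the column directly in the desired position.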