R - Function to create a new column from existing group ranges in a data frame



The head of the data frame looks like this:

                 ranges  n
1             (0,1e+04] 13
2         (1e+04,2e+04] 11
3         (2e+04,3e+04] 21
4         (3e+04,4e+04] 14
5         (4e+04,5e+04]  9
6         (5e+04,6e+04]  8
7         (6e+04,7e+04] 13
8         (7e+04,8e+04] 11
9         (8e+04,9e+04] 16
10        (9e+04,1e+05] 16

It shows the ranges of some data and the number of observations in each group.

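I would like to create a new column between the two existing columns. It should contain the midpoint of each group's range (i.e., 5,000 for the first group, 15,000 for the second, and so on).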

So far, I have managed to add a new column with this command:
add_column(Position = "Value",
           .after = "ranges")

The result looks like this:

                 ranges Position  n
1             (0,1e+04]    Value 13
2         (1e+04,2e+04]    Value 11
3         (2e+04,3e+04]    Value 21
4         (3e+04,4e+04]    Value 14
5         (4e+04,5e+04]    Value  9
6         (5e+04,6e+04]    Value  8
7         (6e+04,7e+04]    Value 13
8         (7e+04,8e+04]    Value 11
9         (8e+04,9e+04]    Value 16
10        (9e+04,1e+05]    Value 16

I am still not sure how to fill the column with the midpoint of each group's range instead of just "Value".

Any suggestions?

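# Reproducible version of the ranges column from the question
mydf <- data.frame(
  ranges = c("(0,1e+04]",
             "(1e+04,2e+04]",
             "(2e+04,3e+04]",
             "(3e+04,4e+04]",
             "(4e+04,5e+04]",
             "(5e+04,6e+04]",
             "(6e+04,7e+04]",
             "(7e+04,8e+04]",
             "(8e+04,9e+04]",
             "(9e+04,1e+05]")
)

library(dplyr)
library(stringr)
library(tidyr)
library(readr)

mydf |>
  # Split "(min,max]" at the comma into two columns
  separate(ranges, into = c("min", "max"), sep = ",") |>
  # Drop the brackets (backslashes must be doubled inside R strings),
  # then convert both columns to numeric
  mutate(min = str_remove(min, "\\("),
         max = str_remove(max, "\\]"),
         across(c(min, max), parse_number)) |>
  # Midpoint of each range, computed row by row
  rowwise() |>
  mutate(value = mean(c_across(c(min, max)))) |>
  ungroup()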
# A tibble: 10 × 3
     min    max value
   <dbl>  <dbl> <dbl>
 1     0  10000  5000
 2 10000  20000 15000
 3 20000  30000 25000
 4 30000  40000 35000
 5 40000  50000 45000
 6 50000  60000 55000
 7 60000  70000 65000
 8 70000  80000 75000
 9 80000  90000 85000
10 90000 100000 95000
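
The pipeline above replaces `ranges` with `min` and `max`, whereas the question asks for the midpoint to sit between `ranges` and `n`. One possible way to keep both original columns is sketched below in base R only. The data frame name `binned` and its three rows are made up for illustration, and the sketch assumes every label has the form `(lower,upper]`.

```r
# Hypothetical example data mirroring the first rows of the question
binned <- data.frame(
  ranges = c("(0,1e+04]", "(1e+04,2e+04]", "(2e+04,3e+04]"),
  n      = c(13, 11, 21)
)

# Remove the leading "(" and the trailing "]" so only "lower,upper" remains
stripped <- gsub("^\\(|\\]$", "", binned$ranges)

# Split on the literal comma and convert both bounds to numbers;
# as.numeric() understands scientific notation such as "1e+04"
parts <- strsplit(stripped, ",", fixed = TRUE)
lower <- as.numeric(sapply(parts, `[`, 1))
upper <- as.numeric(sapply(parts, `[`, 2))

# The midpoint is the average of the two bounds
binned$Position <- (lower + upper) / 2

# Reorder so that Position sits between ranges and n
binned <- binned[, c("ranges", "Position", "n")]

print(binned)
```

Here `Position` comes out as 5000, 15000 and 25000 for the three rows. If `ranges` is a factor (for example, produced by `cut()`), `gsub()` converts it to character before matching, so the same steps should still apply.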
