r - Apply a function to combinations of groups, keeping one group fixed



I have some data which looks like:

   grp    date                id              Y
   <chr>  <dttm>              <chr>       <dbl>
 1 group1 2020-09-01 00:00:00 04003      17039.
 2 group1 2020-09-01 00:00:00 04006      13233.
 3 group1 2020-09-01 00:00:00 04011_AM    7918.
 4 group1 2020-09-01 00:00:00 0401301_AD 22586.
 5 group1 2020-09-01 00:00:00 0401303    20527.
 6 group1 2020-09-01 00:00:00 0401305    29422.
 7 group2 2020-09-01 00:00:00 22017_AM    7088.
 8 group2 2020-09-01 00:00:00 22021_AM    8134.
 9 group2 2020-09-01 00:00:00 22039_AM   15842.
10 group2 2020-09-01 00:00:00 22048      16142.

It has several groups. I also have a function:

normaliseData <- function(m) {
  (m - min(m)) / (max(m) - min(m))
}

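I want to normalise the groups using the min and max of the paired values, keeping group1 fixed. That is, I want to normalise the data with group1 held fixed, for the following combinations: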

  • group1&group2
  • group1&group3
  • group1&group4

Data:

data <- structure(list(grp = c("group1", "group1", "group1", "group1", 
"group1", "group1", "group2", "group2", "group2", "group2", "group2", 
"group2", "group3", "group3", "group3", "group3", "group3", "group3", 
"group4", "group4", "group4", "group4", "group4", "group4"), 
date = structure(c(1598918400, 1598918400, 1598918400, 1598918400, 
1598918400, 1598918400, 1598918400, 1598918400, 1598918400, 
1598918400, 1598918400, 1598918400, 1598918400, 1598918400, 
1598918400, 1598918400, 1598918400, 1598918400, 1598918400, 
1598918400, 1598918400, 1598918400, 1598918400, 1598918400
), tzone = "UTC", class = c("POSIXct", "POSIXt")), id = c("04003", 
"04006", "04011_AM", "0401301_AD", "0401303", "0401305", 
"22017_AM", "22021_AM", "22039_AM", "22048", "22053_AM", 
"22054_AM", "28002", "28004", "2800501", "2800502", "2800503", 
"2800504", "31010_AM", "31015_AM", "31016", "31019_AM", "31023", 
"31029_AM"), Y = c(17039.329, 13232.982, 7917.693, 22585.676, 
20527.113, 29422.471, 7087.536, 8134.265, 15842.035, 16142.111, 
11493.981, 6556.387, 22086.768, 11325.882, 53449.067, 83662.101, 
78508.089, 66107.125, 5095.169, 5590.531, 17796.439, 6028.701, 
39271.698, 3642.281)), row.names = c(NA, -24L), groups = structure(list(
grp = c("group1", "group2", "group3", "group4"), .rows = structure(list(
1:6, 7:12, 13:18, 19:24), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), row.names = c(NA, 4L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

Edit:

I would like to apply the following:

#Min / max from group1 and group2
data %>% 
filter(grp == "group1" | grp == "group2") %>% 
mutate(
normedOut = normaliseData(Y)
)
#Min / max from group1 and group3
data %>% 
filter(grp == "group1" | grp == "group3") %>% 
mutate(
normedOut = normaliseData(Y)
)
#Min / max from group1 and group4
data %>% 
filter(grp == "group1" | grp == "group4") %>% 
mutate(
normedOut = normaliseData(Y)
)

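From my understanding of your question, here is a purrr option. We create a vector groups containing the groups we want to pair with group1, which stays fixed across the three pairs. For each element of groups we run your filter and mutate sequence and create a column, named after that group, holding the normalised data. Because map_dfr row-binds the three filtered subsets, the result is a single data frame with 3 new columns, each holding the normalised Y for group1 paired with one other group, and the group1 rows appear once per pair. NA fills the places where there is no pairing (e.g. between group2 and group3).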

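library(dplyr)

# Groups to pair with the fixed group1
groups <- c("group2", "group3", "group4")
groups %>%
  purrr::map_dfr(~ data %>%
    ungroup() %>%  # data is grouped by grp; drop it so min/max span both groups
    filter(grp == "group1" | grp == .x) %>%
    mutate(!!.x := normaliseData(Y)))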
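If one wide data frame padded with NA is awkward to work with, a variation is to keep one data frame per pair. The following is only a minimal sketch of that idea, not a tested drop-in replacement. The names toy, others, pairs and long are made up for illustration, and toy is a two-rows-per-group subset of the Y values from the question, grouped by grp like the original data.

```r
library(dplyr)
library(purrr)

normaliseData <- function(m) {
  (m - min(m)) / (max(m) - min(m))
}

# Illustrative subset of the question's data (first two Y values of each group),
# grouped by grp in the same way as the original `data` object.
toy <- tibble(
  grp = rep(c("group1", "group2", "group3", "group4"), each = 2),
  Y   = c(17039.329, 13232.982, 7087.536, 8134.265,
          22086.768, 11325.882, 5095.169, 5590.531)
) %>%
  group_by(grp)

# Groups to pair with the fixed group1
others <- c("group2", "group3", "group4")

# Named list with one data frame per pair: group1_group2, group1_group3, ...
pairs <- others %>%
  set_names(paste0("group1_", others)) %>%
  map(~ toy %>%
        ungroup() %>%  # without this, mutate() would normalise each group separately
        filter(grp %in% c("group1", .x)) %>%
        mutate(normedOut = normaliseData(Y)))

# Optionally stack the pairs, recording which pair each row came from
long <- bind_rows(pairs, .id = "pair")
```

Each element of pairs can then be used on its own (for example pairs$group1_group3), while long keeps everything in one data frame with a pair column instead of NA-padded columns.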
