从系数和变量的R数据帧创建公式



我正在尝试创建一个函数formulator,用响应、系数、常数和函数名的数据帧创建R公式。我的意图是在将大量历史函数转换为可用的R代码时使用它。将每个函数重写为(响应~常数+b1 x x1+b2 x x2…..(是乏味且容易出错的

具有相同变量的示例数据帧,但并非每个变量都对每个情况感兴趣(例如未使用时的NA(。每个函数都有它自己的行,每个部分都有自己的列,其中列名是变量,单元格是系数。并非所有系数都是正的。

structure(list(species = c("Pine", "Spruce", "Birch", "Aspen", 
"Beech", "Oak", "Noble", "Trivial"), constant = c(-1.6952, -2.2827, 
-0.2269, -0.8198, 0.2081, 0.2348, 0.485, 1.9814), lndp1 = c(1.1617, 
1.4354, 1.1891, 1.4839, 1.7491, 1.2141, 1.0318, 0.8401), d = c(-0.0354, 
-0.0389, -0.0435, -0.024, -0.2167, NA, NA, NA), d2gt = c(0.2791, 
0.3106, 0.562, NA, NA, NA, NA, NA)), row.names = c(NA, -8L), class = c("tbl_df", 
"tbl", "data.frame"))

我的想法是,既然它的顺序很整齐,我可以为我写一个函数来完成这项工作,并用如下打印输出回复:

data %>% formulator(name_column=species, intercept_column=constant, response="Unknown")

在这种情况下,没有已知的响应变量列,但我可能知道该数据帧中的所有行都有相同的响应,这对于手动键入引号可能很有用(tidyval issue?(。

Pine
Unknown ~ -1.6952 + 1.1617 x lndp1 + -0.0354 x d ....
Spruce
Unknown ~ ...

到目前为止,我的想法是:

formulator <- function(data, name_column, intercept_column){
data1 <- data %>% select(-c(name_column, intercept_column))
function_name <- data[,paste0(name_column)]
intercepts <- data[,paste0(intercept_column)]
varlist <- list()
for(i in 1:dim(data1)[1]){
data2 <- data1 %>% filter(name_column == paste0(function_name$i)) %>%  select_if(~!any(is.na(.)))
datadim <- dim(data2)[2]
for(coefs in 1:datadim){
varlist[paste0(function_name$i)][coefs] <- paste0(data2[1,coefs])
}
}

}

这段代码是不完整的,但我认为可以处理每个函数的不同长度来打印,但我不确定如何将所有这些联系在一起。

我可能建议创建公式的文本版本,存储为命名向量,然后在需要公式时使用as.formula(textVersion["foo"])。这里有一些代码可以给你这个想法。。。

library(tibble)
library(dplyr)
formulaData = tibble(
species = c("Pine", "Spruce", "Birch", "Aspen", "Beech", "Oak", "Noble", "Trivial"), 
constant = c(-1.6952, -2.2827, -0.2269, -0.8198, 0.2081, 0.2348, 0.485, 1.9814), 
lndp1 = c(1.1617, 1.4354, 1.1891, 1.4839, 1.7491, 1.2141, 1.0318, 0.8401), 
d = c(-0.0354, -0.0389, -0.0435, -0.024, -0.2167, NA, NA, NA),
d2gt = c(0.2791, 0.3106, 0.562, NA, NA, NA, NA, NA)
)
rhs = 
formulaData %>%
select(!constant) %>%
group_by(species) %>%
group_map(
function(x,y) 
x[,!is.na(as.numeric(x))] %>%
unlist %>%
paste(names(.), sep = "*", collapse = " + ")
) %>%
unlist %>%
paste(" + ", formulaData$constant)
textVersion = 
paste("x ~", rhs) %>%
structure(names = sort(formulaData$species))

示例结果:

> textVersion
Aspen 
"x ~ 1.4839*lndp1 + -0.024*d  +  -1.6952" 
Beech 
"x ~ 1.7491*lndp1 + -0.2167*d  +  -2.2827" 
Birch 
"x ~ 1.1891*lndp1 + -0.0435*d + 0.562*d2gt  +  -0.2269" 
Noble 
"x ~ 1.0318*lndp1  +  -0.8198" 
Oak 
"x ~ 1.2141*lndp1  +  0.2081" 
Pine 
"x ~ 1.1617*lndp1 + -0.0354*d + 0.2791*d2gt  +  0.2348" 
Spruce 
"x ~ 1.4354*lndp1 + -0.0389*d + 0.3106*d2gt  +  0.485" 
Trivial 
"x ~ 0.8401*lndp1  +  1.9814" 

> as.formula(textVersion["Oak"])
x ~ 1.2141 * lndp1 + 0.2081

如果你真的想要一个返回公式的formulator函数,我会转置你的tibble:

transposedData = 
formulaData %>%
select(!species) %>%
unlist %>%
matrix(ncol = 4, dimnames = list(formulaData$species, names(formulaData)[-1])) %>%
t %>%
as_tibble %>%
mutate(term = names(formulaData)[-1]) %>%
relocate(term, before = Pine)

看起来像这样:

> transposedData
# A tibble: 4 x 9
term        Pine  Spruce   Birch  Aspen  Beech    Oak  Noble Trivial
<chr>      <dbl>   <dbl>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>   <dbl>
1 constant -1.70   -2.28   -0.227  -0.820  0.208  0.235  0.485   1.98 
2 lndp1     1.16    1.44    1.19    1.48   1.75   1.21   1.03    0.840
3 d        -0.0354 -0.0389 -0.0435 -0.024 -0.217 NA     NA      NA    
4 d2gt      0.279   0.311   0.562  NA     NA     NA     NA      NA    

那么函数就相当简单了。类似于:

formulator = function(.data, ID, lhs, constant = "constant") {
terms = structure(
paste(.data[[ID]], .data$term, sep = "*"),
names = .data$term
)
terms = terms[!is.na(.data[[ID]])]
cnst = which(names(terms) == constant)
terms[cnst] = .data[[ID]][cnst]
rhs = paste(terms, collapse = " + ")
textVersion = paste(lhs, "~", rhs)
as.formula(textVersion, env = parent.frame())
}

下面是一个应用程序示例:

> formulator(transposedData, "Beech", "myVariable")
myVariable ~ 0.2081 + 1.7491 * lndp1 + -0.2167 * d

我不确定我是否完全理解你的问题,也不确定我编写的函数是否是你想要的,但有一些编码示例可能会帮助你设计解决方案。

最新更新