我正在尝试创建一个函数formulator
,用响应、系数、常数和函数名的数据帧创建R公式。我的意图是在将大量历史函数转换为可用的R代码时使用它。将每个函数重写为(响应~常数+b1 x x1+b2 x x2…..(是乏味且容易出错的
具有相同变量的示例数据帧,但并非每个变量都对每个情况感兴趣(例如未使用时的NA(。每个函数都有它自己的行,每个部分都有自己的列,其中列名是变量,单元格是系数。并非所有系数都是正的。
structure(list(species = c("Pine", "Spruce", "Birch", "Aspen",
"Beech", "Oak", "Noble", "Trivial"), constant = c(-1.6952, -2.2827,
-0.2269, -0.8198, 0.2081, 0.2348, 0.485, 1.9814), lndp1 = c(1.1617,
1.4354, 1.1891, 1.4839, 1.7491, 1.2141, 1.0318, 0.8401), d = c(-0.0354,
-0.0389, -0.0435, -0.024, -0.2167, NA, NA, NA), d2gt = c(0.2791,
0.3106, 0.562, NA, NA, NA, NA, NA)), row.names = c(NA, -8L), class = c("tbl_df",
"tbl", "data.frame"))
我的想法是,既然它的顺序很整齐,我可以为我写一个函数来完成这项工作,并用如下打印输出回复:
data %>% formulator(name_column=species, intercept_column=constant, response="Unknown")
在这种情况下,没有已知的响应变量列,但我可能知道该数据帧中的所有行都有相同的响应,这对于手动键入引号可能很有用(tidyval issue?(。
Pine
Unknown ~ -1.6952 + 1.1617 x lndp1 + -0.0354 x d ....
Spruce
Unknown ~ ...
到目前为止,我的想法是:
formulator <- function(data, name_column, intercept_column){
data1 <- data %>% select(-c(name_column, intercept_column))
function_name <- data[,paste0(name_column)]
intercepts <- data[,paste0(intercept_column)]
varlist <- list()
for(i in 1:dim(data1)[1]){
data2 <- data1 %>% filter(name_column == paste0(function_name$i)) %>% select_if(~!any(is.na(.)))
datadim <- dim(data2)[2]
for(coefs in 1:datadim){
varlist[paste0(function_name$i)][coefs] <- paste0(data2[1,coefs])
}
}
}
这段代码是不完整的,但我认为可以处理每个函数的不同长度来打印,但我不确定如何将所有这些联系在一起。
我可能建议创建公式的文本版本,存储为命名向量,然后在需要公式时使用as.formula(textVersion["foo"])
。这里有一些代码可以给你这个想法。。。
library(tibble)
library(dplyr)
formulaData = tibble(
species = c("Pine", "Spruce", "Birch", "Aspen", "Beech", "Oak", "Noble", "Trivial"),
constant = c(-1.6952, -2.2827, -0.2269, -0.8198, 0.2081, 0.2348, 0.485, 1.9814),
lndp1 = c(1.1617, 1.4354, 1.1891, 1.4839, 1.7491, 1.2141, 1.0318, 0.8401),
d = c(-0.0354, -0.0389, -0.0435, -0.024, -0.2167, NA, NA, NA),
d2gt = c(0.2791, 0.3106, 0.562, NA, NA, NA, NA, NA)
)
rhs =
formulaData %>%
select(!constant) %>%
group_by(species) %>%
group_map(
function(x,y)
x[,!is.na(as.numeric(x))] %>%
unlist %>%
paste(names(.), sep = "*", collapse = " + ")
) %>%
unlist %>%
paste(" + ", formulaData$constant)
textVersion =
paste("x ~", rhs) %>%
structure(names = sort(formulaData$species))
示例结果:
> textVersion
Aspen
"x ~ 1.4839*lndp1 + -0.024*d + -1.6952"
Beech
"x ~ 1.7491*lndp1 + -0.2167*d + -2.2827"
Birch
"x ~ 1.1891*lndp1 + -0.0435*d + 0.562*d2gt + -0.2269"
Noble
"x ~ 1.0318*lndp1 + -0.8198"
Oak
"x ~ 1.2141*lndp1 + 0.2081"
Pine
"x ~ 1.1617*lndp1 + -0.0354*d + 0.2791*d2gt + 0.2348"
Spruce
"x ~ 1.4354*lndp1 + -0.0389*d + 0.3106*d2gt + 0.485"
Trivial
"x ~ 0.8401*lndp1 + 1.9814"
和
> as.formula(textVersion["Oak"])
x ~ 1.2141 * lndp1 + 0.2081
如果你真的想要一个返回公式的formulator
函数,我会转置你的tibble:
transposedData =
formulaData %>%
select(!species) %>%
unlist %>%
matrix(ncol = 4, dimnames = list(formulaData$species, names(formulaData)[-1])) %>%
t %>%
as_tibble %>%
mutate(term = names(formulaData)[-1]) %>%
relocate(term, before = Pine)
看起来像这样:
> transposedData
# A tibble: 4 x 9
term Pine Spruce Birch Aspen Beech Oak Noble Trivial
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 constant -1.70 -2.28 -0.227 -0.820 0.208 0.235 0.485 1.98
2 lndp1 1.16 1.44 1.19 1.48 1.75 1.21 1.03 0.840
3 d -0.0354 -0.0389 -0.0435 -0.024 -0.217 NA NA NA
4 d2gt 0.279 0.311 0.562 NA NA NA NA NA
那么函数就相当简单了。类似于:
formulator = function(.data, ID, lhs, constant = "constant") {
terms = structure(
paste(.data[[ID]], .data$term, sep = "*"),
names = .data$term
)
terms = terms[!is.na(.data[[ID]])]
cnst = which(names(terms) == constant)
terms[cnst] = .data[[ID]][cnst]
rhs = paste(terms, collapse = " + ")
textVersion = paste(lhs, "~", rhs)
as.formula(textVersion, env = parent.frame())
}
下面是一个应用程序示例:
> formulator(transposedData, "Beech", "myVariable")
myVariable ~ 0.2081 + 1.7491 * lndp1 + -0.2167 * d
我不确定我是否完全理解你的问题,也不确定我编写的函数是否是你想要的,但有一些编码示例可能会帮助你设计解决方案。