Best practice for multiple sub-plans in drake (R)

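Hi, I'm new to the drake R package and would like some opinions on best practices for using sub-plans to manage a large project. A simplified structure of my project has two parts: (1) data cleaning and (2) modelling. They are cascaded, in that I do the data cleaning first and rarely go back to it once I start the modelling part.

I think the approach suggested by the manual is: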

library(drake)
source("functions_1.R") # for plan1
plan1 <- drake_plan(
  # many middle steps to create
  foo = some_function(),
  foo_1 = fn_1(foo),
  foo_2 = fn_2(foo_1),
  for_analysis = data_cleaning_fn()
)
plan2 <- drake_plan(
  # I would like to use the target name foo_1 again, but not for the same object as defined in plan1.
  # What I want:
  # foo_1 = fn_new_1(for_analysis) # this is different from the one defined above
  # result = model_fn(foo_1)
  # What I actually did:
  foo_new_1 = fn_new_1(for_analysis), # I have to define a new name different from foo_1
  result = model_fn(foo_new_1)
)
fullplan <- bind_plans(plan1, plan2)
make(fullplan)

One problem I have with the workflow above is that I define a lot of intermediate targets for plan1, but they are of no use in plan2.

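Is there a way to have a "clean namespace" in plan2, so that I can get rid of the useless names foo_1, foo_2, and so on? That would let me reuse those names in plan2. The only thing I want to keep for plan2 is for_analysis.
Is there a way to use the functions defined in functions_1.R only for plan1, and the functions defined in functions_2.R only for plan2? I would like to work with a smaller set of functions each time.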

Thanks a lot!

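Interesting question. drake does not support multiple namespaces within a plan. All target names must be unique and all function names must be unique, so if you want to reuse names, you will need to put those plans in entirely separate projects.

You might be running into a situation where you are defining too many targets. Broadly speaking, a target should either (1) produce meaningful output for your project, or (2) consume enough runtime that skipping it saves you time. I recommend reading https://books.ropensci.org/drake/plans.html#how-to-choose-good-targets. To condense multiple targets into one, I suggest composing the functions together. Example: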

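foo_all <- function() {
  # Each intermediate step is very fast, but put together, they take up noticeable runtime.
  foo <- some_function()
  foo_1 <- fn_1(foo)
  foo_2 <- fn_2(foo_1)
  for_analysis <- data_cleaning_fn()
  for_analysis # return the cleaned data as the value of the single target
}
plan1 <- drake_plan(
  for_analysis = foo_all()
)

In addition, drake's branching mechanisms are a convenient way to generate names automatically or to avoid thinking too hard about names. Maybe have a look at https://books.ropensci.org/drake/static.html and https://books.ropensci.org/drake/dynamic.html.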
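
To illustrate the branching suggestion, here is a minimal sketch of static branching, where drake derives the target names from a transform instead of you naming each one by hand. The body of model_fn, the use of mtcars as a stand-in for the cleaned data, and the degree parameter are all assumptions made up for this illustration; they are not from the question.

```r
library(drake)

# Hypothetical stand-in for the asker's model_fn(); the real one is not shown
# in the question.
model_fn <- function(data, degree) {
  lm(mpg ~ poly(wt, degree), data = data)
}

plan2 <- drake_plan(
  # Placeholder for the cleaned data that plan1 would produce.
  for_analysis = mtcars,
  # One `result` target per value of `degree`; drake generates the names.
  result = target(
    model_fn(for_analysis, degree),
    transform = map(degree = c(1, 2))
  )
)

# Inspect the generated target names before running anything.
print(plan2$target)
```

Printing the plan before calling make(plan2) is a cheap way to check which names drake generated. All targets still share one namespace, so this only saves you from inventing names; it does not let plan2 reuse a name already taken in plan1.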
