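I need to take some data stored in a named tuple, apply a function to each element, and build a data frame whose columns have the same count and names as the tuple's elements. For example: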
```julia
using DataFrames, StatsBase  # sample() comes from StatsBase

a = (A = [1, 2], B = 1:6)

function f(a)
    df = DataFrame()
    for k in keys(a)
        df[!, k] = sample(a[k], 10)  # any other function could be used in place of sample()
    end
    return df
end
```
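But when I run `@code_warntype`, I get a `Union` type. As I understand it, this means the compiler cannot predict the type before run time, which hurts performance: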
```
julia> @code_warntype f(a)
Variables
  #self#::Core.Const(f)
  a::NamedTuple{(:A, :B), Tuple{Vector{Int64}, UnitRange{Int64}}}
  @_3::Union{Nothing, Tuple{Symbol, Int64}}
  df::DataFrame
  k::Symbol

Body::DataFrame
1 ─       (df = Main.DataFrame())
│   %2  = Main.keys(a)::Core.Const((:A, :B))
│         (@_3 = Base.iterate(%2))
│   %4  = (@_3::Core.Const((:A, 2)) === nothing)::Core.Const(false)
│   %5  = Base.not_int(%4)::Core.Const(true)
└──       goto #4 if not %5
2 ┄ %7  = @_3::Tuple{Symbol, Int64}::Tuple{Symbol, Int64}
│         (k = Core.getfield(%7, 1))
│   %9  = Core.getfield(%7, 2)::Int64
│   %10 = Base.getindex(a, k)::Union{UnitRange{Int64}, Vector{Int64}}
│   %11 = Main.sample(%10, 10)::Vector{Int64}
│         Base.setindex!(df, %11, k)
│         (@_3 = Base.iterate(%2, %9))
│   %14 = (@_3 === nothing)::Bool
│   %15 = Base.not_int(%14)::Bool
└──       goto #4 if not %15
3 ─       goto #2
4 ┄       return df
```
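So my question is: what is the most efficient way to write `f(a)`? In my particular case all the columns of the data frame will have the same type; would that information help the compiler?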
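The `Union` comes from `a[k]`: `k` is a `Symbol` whose value is only known at run time, so the compiler can only infer that `a[k]` is one of the tuple's field types. Instead, you can generate all the data first and convert it to a `DataFrame` only in the last step. There are several ways to do this; one of them is `map`, which is type-stable on tuples (including named tuples):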
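```julia
function g(a)
    # map over a NamedTuple returns a NamedTuple with the same keys,
    # which the DataFrame constructor turns into named columns
    map(x -> sample(x, 10), a) |> DataFrame
end
```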
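A quick way to check this in practice is a sketch like the one below (it assumes DataFrames.jl 1.x, where `names` returns strings, and uses the `Test` standard library's `@inferred`, which throws if the inferred return type differs from the actual one). It confirms that the intermediate `map` result is concretely typed and that the final data frame keeps the tuple's field names:

```julia
using DataFrames, StatsBase, Test

a = (A = [1, 2], B = 1:6)

# Same approach as g above: build every column first, convert to a DataFrame last
g(a) = map(x -> sample(x, 10), a) |> DataFrame

# @inferred throws if inference cannot pin down the concrete NamedTuple type
cols = @inferred map(x -> sample(x, 10), a)

df = g(a)
println(names(df))  # column names taken from the tuple's keys
println(size(df))   # 10 sampled rows, one column per field
```

Because `sample` returns a `Vector` for both the `Vector` and the `UnitRange` field, both columns end up as `Vector{Int64}`.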
```
julia> @code_warntype(g(a))
MethodInstance for g(::NamedTuple{(:A, :B), Tuple{Vector{Int64}, UnitRange{Int64}}})
  from g(a) in Main at REPL[103]:1
Arguments
  #self#::Core.Const(g)
  a::NamedTuple{(:A, :B), Tuple{Vector{Int64}, UnitRange{Int64}}}
Locals
  #47::var"#47#48"
Body::DataFrame
1 ─      (#47 = %new(Main.:(var"#47#48")))
│   %2 = #47::Core.Const(var"#47#48"())
│   %3 = Main.map(%2, a)::NamedTuple{(:A, :B), Tuple{Vector{Int64}, Vector{Int64}}}
│   %4 = (%3 |> Main.DataFrame)::DataFrame
└──      return %4
```
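On the second question (all columns sharing one type), a sketch suggests it does help: when every field of the named tuple has the same concrete type, a lookup by a run-time `Symbol` can only return that type, so no `Union` appears. The variant `b` (fields materialized with `collect`) and the helper `getcol` below are hypothetical, introduced only to isolate the `a[k]` lookup from the loop in `f`:

```julia
using Test

a = (A = [1, 2], B = 1:6)            # mixed field types, as in the question
b = (A = [1, 2], B = collect(1:6))   # hypothetical variant: every field is Vector{Int64}

# Hypothetical helper: look up a field with a Symbol known only at run time,
# mirroring `a[k]` inside the loop of f
getcol(nt, k::Symbol) = nt[k]

# Homogeneous fields: inference yields the concrete Vector{Int64}, so @inferred passes
colB = @inferred getcol(b, :B)

# Mixed fields: inference yields Union{UnitRange{Int64}, Vector{Int64}},
# which differs from the actual return type, so @inferred throws
mixed_ok = try
    @inferred getcol(a, :A)
    true
catch err
    false
end
```

`@inferred` only looks at argument types, not values, so it reflects what the compiler sees inside the loop, where `k` is just a `Symbol`.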