Julia: an efficient way to create a DataFrame by applying a function to the elements of a tuple



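I need to create a DataFrame from data stored in a named tuple by applying a function to each element, so that the DataFrame has the same number of columns as the tuple, with the same names. For example: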

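using DataFrames, StatsBase  # sample() comes from StatsBase
a = (A = [1, 2], B = 1:6)
function f(a)
    df = DataFrame()
    for k in keys(a)
        # Any other function could take the place of sample().
        # Note: DataFrames.jl 1.x requires df[!, k] = ... here.
        df[k] = sample(a[k], 10)
    end
    return df
end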

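However, when I run @code_warntype I get a Union type. As I understand it, this means the compiler cannot predict the type before run time, which hurts performance: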

julia> @code_warntype f(a)
Variables
#self#::Core.Const(f)
a::NamedTuple{(:A, :B), Tuple{Vector{Int64}, UnitRange{Int64}}}
@_3::Union{Nothing, Tuple{Symbol, Int64}}
df::DataFrame
k::Symbol
Body::DataFrame
1 ─       (df = Main.DataFrame())
│   %2  = Main.keys(a)::Core.Const((:A, :B))
│         (@_3 = Base.iterate(%2))
│   %4  = (@_3::Core.Const((:A, 2)) === nothing)::Core.Const(false)
│   %5  = Base.not_int(%4)::Core.Const(true)
└──       goto #4 if not %5
2 ┄ %7  = @_3::Tuple{Symbol, Int64}::Tuple{Symbol, Int64}
│         (k = Core.getfield(%7, 1))
│   %9  = Core.getfield(%7, 2)::Int64
│   %10 = Base.getindex(a, k)::Union{UnitRange{Int64}, Vector{Int64}}
│   %11 = Main.sample(%10, 10)::Vector{Int64}
│         Base.setindex!(df, %11, k)
│         (@_3 = Base.iterate(%2, %9))
│   %14 = (@_3 === nothing)::Bool
│   %15 = Base.not_int(%14)::Bool
└──       goto #4 if not %15
3 ─       goto #2
4 ┄       return df
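
The Union appears on the `%10` line, where the named tuple is indexed with a key that is only known at run time. The following is a minimal sketch that reproduces this with Base Julia alone; the helper name `getcol` is hypothetical and introduced only for illustration.

```julia
a = (A = [1, 2], B = 1:6)

# Hypothetical helper: look up a field by a key known only at run time.
getcol(nt::NamedTuple, k::Symbol) = nt[k]

# Ask the compiler what it can infer when the key is "some Symbol".
# The fields have different types, so the result cannot be a single concrete type.
inferred = only(Base.return_types(getcol, (typeof(a), Symbol)))
println(inferred)
println(isconcretetype(inferred))
```

Because the two fields have different types, the inferred return type is not concrete, which is what the loop over `keys(a)` runs into.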

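The question is: what is the most efficient way to write f(a)? In my particular case, all columns of the DataFrame will have the same type. Does that information help the compiler?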

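You can generate the data first and convert it to a DataFrame only in the last step. There are several ways to do this. One of them is map, which is type-stable on tuples, including named tuples: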

function g(a)
    map(x -> sample(x, 10), a) |> DataFrame
end

julia> @code_warntype(g(a))
MethodInstance for g(::NamedTuple{(:A, :B), Tuple{Vector{Int64}, UnitRange{Int64}}})
from g(a) in Main at REPL[103]:1
Arguments
#self#::Core.Const(g)
a::NamedTuple{(:A, :B), Tuple{Vector{Int64}, UnitRange{Int64}}}
Locals
#47::var"#47#48"
Body::DataFrame
1 ─      (#47 = %new(Main.:(var"#47#48")))
│   %2 = #47::Core.Const(var"#47#48"())
│   %3 = Main.map(%2, a)::NamedTuple{(:A, :B), Tuple{Vector{Int64}, Vector{Int64}}}
│   %4 = (%3 |> Main.DataFrame)::DataFrame
└──      return %4
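
The same idea can be sketched with the transformation passed in as an argument, assuming DataFrames.jl and StatsBase.jl are installed. The name `columnwise` is hypothetical and not part of either package; `Test.@inferred` is used here only to check that the `map` step infers a concrete type.

```julia
using DataFrames, StatsBase, Test

# Hypothetical helper: apply `f` to every field of a named tuple,
# then build the DataFrame in a single final step.
columnwise(f, nt::NamedTuple) = DataFrame(map(f, nt))

a = (A = [1, 2], B = 1:6)

# `map` over a NamedTuple keeps the field names; @inferred throws
# if the inferred return type is not concrete.
cols = @inferred map(x -> sample(x, 10), a)

df = columnwise(x -> sample(x, 10), a)
println(names(df))
println(size(df))
```

Passing the function as an argument keeps the helper generic, so `sample` can be swapped for any other column-wise transformation without touching the DataFrame construction.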
