Julia 1.5.2 performance question

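I am currently trying to implement a metaheuristic (genetic) algorithm, and along the way I would like to write fast, efficient code. However, I have little experience with writing efficient code, so I was wondering whether someone could give me some "quick tips" to improve it. I have created a small working example that contains most of the elements the real code will have: preallocated arrays, custom mutable structs, random numbers, pushing into arrays, and so on.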

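One option I have already looked at is the "StaticArrays" package. However, many of my arrays must be mutable (so I would need MArray), and many of them will become quite large (more than 100 elements). The StaticArrays documentation states that static arrays must be kept small to remain efficient.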

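According to the documentation, rand() should be thread-safe in Julia 1.5.2. I therefore tried multithreading the for loops in my functions to make them run faster. This gave a slight performance improvement.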

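However, if anyone can see a more efficient way to allocate the arrays or to push the SpotPrices into the array, I would greatly appreciate it! Any other performance tips are also very welcome!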

# Packages
clearconsole()
using DataFrames
using Random
using BenchmarkTools

Random.seed!(42)
df = DataFrame( SpotPrice = convert(Array{Float64}, rand(-266:500,8832)),
                month = repeat([1,2,3,4,5,6,7,8,9,10,11,12]; outer = 736),
                hour = repeat([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24]; outer = 368))

# Data structure for the prices per hour
mutable struct SpotPrices
    hour :: Array{Float64,1}
end

# Fill-out data structure
function setup_prices(df::DataFrame)
    prices = []
    for i in 1:length(unique(df[:,3]))
        push!(prices, SpotPrices(filter(row -> row.hour == i, df).SpotPrice))
    end
    return prices
end
prices = setup_prices(df)

# Sampler function
function MC_Sampler(prices::Vector{Any}, sample_size::Int64)
    # Picking the samples
    tmp = zeros(sample_size, 24)
    # Sampling per hour
    for i in 1:24
        tmp[:,i] = rand(prices[i].hour, sample_size)
    end
    return tmp
end
samples = MC_Sampler(prices, 100)

@btime setup_prices(df)
@btime MC_Sampler(prices,100)

# Multithreaded version of setup_prices
function setup_prices_par(df::DataFrame)
    prices = []
    @sync Threads.@threads for i in 1:length(unique(df[:,3]))
        push!(prices, SpotPrices(filter(row -> row.hour == i, df).SpotPrice))
    end
    return prices
end

# Sampler function
function MC_Sampler_par(prices::Vector{Any}, sample_size::Int64)
    # Picking the samples
    tmp = zeros(sample_size, 24)
    # Sampling per hour
    @sync Threads.@threads for i in 1:24
        tmp[:,i] = rand(prices[i].hour, sample_size)
    end
    return tmp
end

@btime setup_prices_par(df)
@btime MC_Sampler_par(prices,100)

Read https://docs.julialang.org/en/v1/manual/performance-tips/ carefully.

A basic cleanup starts with:

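Your SpotPrices struct does not look like it needs to be mutable. In any case, since it has only one field, you could simply define it as SpotPrices = Vector{Float64}.
You do not want untyped containers: instead of prices = [], use a concretely typed container such as prices = SpotPrices[].
Using DataFrames.groupby will be much faster than finding the unique elements and filtering on each of them.
If you do not need the array to be initialized, Matrix{Float64}(undef, sample_size, 24) is faster than zeros(sample_size, 24), because it skips the zero fill.
@sync is not needed in front of a multithreaded loop; Threads.@threads already waits for all iterations to finish.
Create a separate random state for each thread, and use those states when calling rand.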
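The points above could be combined roughly as follows. This is only a minimal sketch, not a benchmarked solution: the names setup_prices_grouped, mc_sampler_threaded and rngs are hypothetical, and the seeds are arbitrary. It also keeps one generator per hour column instead of one per thread, which gives each loop iteration its own random state without relying on thread ids.

```julia
using DataFrames
using Random

# With a single field, the struct can be replaced by a plain alias.
const SpotPrices = Vector{Float64}

# One pass over the data with groupby instead of one filter call per hour.
# sort = true orders the groups by hour, so prices[h] holds the prices of hour h.
function setup_prices_grouped(df::DataFrame)
    prices = SpotPrices[]                     # concretely typed container
    for g in groupby(df, :hour; sort = true)
        push!(prices, collect(g.SpotPrice))   # copy the group's column into a Vector{Float64}
    end
    return prices
end

# rngs[i] is used only by iteration i, so no generator is shared between threads.
function mc_sampler_threaded(prices::Vector{SpotPrices}, sample_size::Int,
                             rngs::Vector{MersenneTwister})
    n_hours = length(prices)
    tmp = Matrix{Float64}(undef, sample_size, n_hours)   # every entry is overwritten below
    Threads.@threads for i in 1:n_hours                  # waits for all iterations; no @sync
        tmp[:, i] = rand(rngs[i], prices[i], sample_size)
    end
    return tmp
end

# Same synthetic data as in the question.
Random.seed!(42)
df = DataFrame(SpotPrice = Float64.(rand(-266:500, 8832)),
               month = repeat(1:12; outer = 736),
               hour = repeat(1:24; outer = 368))

prices = setup_prices_grouped(df)
rngs = [MersenneTwister(42 + i) for i in 1:length(prices)]   # arbitrary distinct seeds (assumption)
samples = mc_sampler_threaded(prices, 100, rngs)
```

Whether this is actually faster on a given machine is best checked with @btime, interpolating the arguments, for example @btime mc_sampler_threaded($prices, 100, $rngs).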
