Resampling a DataFrame to hourly, 15-minute, and 5-minute periods in Julia



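I'm pretty new to Julia, but I'm giving it a try because the benchmarks claim it is much faster than Python.

I'm trying to use some stock tick data in the format ["unixtime", "price", "amount"].

I managed to load the data and convert the unixtime to a date in Julia, but now I need to resample the data over a specific period (hourly, 15 minutes, 5 minutes, etc.), using OHLC (open, high, low, close) for the price and the sum for the amount: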

julia> head(btc_raw_data)
6x3 DataFrame:
                           date price  amount
[1,]    2011-09-13T13:53:36 UTC   5.8     1.0
[2,]    2011-09-13T13:53:44 UTC  5.83     3.0
[3,]    2011-09-13T13:53:49 UTC   5.9     1.0
[4,]    2011-09-13T13:53:54 UTC   6.0    20.0
[5,]    2011-09-13T14:32:53 UTC  5.95 12.4521
[6,]    2011-09-13T14:35:04 UTC  5.88   7.458

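I see there is a package called Resampling, but it doesn't seem to accept a time period, only the number of rows I want the output data to have.

Are there any other alternatives?

You can use https://github.com/femtotrader/TimeSeriesIO.jl to convert a DataFrame (from DataFrames.jl) to a TimeArray (from TimeSeries.jl):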

using TimeSeriesIO: TimeArray
ta = TimeArray(df, colnames=[:price], timestamp=:date)

You can then resample the time series (a TimeArray from TimeSeries.jl) using TimeSeriesResampler https://github.com/femtotrader/TimeSeriesResampler.jl together with TimeFrames https://github.com/femtotrader/TimeFrames.jl:

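using Dates  # standard library on Julia 0.7 and later; provides DateTime and Dates.Minute
using TimeSeries: TimeArray
using TimeSeriesResampler: resample, mean, ohlc, sum, TimeFrame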
# Define a sample timeseries (prices for example)
idx = DateTime(2010,1,1):Dates.Minute(1):DateTime(2011,1,1)
idx = idx[1:end-1]
N = length(idx)
y = rand(-1.0:0.01:1.0, N)
y = 1000 + cumsum(y)
#df = DataFrame(Date=idx, y=y)
ta = TimeArray(collect(idx), y, ["y"])
println("ta=")
println(ta)
# Define how datetime should be grouped (timeframe)
tf = TimeFrame(dt -> floor(dt, Dates.Minute(15)))
# resample using OHLC values
ta_ohlc = ohlc(resample(ta, tf))
println("ta_ohlc=")
println(ta_ohlc)
# resample using mean values
ta_mean = mean(resample(ta, tf))
println("ta_mean=")
println(ta_mean)
# Define another sample timeseries (volume for example)
vol = rand(0:0.01:1.0, N)
ta_vol = TimeArray(collect(idx), vol, ["vol"])
println("ta_vol=")
println(ta_vol)
# resample using sum values
ta_vol_sum = sum(resample(ta_vol, tf))
println("ta_vol_sum=")
println(ta_vol_sum)

You should get:

julia> ta
525600x1 TimeSeries.TimeArray{Float64,1,DateTime,Array{Float64,1}} 2010-01-01T00:00:00 to 2010-12-31T23:59:00
                      y
2010-01-01T00:00:00 | 1000.16
2010-01-01T00:01:00 | 1000.1
2010-01-01T00:02:00 | 1000.98
2010-01-01T00:03:00 | 1001.38
⋮
2010-12-31T23:56:00 | 972.3
2010-12-31T23:57:00 | 972.85
2010-12-31T23:58:00 | 973.74
2010-12-31T23:59:00 | 972.8

julia> ta_ohlc
35040x4 TimeSeries.TimeArray{Float64,2,DateTime,Array{Float64,2}} 2010-01-01T00:00:00 to 2010-12-31T23:45:00
                      Open       High       Low        Close
2010-01-01T00:00:00 | 1000.16    1002.5     1000.1     1001.54
2010-01-01T00:15:00 | 1001.57    1002.64    999.38     999.38
2010-01-01T00:30:00 | 999.13     1000.91    998.91     1000.91
2010-01-01T00:45:00 | 1001.0     1006.42    1001.0     1006.42
⋮
2010-12-31T23:00:00 | 980.84     981.56     976.53     976.53
2010-12-31T23:15:00 | 975.74     977.46     974.71     975.31
2010-12-31T23:30:00 | 974.72     974.9      971.73     972.07
2010-12-31T23:45:00 | 972.33     973.74     971.49     972.8

julia> ta_mean
35040x1 TimeSeries.TimeArray{Float64,1,DateTime,Array{Float64,1}} 2010-01-01T00:00:00 to 2010-12-31T23:45:00
                      y
2010-01-01T00:00:00 | 1001.1047
2010-01-01T00:15:00 | 1001.686
2010-01-01T00:30:00 | 999.628
2010-01-01T00:45:00 | 1003.5267
⋮
2010-12-31T23:00:00 | 979.1773
2010-12-31T23:15:00 | 975.746
2010-12-31T23:30:00 | 973.482
2010-12-31T23:45:00 | 972.3427

julia> ta_vol
525600x1 TimeSeries.TimeArray{Float64,1,DateTime,Array{Float64,1}} 2010-01-01T00:00:00 to 2010-12-31T23:59:00
                      vol
2010-01-01T00:00:00 | 0.37
2010-01-01T00:01:00 | 0.67
2010-01-01T00:02:00 | 0.29
2010-01-01T00:03:00 | 0.28
⋮
2010-12-31T23:56:00 | 0.74
2010-12-31T23:57:00 | 0.66
2010-12-31T23:58:00 | 0.22
2010-12-31T23:59:00 | 0.47

julia> ta_vol_sum
35040x1 TimeSeries.TimeArray{Float64,1,DateTime,Array{Float64,1}} 2010-01-01T00:00:00 to 2010-12-31T23:45:00
                      vol
2010-01-01T00:00:00 | 7.13
2010-01-01T00:15:00 | 6.99
2010-01-01T00:30:00 | 8.73
2010-01-01T00:45:00 | 8.27
⋮
2010-12-31T23:00:00 | 6.11
2010-12-31T23:15:00 | 7.49
2010-12-31T23:30:00 | 5.75
2010-12-31T23:45:00 | 8.36
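
To make the grouping step concrete, here is a minimal sketch of the same idea using only the Dates standard library: each timestamp is floored to its period, and consecutive rows sharing a floored timestamp are folded into one bar. The helper name `ohlcv` and the tuple layout are assumptions made for illustration and are not part of TimeSeriesResampler or any other package; the rows are the six sample ticks from the question, and the input is assumed to be sorted by time.

```julia
using Dates

# Tick rows copied from the sample in the question: (timestamp, price, amount)
ticks = [
    (DateTime(2011, 9, 13, 13, 53, 36), 5.80, 1.0),
    (DateTime(2011, 9, 13, 13, 53, 44), 5.83, 3.0),
    (DateTime(2011, 9, 13, 13, 53, 49), 5.90, 1.0),
    (DateTime(2011, 9, 13, 13, 53, 54), 6.00, 20.0),
    (DateTime(2011, 9, 13, 14, 32, 53), 5.95, 12.4521),
    (DateTime(2011, 9, 13, 14, 35, 4),  5.88, 7.458),
]

# Hypothetical helper (not from any package): floor each timestamp to `period`
# and fold consecutive ticks with the same floored timestamp into one bar.
# Assumes `ticks` is already sorted by timestamp.
function ohlcv(ticks, period::Dates.Period)
    bars = []
    for (t, price, amount) in ticks
        bucket = floor(t, period)
        if isempty(bars) || bars[end].time != bucket
            # First tick of a new bucket: open, high, low and close all start at this price
            push!(bars, (time=bucket, open=price, high=price, low=price,
                         close=price, volume=amount))
        else
            # Same bucket: update high/low, overwrite close, accumulate the amount
            b = bars[end]
            bars[end] = (time=b.time, open=b.open, high=max(b.high, price),
                         low=min(b.low, price), close=price,
                         volume=b.volume + amount)
        end
    end
    return bars
end

# Hourly, 15-minute and 5-minute bars, as asked in the question
for period in (Hour(1), Minute(15), Minute(5))
    println(period, ":")
    foreach(println, ohlcv(ticks, period))
end
```

With these six rows, the 15-minute grouping yields two bars (starting at 13:45 and 14:30), while the 5-minute grouping yields three, since the last two ticks fall on either side of 14:35. Periods with no ticks produce no bar in this sketch; filling such gaps would be a separate step.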
