I wrote the function below, but it is 50% slower than R's list.files(x, recursive = TRUE). Maybe there is a way to make it faster.
function list_files(x)
    v = String[]
    for (root, dirs, files) in walkdir(x)
        for file in files
            # Build the full path and collect it
            push!(v, joinpath(root, file))
        end
    end
    return v
end
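One variation worth benchmarking against the function above is a direct recursion over readdir. This is only a sketch: the name list_files_readdir is hypothetical, it assumes Julia 1.4 or later (for the join and sort keywords), and it is not guaranteed to be faster on any given file system. It skips the sorting that readdir does by default and, like walkdir with its default settings, does not descend into symlinked directories.

```julia
# Hypothetical alternative to list_files: recurse with readdir directly.
# Assumes Julia >= 1.4 (readdir keywords `join` and `sort`).
function list_files_readdir!(out::Vector{String}, dir::AbstractString)
    for entry in readdir(dir; join = true, sort = false)
        if isdir(entry)
            # Do not follow symlinked directories (same as walkdir's default).
            islink(entry) || list_files_readdir!(out, entry)
        else
            push!(out, entry)
        end
    end
    return out
end

list_files_readdir(dir::AbstractString) = list_files_readdir!(String[], dir)

# Small demo on a temporary directory tree.
demo_dir = mktempdir()
mkpath(joinpath(demo_dir, "a", "b"))
touch(joinpath(demo_dir, "top.txt"))
touch(joinpath(demo_dir, "a", "mid.txt"))
touch(joinpath(demo_dir, "a", "b", "deep.txt"))

found_readdir = list_files_readdir(demo_dir)
println(length(found_readdir), " files found")
```

The order of the result is not sorted, so compare it with the walkdir version via sort before checking equality.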
I'm not sure; on my machine, walkdir seems to be very close to optimal.
julia> using BenchmarkTools
julia> @benchmark list_files($".")
BenchmarkTools.Trial: 195 samples with 1 evaluation.
Range (min … max): 22.098 ms … 54.305 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 25.395 ms ┊ GC (median): 0.00%
Time (mean ± σ): 25.636 ms ± 3.179 ms ┊ GC (mean ± σ): 0.13% ± 0.87%
▄▂ ▄ ▄ ▅▅ ▅ ▂█
▆▃██▇▆▆▇█▇██▇▆██▇▇▇█▆██▆▆███▇▅██▅▇▅▁▆▁▅▅▇▅▆▆▁▅▃▁▁▁▃▁▃▁▁▁▁▁▃ ▃
22.1 ms Histogram: frequency by time 31.1 ms <
Memory estimate: 989.60 KiB, allocs estimate: 8987.
julia> @benchmark split(read($`find . -type f`, String), '\n')
BenchmarkTools.Trial: 149 samples with 1 evaluation.
Range (min … max): 26.661 ms … 47.466 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 32.541 ms ┊ GC (median): 0.00%
Time (mean ± σ): 33.547 ms ± 4.435 ms ┊ GC (mean ± σ): 0.60% ± 3.90%
▄ █ ▂ ▂
▃▁▆█▆▅▆█▇▃█▇▇▇▇██████▆█▃█▅▇▁▁▆▅▅▆▁▅▅▃▃▇▅▃▃▅▁▇▁▁▃▃▁▃▁▁▁▃▅▃▃▃ ▃
26.7 ms Histogram: frequency by time 45.3 ms <
Memory estimate: 626.14 KiB, allocs estimate: 79.
If R really is much faster than find . -type f, that is quite impressive.
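Regarding the other answer: did you perform the measurement correctly? Disks and file systems use various caching mechanisms, so two separate calls can give very different timings.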
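A rough way to see the caching effect is to time several consecutive calls individually and compare the first one with the rest. The sketch below repeats the function from the question so that it is self-contained; time_calls is a hypothetical helper, not part of any library. Note that in a fresh session the first call also includes Julia's compilation time, so it overstates the file-system effect.

```julia
# Copy of the function from the question, so this snippet runs on its own.
function list_files(x)
    v = String[]
    for (root, dirs, files) in walkdir(x)
        for file in files
            push!(v, joinpath(root, file))
        end
    end
    return v
end

# Hypothetical helper: time `n` consecutive calls, one measurement per call.
function time_calls(f, path; n = 5)
    return [@elapsed(f(path)) for _ in 1:n]
end

timings = time_calls(list_files, ".")

# The first call may hit a cold cache (and pays JIT compilation in a new session).
println("first call:  ", round(timings[1] * 1000; digits = 2), " ms")
println("later calls: ", round.(timings[2:end] .* 1000; digits = 2), " ms")
```

If the first timing is far larger than the others, the R and Julia numbers should be compared only after both have been run on an already-warm directory.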
Regarding the code, you can use walkdir in a single line. For example:
julia> [(joinpath.(folder,files) for (folder,dir,files) in walkdir(raw"C:\Julia-1.8.0-rc3\lib"))...;]
3-element Vector{String}:
 "C:\\Julia-1.8.0-rc3\\lib\\libjulia.dll.a"
 "C:\\Julia-1.8.0-rc3\\lib\\libopenlibm.dll.a"
 "C:\\Julia-1.8.0-rc3\\lib\\julia\\sys.dll"
For my use case (a folder with many nested subfolders), R really is faster.
R results:
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
1 list.files(path, recursive = TRUE) 16.5ms 20.3ms 45.1 1.25KB 0 23 0 510ms <chr> <Rprofmem> <bench_tm> <tibble>
Julia results:
@benchmark [(joinpath.(folder,files) for (folder,dir,files) in walkdir(path))...;]
BenchmarkTools.Trial: 8 samples with 1 evaluation.
Range (min … max): 574.158 ms … 976.682 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 598.369 ms ┊ GC (median): 0.00%
Time (mean ± σ): 649.435 ms ± 134.988 ms ┊ GC (mean ± σ): 0.00% ± 0.00%
▁█ ▁▁ ▁ ▁ ▁
██▁██▁█▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
574 ms Histogram: frequency by time 977 ms <
Memory estimate: 71.47 KiB, allocs estimate: 1019.
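The one-liner benchmarked above creates one array per directory and splats all of them into a single concatenation. A flat comprehension builds the same list without splatting, which may help when there are very many directories. This is a sketch with a hypothetical name (list_files_flat); it has not been benchmarked on the folder used above, and most of the time may well be spent in the file system rather than in this step.

```julia
# Hypothetical variant of the one-liner: a flat, typed comprehension.
# One joinpath per file, no per-directory arrays and no splatting.
list_files_flat(path) =
    String[joinpath(root, f) for (root, _, files) in walkdir(path) for f in files]

# Small demo on a temporary directory tree.
flat_demo_dir = mktempdir()
mkpath(joinpath(flat_demo_dir, "x", "y"))
touch(joinpath(flat_demo_dir, "one.txt"))
touch(joinpath(flat_demo_dir, "x", "two.txt"))
touch(joinpath(flat_demo_dir, "x", "y", "three.txt"))

found_flat = list_files_flat(flat_demo_dir)
println(length(found_flat), " files found")
```

It can be timed the same way as before, for example with @benchmark list_files_flat($path).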