julia> data = [i+j for i in 1:200, j in 1:100]
200×100 Matrix{Int64}:
[...]
julia> da = ChunkedDiskArray(data, chunksize=(10,10))
200×100 ChunkedDiskArray{Int64, 2, Matrix{Int64}}
Chunked: (
[10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
[10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
)
julia> using Chairmarks
julia> @be map($(x -> x * 5.0), $data) seconds=1
Benchmark: 33264 samples with 2 evaluations
min 2.313 μs (3 allocs: 156.328 KiB)
median 11.354 μs (3 allocs: 156.328 KiB)
mean 13.788 μs (3 allocs: 156.328 KiB, 1.47% gc time)
max 8.403 ms (3 allocs: 156.328 KiB, 99.68% gc time)
julia> @be map($(x -> x * 5.0), $da) seconds=1
Benchmark: 2595 samples with 1 evaluation
min 278.083 μs (22409 allocs: 2.695 MiB)
median 292.666 μs (22409 allocs: 2.695 MiB)
mean 361.557 μs (22409 allocs: 2.695 MiB, 6.40% gc time)
max 17.373 ms (22409 allocs: 2.695 MiB, 97.65% gc time)
julia> da = UnchunkedDiskArray(data)
200×100 UnchunkedDiskArray{Int64, 2, Matrix{Int64}}
Unchunked
julia> @be map($(x -> x * 5.0), $da) seconds=1
Benchmark: 3189 samples with 1 evaluation
min 234.916 μs (20039 allocs: 2.596 MiB)
median 248.333 μs (20039 allocs: 2.596 MiB)
mean 292.711 μs (20039 allocs: 2.596 MiB, 6.61% gc time)
max 1.075 ms (20039 allocs: 2.596 MiB, 73.20% gc time)
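It looks like DiskGenerator is not iterating over chunks at all and is instead doing random (per-element) access. Both wrapped arrays make roughly one allocation per element (22409 and 20039 allocations for 200×100 = 20000 elements), and the chunked wrapper is even slightly slower than the unchunked one (278 μs vs. 235 μs minimum), so the chunk structure is not being exploited. Should we make it loop over chunks instead? Perhaps by making it stateful and letting it keep the current chunk in memory? I'm not sure what the best solution is here, but there must be something better than a slowdown of about 120× in the minimum time (about 26× in the median), i.e. two orders of magnitude.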
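To make the difference between the two access patterns concrete, here is a minimal self-contained sketch that does not use DiskArrays.jl at all. CountingChunked, readblock!, map_elementwise and map_chunkwise are hypothetical names invented for illustration, not DiskArrays.jl API. The wrapper counts every simulated "disk read", so per-element access can be compared with a loop that reads each chunk exactly once.

```julia
# Hypothetical stand-in for a chunked, disk-backed matrix (NOT DiskArrays.jl API).
# Every block read is counted so access patterns can be compared.
mutable struct CountingChunked{T}
    data::Matrix{T}
    chunksize::NTuple{2,Int}
    reads::Int
end

CountingChunked(data, chunksize) = CountingChunked(data, chunksize, 0)

# Simulate one "disk read" of a rectangular block; returns a copy.
function readblock!(A::CountingChunked, r1::UnitRange{Int}, r2::UnitRange{Int})
    A.reads += 1
    return A.data[r1, r2]
end

# Output element type, inferred by calling f once (assumes f is pure
# and returns the same type for every element).
outtype(f, A::CountingChunked) = typeof(f(first(A.data)))

# Per-element access: one read per element, like random access.
function map_elementwise(f, A::CountingChunked)
    out = Matrix{outtype(f, A)}(undef, size(A.data))
    for j in axes(A.data, 2), i in axes(A.data, 1)
        out[i, j] = f(only(readblock!(A, i:i, j:j)))
    end
    return out
end

# Chunk-wise access: read each chunk once, apply f to the whole block,
# and write the result into the matching region of the output.
function map_chunkwise(f, A::CountingChunked)
    out = Matrix{outtype(f, A)}(undef, size(A.data))
    n1, n2 = size(A.data)
    c1, c2 = A.chunksize
    for j0 in 1:c2:n2, i0 in 1:c1:n1
        r1 = i0:min(i0 + c1 - 1, n1)  # last chunk may be smaller
        r2 = j0:min(j0 + c2 - 1, n2)
        out[r1, r2] = map(f, readblock!(A, r1, r2))
    end
    return out
end

data = [i + j for i in 1:200, j in 1:100]
f = x -> x * 5.0

A = CountingChunked(data, (10, 10))
y1 = map_elementwise(f, A)
println(A.reads)  # 20000

A.reads = 0
y2 = map_chunkwise(f, A)
println(A.reads)  # 200
```

Note that a stateful single-chunk cache alone would only help if the iteration order follows the chunks: walking this 200×100 array in column-major order with 10×10 chunks crosses into a new chunk every 10 elements and revisits the same 20 chunks for each of the 10 columns in a chunk column, so the cache would keep being evicted. Iterating chunk by chunk, as in map_chunkwise, avoids that.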