hdr overhaul, especially IOProxy support #3218
Merged
* Add full IOProxy support to both input and output.
* This necessitated a major refactor, including sucking the I/O specific code from rgbe.{cpp,h} into hdr{input,output}.cpp and getting rid of the old files.
* Add a testsuite entry for hdr (shockingly, there was none before)
lgritz pushed a commit that referenced this pull request on Oct 7, 2022
HDR code overhaul to support IOProxy (#3218) changed all reads into a `pread()` with explicit file position tracking. It turns out that this carries considerably more overhead than simple sequential reads, even more so on Windows.

On my PC (Windows 10, VS2022, Ryzen 5950X) this change gets file read time for an 8x resolution .HDR image (with RLE compression) from 8.28s back to 1.10s, just like it was in OIIO 2.3.

On Windows, part of the extra cost of pread() is a few more mutex locks, but the real cost is in the eventual ReadFile; it looks like any seek operation makes it take some sort of "way slower" path with various callbacks and whatnot inside the kernel.

---

Note reproduced from discussion #3588 from LG:

Switching from IOProxy::pread to IOProxy::seek + IOProxy::read makes the IOProxy stateful and not thread-safe without an external lock (which this HdrInput has, so it's not going to break anything). That precludes any future removal of the locks from read_native_scanline to make the whole ImageInput able to read concurrently. That's probably not important for this file format, but for others (especially those usable as textures, where it is crucial), it would be desirable to rely on those stateless pread calls and remove the lock from the ImageInput.

I think this patch is fine. We don't expect a lot of concurrent reads from hdr like we do for tiff and openexr, which we use extensively for highly threaded on-demand reading of textures. It fixes the recent performance regression, and it doesn't really cost us concurrency because we already had the lock in HdrInput. But in general, we want to avoid read/seek and use pread instead, so we can expect maximum concurrency from multiple threads using an ImageInput (as we do in ImageCache/TextureSystem).

So I believe that a revised and more complete diagnosis is that the problem isn't pread vs read per se, but that for RLE-compressed images we were calling pread separately for every 4 bytes! The calls to read are faster simply because they are buffered (in FILE) and require fewer OS system calls. But the real sin is that we call it way too many times.

I think that the RLE case could go back to pread if we want, but do just one pread per scanline (rather than one per pixel value run), asking for the maximum amount that will be needed to read the scanline for the worst-case RLE representation (it's ok if that runs past the end of the file; pread will just read up to the end and return the true number of bytes read). In other words, we'd be doing the buffering ourselves.

But anyway, the real misdesign of this HDR reader is ancient, and consists of doing many, many tiny reads. Even the buffered fread would be sped up a lot by not doing it separately for every 2-4 bytes of the file. I don't know if it's worth fixing for this format. It depends on whether anybody really needs to maximize its performance or expects multiple threads reading the same hdr file to be fully concurrent.
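The "one pread per scanline" idea from the note above could look roughly like the following. This is only a sketch, not the code in hdrinput.cpp: `MemorySource` is a hypothetical in-memory stand-in for a stateless positional read (it is not the IOProxy API), `read_rle_scanline` is an illustrative name, and the worst-case bound of `4 + 8 * width` bytes is an assumption based on the Radiance new-style RLE layout (a 4-byte header, then per channel at most 2 input bytes per output byte).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a stateless, pread-like source. It copies up to
// `size` bytes starting at `offset` and returns how many were actually read,
// so asking for more than remains in the "file" is harmless.
struct MemorySource {
    std::vector<uint8_t> data;
    size_t calls = 0;  // number of positional reads issued so far

    size_t pread(void* buf, size_t size, size_t offset) {
        ++calls;
        if (offset >= data.size())
            return 0;
        size_t n = std::min(size, data.size() - offset);
        std::memcpy(buf, data.data() + offset, n);
        return n;
    }
};

// Decode one new-style RLE scanline of `width` RGBE pixels starting at
// `offset`, using a single positional read. Returns the number of encoded
// bytes consumed (so the caller can advance its own offset), or 0 on error.
size_t read_rle_scanline(MemorySource& src, size_t offset, int width,
                         std::vector<uint8_t>& rgbe) {
    assert(width > 0 && width < 32768);
    // Assumed worst case: header plus 2 encoded bytes per decoded byte.
    const size_t worst = 4 + size_t(8) * width;
    std::vector<uint8_t> buf(worst);
    const size_t got = src.pread(buf.data(), worst, offset);  // the only read
    if (got < 4 || buf[0] != 2 || buf[1] != 2
        || ((buf[2] << 8) | buf[3]) != width)
        return 0;

    rgbe.assign(size_t(4) * width, 0);
    size_t pos = 4;
    for (int c = 0; c < 4; ++c) {  // channels are stored separately: R, G, B, E
        int x = 0;
        while (x < width) {
            if (pos >= got)
                return 0;
            int count = buf[pos++];
            if (count > 128) {  // run: repeat the next byte (count - 128) times
                count -= 128;
                if (pos >= got || x + count > width)
                    return 0;
                const uint8_t v = buf[pos++];
                for (int i = 0; i < count; ++i)
                    rgbe[4 * (x++) + c] = v;
            } else {  // literal: copy the next `count` bytes
                if (count == 0 || pos + count > got || x + count > width)
                    return 0;
                for (int i = 0; i < count; ++i)
                    rgbe[4 * (x++) + c] = buf[pos++];
            }
        }
    }
    return pos;
}
```

Because the function keeps no file position of its own and returns the byte count consumed, the caller tracks the offset explicitly, which is what keeps the read stateless.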
lgritz pushed a commit to lgritz/OpenImageIO that referenced this pull request on Oct 8, 2022