Skip to content

HDR: speed up reading by around 4x #3590

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 7, 2022
Merged

HDR: speed up reading by around 4x #3590

merged 1 commit into from
Oct 7, 2022

Conversation

aras-p
Copy link
Contributor

@aras-p aras-p commented Oct 7, 2022

Description

Most of the cost in HDR file reading was converting from RGBE to floats, inside ldexpf(), which at least on Windows/VS2022 spends most of the time inside _ctrlfp(). But we only have 256 possible exponents in total, so just use a precomputed table.

On my PC (Windows 10, VS2022, Ryzen 5950X) with #3588 applied, this change gets file read time for an 8x resolution HDR image from 1.10s down to 0.27s

Tests

No new tests necessary.

Checklist:

  • I have read the contribution guidelines.
  • If this is more extensive than a small change to existing code, I
    have previously submitted a Contributor License Agreement
    (individual, and if there is any way my
    employers might think my programming belongs to them, then also
    corporate).
  • I have updated the documentation, if applicable.
  • I have ensured that the change is tested somewhere in the testsuite
    (adding new test cases if necessary).
  • My code follows the prevailing code style of this project.

Most of the cost in HDR file reading was converting from RGBE to floats,
inside ldexpf(), which at least on Windows/VS2022 spends most of the
time inside _ctrlfp(). But we only have 256 possible exponents in total,
so just use a precomputed table.

On my PC (Windows 10, VS2022, Ryzen 5950X) this change gets file read
time for an 8x resolution .HDR image from 1.10s down to 0.27s
Copy link
Collaborator

@lgritz lgritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lgritz lgritz merged commit 4a4f50e into AcademySoftwareFoundation:master Oct 7, 2022
@aras-p aras-p deleted the hdr-ldexp branch October 7, 2022 19:56
lgritz pushed a commit to lgritz/OpenImageIO that referenced this pull request Oct 8, 2022
Most of the cost in HDR file reading was converting from RGBE to floats,
inside ldexpf(), which at least on Windows/VS2022 spends most of the
time inside _ctrlfp(). But we only have 256 possible exponents in total,
so just use a precomputed table.

On my PC (Windows 10, VS2022, Ryzen 5950X) this change gets file read
time for an 8x resolution .HDR image from 1.10s down to 0.27s
@ThiagoIze
Copy link
Collaborator

I know this is 2 years old, but I came across this and just wanted to point out that the table could be replaced with some floating point manipulation. Powers of 2 in IEEE754 are all represented by changes in the exponent. The sign and mantissa never change. So I think this could be further optimized by just adding the rgbe[3] - (128 + 8) to a 1.0f's exponent. This is just a few instructions and would pollute the cache much less, so I think it would result in equal or faster performance. Even if it's not much of a speedup because the bottleneck now lies elsewhere, it would at least remove a big ugly table.

I don't know if I'll have time to try this out myself, so in case I don't, I leave this here for anyone that might be inspired to try it out.

@aras-p
Copy link
Contributor Author

aras-p commented Nov 27, 2024

@ThiagoIze yeah, so your suggestion is essentially "manual reimplementation" of ldexpf, but without all of the handling that makes the actual C function be slow (floating point control word handling, possible floating point exceptions, handling of too large or too small exponents etc.). It might be faster or at least cleaner indeed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants