Skip to content

FIX: Revise bzero handling in dMRI data's to_nifti() #82

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 8, 2025

Conversation

jhlegarreta
Copy link
Contributor

@jhlegarreta jhlegarreta commented Jan 31, 2025

Add null gradient values to DWI data serialization if appropriate.

When the flag insert_b0 is True the previous implementation prepended the bzero attribute data to the DWI instance dataobj, but it was not adding the corresponding null gradient value to the bval/bvec file pair. This patch set prepends a null gradient to the bval/bvec file pair.

Parameterize the test_load testing function to check both cases (i.e. insert_b0 False and True).

Add additional checks to ensure that the write/read round-trip works as expected, and that the bvals and bvecs attributes have the expected values.

Fixes #79.

@jhlegarreta
Copy link
Contributor Author

jhlegarreta commented Jan 31, 2025

This only demonstrates the issue; I wrote the code as I spent time investigating the issue, so pushing it here to move forward. Some thinking/discussion is needed to see what convention we follow so that the issue can be addressed properly.

Copy link

codecov bot commented Jan 31, 2025

Codecov Report

Attention: Patch coverage is 80.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 70.93%. Comparing base (e964b96) to head (38ccdad).
Report is 15 commits behind head on main.

Files with missing lines Patch % Lines
src/nifreeze/data/dmri.py 80.00% 1 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #82      +/-   ##
==========================================
+ Coverage   70.10%   70.93%   +0.83%     
==========================================
  Files          23       23              
  Lines        1067     1132      +65     
  Branches      129      136       +7     
==========================================
+ Hits          748      803      +55     
- Misses        275      284       +9     
- Partials       44       45       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jhlegarreta jhlegarreta force-pushed the FixNonDWIWeightedGradientRW branch 3 times, most recently from 94513d5 to 9be0147 Compare May 3, 2025 17:13
@jhlegarreta jhlegarreta force-pushed the FixNonDWIWeightedGradientRW branch from 9be0147 to 5f0c848 Compare May 28, 2025 00:40
@jhlegarreta
Copy link
Contributor Author

With the progress and refactorings made lately, I think it is fair to revisit this. @oesteban let's set aside some time to discuss about this.

@oesteban
Copy link
Member

oesteban commented Jun 2, 2025

Sure, please book my calendar :)

@jhlegarreta jhlegarreta force-pushed the FixNonDWIWeightedGradientRW branch 2 times, most recently from 42d2d6f to 5effcf3 Compare June 8, 2025 01:29
Add null gradient values to DWI data serialization if appropriate.

When the flag `insert_b0` is `True` the previous implementation
prepended the `bzero` attribute data to the `DWI` instance `dataobj`,
but it was not adding the corresponding null gradient value to the
bval/bvec file pair. This patch set prepends a null gradient to the
bval/bvec file pair.

Parameterize the `test_load` testing function to check both cases (i.e.
`insert_b0` `False` and `True`).

Add additional checks to ensure that the write/read round-trip works as
expected, and that the `bvals` and `bvecs` attributes have the expected
values.

Take advantage of the commit to make it clear in the `DWI.to_nifti`
method that it assumes that the `bzero` attribute value is never `None`.
@jhlegarreta jhlegarreta force-pushed the FixNonDWIWeightedGradientRW branch from 5effcf3 to 8413690 Compare June 8, 2025 01:30
@jhlegarreta jhlegarreta marked this pull request as ready for review June 8, 2025 01:31
@jhlegarreta jhlegarreta changed the title BUG: Fix the non-weighted DWI data serialization FIX: Add null gradient values to DWI data serialization if appropriate Jun 8, 2025
@jhlegarreta
Copy link
Contributor Author

jhlegarreta commented Jun 8, 2025

I think I am done with this: it is ready to be reviewed/merged.

A few comments:

  • Do the dwi-b0_desc-avg.nii.gz, dwi-b0_desc-brain.nii.gz and dwi-b0_desc-brain_mask.nii.gz testing data files correspond to what we should expect to be contained in dwi.h5 testing data file?
    i.e. The below:

    bzero_brain_nifti = nb.load(datadir / "dwi-b0_desc-brain.nii.gz").get_fdata().astype(np.int16)
    bzero_brain_h5 = np.zeros_like(dwi_h5.bzero)
    bzero_brain_h5[dwi_h5.brainmask] = dwi_h5.bzero[dwi_h5.brainmask]
    assert np.allclose(bzero_brain_nifti, bzero_brain_h5)
    

    raises an error. I must be missing something if they should contain the same data.

    https://gin.g-node.org/nipreps-data/tests-nifreeze

  • The brainmask is not serialized, so if I were to test that, the round trip would fail. To be added in a separate PR.

  • This block

    # The b=0 volumes are those that did NOT pass b0_thres
    b0_volumes = fulldata[..., ~gradmsk]
    # A simple approach is to take the median across that last dimension
    # Note that axis=3 is valid only if your data is 4D (x, y, z, volumes).
    dwi_obj.bzero = np.median(b0_volumes, axis=3)

    should be put to a function so that it can be re-used and changed should another strategy be preferred. To be done in a separate PR.

@oesteban oesteban changed the title FIX: Add null gradient values to DWI data serialization if appropriate FIX: Revise bzero handling in dMRI data's to_nifti() Jun 8, 2025
Copy link
Member

@oesteban oesteban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've allowed myself to edit the PR's title because I think it was biasing the solution.

IMHO, #79 partially misrepresents the problem: HDF5 offers built-in serialization (and this is why we are using it, e.g., to access the data from multiple processes/threads without headaches).

I believe the issue is in reality simpler. The object's to_nifti() has a poor implementation. If we removed to_nifti() (which honestly, doesn't sound crazy, there's nothing that requires this feature to be part of the data object), then we don't have the problem anymore. It could be a separate function and #79 would read like "to_nifti() doesn't work great".

If that argument can be agreed upon, I've reviewed the PR accordingly. I'm not extracting to_nifti() from the object (but it possibly makes sense to consider in the future). I'll send a PR against this branch for (IMHO) a more consistent handling of cases when bzero is not set but insert_b0 is set on.

I think I am done with this: it is ready to be reviewed/merged.

A few comments:

  • Do the dwi-b0_desc-avg.nii.gz, dwi-b0_desc-brain.nii.gz and dwi-b0_desc-brain_mask.nii.gz testing data files correspond to what we should expect to be contained in dwi.h5 testing data file?

I don't remember, TBH. There's no need to stick with them if they do not meet our purposes in testing.

  • The brainmask is not serialized, so if I were to test that, the round trip would fail. To be added in a separate PR.

I'm comfortable with this, I don't think nifreeze should operate on the mask so there's no reason to write it out. It is part of the object because it is necessary in execution time. Recently, I've been thinking we could recommend that this were not a "brain" mask, but rather a conservative "brain parenchyma" mask (i.e., excluding CSF, and conservative because maybe a binary dilation could be applied to ensure no brain tissue is excluded).

By the same argument that nifreeze should not deal with the generation of good/customized b=0s, I don't think it should take responsibility for the mask either. If something is given from the exterior, the responsibility for storing it when/if necessary should rely on the exterior.

  • This block
    # The b=0 volumes are those that did NOT pass b0_thres
    b0_volumes = fulldata[..., ~gradmsk]
    # A simple approach is to take the median across that last dimension
    # Note that axis=3 is valid only if your data is 4D (x, y, z, volumes).
    dwi_obj.bzero = np.median(b0_volumes, axis=3)

    should be put to a function so that it can be re-used and changed should another strategy be preferred. To be done in a separate PR.

@oesteban
Copy link
Member

oesteban commented Jun 8, 2025

My code suggestions are at jhlegarreta#1

One issue I realized of while reviewing is that, with insert_b0=False the b- vecs/vals would not be written out!

Hopefully my suggestions make sense :)

fix: better handling of bzero & write out b-vecs/vals
@jhlegarreta
Copy link
Contributor Author

@oesteban I am fine with this if you believe this is the way forward. I'd grateful if you merged this as soon as I make the tests pass: we are back to the Stanford dataset issues 🤦‍♂️ (x-ref issue #149) :
https://github.com/nipreps/nifreeze/actions/runs/15519578287/job/43690979304?pr=82#step:11:222

  E   dipy.data.fetcher.FetcherError: The downloaded file, /home/runner/.dipy/stanford_hardi/HARDI150.nii.gz, does not have the expected md5
  E      checksum of "0b18513b46132b4d1051ed3364f2acbc". Instead, the md5 checksum was: "24f8bbdb75cbd20fd79b099a10f58567". This
  E      could mean that something is wrong with the file or that the upstream file has been
  E      updated. You can try downloading the file again or updating to the newest version of
  E      dipy.

@oesteban oesteban merged commit 5fd5833 into nipreps:main Jun 8, 2025
17 of 20 checks passed
@jhlegarreta jhlegarreta deleted the FixNonDWIWeightedGradientRW branch June 8, 2025 16:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

data.dmri has mismatches between reading/writting/representation
2 participants