zonal stats: speed up dask case #572

Merged
merged 6 commits into from
Nov 16, 2021

Conversation

thuydotm
Contributor

@thuydotm thuydotm commented Nov 10, 2021

This PR uses the same approach as #568 to improve zonal stats performance when the input data arrays are dask-backed. It computes the stats chunk by chunk, then combines the per-chunk results and returns the output as a dask DataFrame.
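A minimal sketch of the chunk-by-chunk idea, not the code in this PR. It uses plain Python lists in place of dask chunks to stay dependency-free. The helper names `chunk_partials` and `combine` are hypothetical. Each chunk is reduced to per-zone partial aggregates, and the partials are then merged into final stats:

```python
def chunk_partials(zones, values):
    """Per-zone partial aggregates (count, sum, min, max) for one chunk."""
    partials = {}
    for zone, value in zip(zones, values):
        count, total, lo, hi = partials.get(zone, (0, 0.0, value, value))
        partials[zone] = (count + 1, total + value, min(lo, value), max(hi, value))
    return partials


def combine(partials_per_chunk):
    """Merge per-chunk partials into final per-zone statistics."""
    merged = {}
    for partials in partials_per_chunk:
        for zone, (count, total, lo, hi) in partials.items():
            if zone in merged:
                c, t, l, h = merged[zone]
                merged[zone] = (c + count, t + total, min(l, lo), max(h, hi))
            else:
                merged[zone] = (count, total, lo, hi)
    # The mean is derived only at the end, from the merged sum and count.
    return {
        zone: {"count": c, "sum": t, "min": l, "max": h, "mean": t / c}
        for zone, (c, t, l, h) in merged.items()
    }


# Two hypothetical chunks, each a flattened (zones, values) pair.
chunks = [
    ([1, 1, 2], [1.0, 3.0, 10.0]),
    ([2, 2, 1], [20.0, 30.0, 5.0]),
]
stats = combine(chunk_partials(z, v) for z, v in chunks)
```

Only small per-zone summaries cross chunk boundaries, which is presumably what makes this pattern a good fit for dask-backed arrays.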

This also limits the stats supported in the dask case to a subset of the default stats. This is safer, since a custom statistic is not always element-wise and can therefore produce unexpected results.
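To illustrate the risk, here is a small example with the median, chosen here as an assumption about the kind of statistic meant; the PR does not name one. Applying it per chunk and then again to the per-chunk results does not, in general, match the statistic over the whole data:

```python
from statistics import median

# Hypothetical values for a single zone, split across two chunks.
chunks = [[1.0, 2.0, 100.0], [3.0, 4.0]]

# Naive chunk-wise approach: median of the per-chunk medians.
per_chunk = [median(chunk) for chunk in chunks]
combined = median(per_chunk)

# Reference: median over all values at once.
exact = median([value for chunk in chunks for value in chunk])

print(combined, exact)  # the two results differ
```

Statistics such as count, sum, min and max do not have this problem, because their per-chunk results can be merged exactly.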

nodata_zones is removed, as we already support zone_ids and exclude invalid values (nan, inf) from our calculations.
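A rough sketch of why the two mechanisms together cover the same ground, assuming zone_ids acts as an allow-list of zones. The function `zone_mean` is hypothetical and is not the library's API:

```python
import math


def zone_mean(zones, values, zone_ids):
    """Mean per requested zone, ignoring non-finite values (nan, inf)."""
    sums = {zone: 0.0 for zone in zone_ids}
    counts = {zone: 0 for zone in zone_ids}
    for zone, value in zip(zones, values):
        # Skip zones that were not requested and values that are invalid.
        if zone in sums and math.isfinite(value):
            sums[zone] += value
            counts[zone] += 1
    return {
        zone: (sums[zone] / counts[zone] if counts[zone] else math.nan)
        for zone in zone_ids
    }


zones = [0, 1, 1, 2, 2, 2]
values = [7.0, 1.0, math.nan, 4.0, math.inf, 6.0]

# Zone 0 plays the role of a "nodata" zone: it is simply not requested.
means = zone_mean(zones, values, zone_ids=[1, 2])
```

In this sketch, leaving a zone out of `zone_ids` has the same effect as listing it as a nodata zone would.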

@thuydotm thuydotm requested a review from ianthomas23 November 15, 2021 08:31
@thuydotm thuydotm added the ready to merge PR is ready to merge label Nov 15, 2021
@ianthomas23
Contributor

Just a few minor comments, otherwise it looks good to merge.

@thuydotm
Contributor Author

Thanks Ian, I just updated the code. I'll merge into master once all the tests have passed.

@thuydotm thuydotm merged commit 9d2ee7c into master Nov 16, 2021
@thuydotm thuydotm deleted the zonal_stats_dask_speedup branch December 23, 2021 06:03