accelerating v2 -> v3 migration #3076

Open
d-v-b opened this issue May 21, 2025 · 8 comments
Comments

@d-v-b
Contributor

d-v-b commented May 21, 2025

It's been several months since we released zarr-python 3 and there are still many active projects using zarr-python 2. For people deeply invested in the zarr-python 2 store API, migration to zarr-python 3 may not be easy, since the store API is very different. With this in mind, I think we should explore options for making migration from the zarr-python 2 APIs to the zarr-python 3 APIs easier.

A few ideas:

  • a v2 namespace in zarr-python 3 that contains all the code from zarr-python 2.x. See this zulip post, and this PR
  • wrapper classes that can encapsulate a zarr-python-2-compatible store API in a zarr-python-3-compatible store. I think the v3 MemoryStore is a good target for this. This might be of interest to people who wrote a lot of zarr-python-v2-compatible stores that would be onerous to directly migrate (cc @cgohlke)
  • a rational approach to codecs. this is a longer conversation.

Any other ideas?

@dstansby
Contributor

I'd add:

  • A complete migration guide, listing exactly how to translate every part of the v2 API to the v3 API
  • A tool to convert v2 data to v3 data in-place, without copying any data

@TomNicholas
Member

Is this not already covered in the existing migration guide?

A wrapper class seems very useful, and might even expose some incompatibilities. You could also make a wrapper class and then raise deprecation warnings when it gets used.
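A minimal sketch of the deprecation-warning idea, assuming a v2-style store behaves as a `MutableMapping` from string keys to bytes. The class name `DeprecatedV2StoreWrapper` and the warning text are hypothetical and not part of zarr-python:

```python
import warnings
from collections.abc import MutableMapping


class DeprecatedV2StoreWrapper(MutableMapping):
    """Hypothetical wrapper: forwards every call to a v2-style mapping store
    and emits a DeprecationWarning once, when the wrapper is created."""

    def __init__(self, v2_store):
        warnings.warn(
            "v2-style stores are deprecated; consider migrating to a native v3 store",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this class
        )
        self._store = v2_store

    def __getitem__(self, key):
        return self._store[key]

    def __setitem__(self, key, value):
        self._store[key] = value

    def __delitem__(self, key):
        del self._store[key]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)
```

Warning at construction rather than on every access keeps the noise low while still surfacing each place a legacy store is created.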

@jhamman
Member

jhamman commented May 21, 2025

> a v2 namespace in zarr-python 3 that contains all the code from zarr-python 2.x. See this zulip post, and this #3075

This was discussed at length in the run-up to the 3.0 release. Ultimately, we chose to remove the 2.18.X code from the release. With that in mind, I'm curious what has changed that would have us reverse this? I'm not entirely opposed to it but I'd like to think through the process a bit.

> wrapper classes that can encapsulate a zarr-python-2-compatible store API in a zarr-python-3-compatible store. I think the v3 MemoryStore is a good target for this. This might be of interest to people who wrote a lot of zarr-python-v2-compatible stores that would be onerous to directly migrate (cc @cgohlke)

This is a good idea. It will not be easy to make async-friendly, but it will "work".

> a rational approach to codecs. this is a longer conversation.

From the perspective of Xarray users, getting the 3.0 dtypes and codecs into a stable state is the highest priority at this point.

@tasansal
Contributor

tasansal commented May 21, 2025

A few v2 features still don't exist or don't work properly in v3, which blocks full migration of our workflows (at least for us).

  1. Struct data type support: "[v3] Structured dtype support" (#2134)
  2. FSSpec caching doesn't work: "[v3] Caching from fsspec doesn't work with FSSpecStore" (#2988)
  3. Synchronization primitives don't exist (we did parallel overlapping writes with Thread/Process locks in v2)
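For illustration, the kind of per-key thread locking described in item 3 can be sketched as follows. This is an assumption-laden toy, not an API from zarr-python 2 or 3: `KeyedLocks` and `increment` are hypothetical names, the store is a plain `dict`, and only threads (not processes) are covered:

```python
import threading


class KeyedLocks:
    """Hypothetical helper: hands out one threading.Lock per chunk key."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the lock table itself
        self._locks = {}

    def __getitem__(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())


def increment(store, locks, key):
    """Read-modify-write on one key, serialized by that key's lock."""
    with locks[key]:
        current = int.from_bytes(store.get(key, b"\x00"), "little")
        store[key] = (current + 1).to_bytes(4, "little")


def run_demo(n_threads=8, n_updates=50):
    store, locks = {}, KeyedLocks()

    def worker():
        for _ in range(n_updates):
            increment(store, locks, "c/0")

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return int.from_bytes(store["c/0"], "little")
```

Without the per-key lock, overlapping read-modify-write cycles on the same chunk could lose updates; writes to different keys remain free to proceed in parallel.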

@FabricioArendTorres

Just a comment from a user perspective, adding onto the previous comment.

We delay(ed) the move to v3 not because of migration difficulties, but due to a few missing features (e.g. copying stores) and due to pyodide being unable to deal with async code (yet).

@dstansby
Contributor

Adding back some kind of least-recently-used cache would be very helpful too (like `LRUStoreCache` in v2).
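As a rough illustration of what a least-recently-used read cache in front of a store does, here is a small sketch. It assumes a mapping-like backing store and bounds the cache by entry count for simplicity (a production cache would more likely bound by bytes); `LRUReadCache` and its `hits`/`misses` counters are hypothetical names, not a zarr-python API:

```python
from collections import OrderedDict
from collections.abc import Mapping


class LRUReadCache:
    """Hypothetical read-through cache that evicts the least recently used key."""

    def __init__(self, store: Mapping, max_items: int):
        if max_items < 1:
            raise ValueError("max_items must be at least 1")
        self._store = store
        self._max_items = max_items
        self._cache = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def __getitem__(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._cache[key]
        value = self._store[key]  # fall through to the backing store
        self.misses += 1
        self._cache[key] = value
        if len(self._cache) > self._max_items:
            self._cache.popitem(last=False)  # drop the least recently used
        return value
```

The sketch only covers reads; a real cache would also need to invalidate entries when the underlying store is written to or deleted from.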

@psobolewskiPhD

Just going to link this here for reference:
The latest tifffile supports zarr3, but performance on real images (large arrays, non-optimal chunks) is much worse than with zarr2. This is discussed starting here:
cgohlke/tifffile#297 (comment)
Building on this, we've updated the napari-tiff plugin to support zarr3, but I regret it a bit because performance has regressed so much on real whole-slide images.

@jhamman
Member

jhamman commented May 30, 2025

@psobolewskiPhD - would you mind opening a separate issue to discuss performance regressions? We'd love to understand the issue here but we've seen the opposite result in many zarr3 applications so we'll need to dig in to be helpful. Anything you can provide in terms of a reproducer would be great.
