
refactor v3 data types #2874


Merged (164 commits) on Jun 16, 2025

Conversation
Conversation

@d-v-b (Contributor) commented Feb 28, 2025

As per #2750, we need a new model of data types if we want to support more of them. Accordingly, this PR refactors data types for the Zarr v3 side of the codebase and makes them extensible. I would also like to handle v2 with the same data structures, and confine the v2/v3 differences to the places where they actually vary.

In main, all the v3 data types are encoded as variants of an enum (i.e., strings). Enumerating each dtype as a string is cumbersome for datetimes, which are parametrized by a time unit, and plainly unworkable for parametric dtypes like fixed-length strings, which are parametrized by their length. This means we need a model of data types that can be parametrized, and I think separate classes are probably the way to go here. Separating the different data types into different classes also gives us a natural way to capture some of the per-data-type variability baked into the spec: each data type class can define its own default value, as well as methods for converting its scalars to and from JSON.
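As a rough illustration of the class-based approach (all names here are hypothetical, not the PR's actual API), a parametric dtype can be a class whose instances carry the parameter, with the default value and scalar JSON round-trip defined per class:

```python
from dataclasses import dataclass

# Hypothetical sketch, not the real ZDType API: one class covers every
# length, so U2, U3, ... are instances rather than separate classes.
@dataclass(frozen=True)
class FixedLengthUnicode:
    length: int  # the parameter a bare enum variant cannot carry

    def default_value(self) -> str:
        return ""

    def to_json_scalar(self, value: str) -> str:
        # truncate to the declared length, mimicking fixed-width semantics
        return value[: self.length]

    def from_json_scalar(self, data: str) -> str:
        return data

u2, u3 = FixedLengthUnicode(2), FixedLengthUnicode(3)
assert u3.to_json_scalar("hello") == "hel"
```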

This is a very rough draft right now -- I'm mostly posting this for visibility as I iterate on it.

@github-actions bot added the "needs release notes" label Feb 28, 2025
@d-v-b (Contributor Author) commented Feb 28, 2025

Copying a comment @nenb made in this Zulip discussion:

> The first thing that caught my eye was that you are using numpy character codes. What was the motivation for this? numpy character codes are not extensible in their current format, and lead to issues like jax-ml/ml_dtypes#41.

A feature of the character code is that it provides a way to distinguish parametric types like U* from parametrized instances of those types (like U3). Defining a class with the character code U means instances of the class can be initialized with a "length" parameter, and then we can make U2, U3, etc. as instances of the same class. If we instead bind a concrete numpy dtype as a class attribute, we need a separate class for each of U2, U3, etc., which is undesirable. I do think I can work around this, but I figured the explanation might be helpful.
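A toy sketch of that pattern (illustrative names only): the class holds the parametric character code, and each instance pairs it with a length to produce a concrete numpy dtype.

```python
import numpy as np

# Illustrative only: one class bound to the parametric code "U" covers
# U2, U3, ... as instances, avoiding a class per length.
class UnicodeDType:
    char_code = "U"  # the parametric type; instances are parametrized

    def __init__(self, length: int):
        self.length = length

    def to_native(self) -> np.dtype:
        # "U" + length yields the concrete numpy dtype, e.g. "U3"
        return np.dtype(f"{self.char_code}{self.length}")

assert UnicodeDType(2).to_native() == np.dtype("U2")
assert UnicodeDType(3).to_native() == np.dtype("U3")
```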

@nenb commented Mar 3, 2025

Summarising from a Zulip discussion:

@nenb: How is the endianness of a dtype handled?

@d-v-b: In v3, endianness is specified by the codecs. The exact same data type could be decoded to big or little endian representations based on the state of a codec. This is very much not how v2 does it -- v2 puts the endianness in the dtype.

Proposed solution: make endianness an attribute on the dtype instance. This will be an implementation detail used by zarr-python to handle endianness, but it won't be part of the dtype on disk (as required by the spec).

@d-v-b (Contributor Author) commented Mar 4, 2025

> Summarising from a Zulip discussion:
>
> @nenb: How is the endianness of a dtype handled?
>
> @d-v-b: In v3, endianness is specified by the codecs. The exact same data type could be decoded to big or little endian representations based on the state of a codec. This is very much not how v2 does it -- v2 puts the endianness in the dtype.
>
> Proposed solution: make endianness an attribute on the dtype instance. This will be an implementation detail used by zarr-python to handle endianness, but it won't be part of the dtype on disk (as required by the spec).

Thanks for the summary! I have implemented the proposed solution.
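A minimal sketch of that proposal, with hypothetical class and attribute names (not the PR's actual API): endianness lives on the dtype object for in-memory use, but is omitted from the V3 on-disk form.

```python
from dataclasses import dataclass

import numpy as np

# Hypothetical sketch: the endianness attribute steers the in-memory
# numpy dtype, while the serialized v3 name deliberately omits it.
@dataclass(frozen=True)
class Int16:
    endianness: str = "little"  # implementation detail, never serialized

    def to_native(self) -> np.dtype:
        order = "<" if self.endianness == "little" else ">"
        return np.dtype(f"{order}i2")

    def to_json_v3(self) -> str:
        return "int16"  # spec: no endianness in the on-disk dtype

assert Int16("big").to_native() == np.dtype(">i2")
# Both byte orders serialize identically on disk:
assert Int16("big").to_json_v3() == Int16("little").to_json_v3()
```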

@d-v-b mentioned this pull request Mar 5, 2025
@rabernat (Contributor) left a comment:

I took this for a spin locally. So excited.

Lots of work has gone into this PR. Time to get it out into the wild.

.. code-block:: python

>>> scalar_value = int8.from_json_scalar(42, zarr_format=3)
>>> assert scalar_value == np.int8(42)
Contributor

I was expecting to find a section called "defining a custom data type." But we can do that in a follow-up PR.

Contributor

+1 on this; it would be great to understand how structured dtypes can now be used!

Our current test failures with this branch mainly revolve around them.

Contributor Author

Can you show me those test failures?

@ilan-gold (Contributor) commented Jun 11, 2025

https://github.com/scverse/anndata/actions/runs/15413857020/job/43873874823?pr=1995 There are also some errors around stores being append only but I think those are unrelated (need to check).

Quick repro:

```python
import numpy as np, pandas as pd, zarr
from string import ascii_letters

def gen_vstr_recarray(m, n, dtype=None):
    size = m * n
    lengths = np.random.randint(3, 5, size)
    letters = np.array(list(ascii_letters))
    gen_word = lambda l: "".join(np.random.choice(letters, l))
    arr = np.array([gen_word(l) for l in lengths]).reshape(m, n)
    return pd.DataFrame(arr, columns=[gen_word(5) for i in range(n)]).to_records(
        index=False, column_dtypes=dtype
    )

elem = gen_vstr_recarray(6, 5)
f = zarr.open("foo.zarr")
f.create_array("rec", shape=elem.shape, dtype=elem.dtype)
```

yielding `KeyError: '|V40'`. Abbreviated traceback:

```pytb
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[3], line 15
     13 elem = gen_vstr_recarray(6, 5)
     14 f = zarr.open("foo.zarr")
---> 15 f.create_array("rec", shape=elem.shape, dtype=elem.dtype)

File .../zarr/_compat.py:43, in inner_f(*args, **kwargs)
---> 43     return f(*args, **kwargs)

File .../zarr/core/group.py:2523, in Group.create_array(...)
-> 2523     self._sync(self._async_group.create_array(...))

File .../zarr/core/sync.py:208, in SyncMixin._sync(self, coroutine)
--> 208     return sync(coroutine, timeout=config.get("async.timeout"))

File .../zarr/core/sync.py:163, in sync(coro, loop, timeout)
--> 163     raise return_result

File .../zarr/core/sync.py:119, in _runner(coro)
--> 119     return await coro

File .../zarr/core/group.py:1111, in AsyncGroup.create_array(...)
-> 1111     return await create_array(...)

File .../zarr/core/array.py:4456, in create_array(...)
-> 4456     return await init_array(...)

File .../zarr/core/array.py:4242, in init_array(...)
-> 4242     array_array, array_bytes, bytes_bytes = _parse_chunk_encoding_v3(...)

File .../zarr/core/array.py:4684, in _parse_chunk_encoding_v3(...)
-> 4684     default_array_array, default_array_bytes, default_bytes_bytes = _get_default_chunk_encoding_v3(dtype)

File .../zarr/core/array.py:4590, in _get_default_chunk_encoding_v3(np_dtype)
-> 4590     dtype = DataType.from_numpy(np_dtype)

File .../zarr/core/metadata/v3.py:705, in DataType.from_numpy(cls, dtype)
--> 705     return DataType[dtype_to_data_type[dtype.str]]

KeyError: '|V40'
```

Contributor Author

thanks! I will try to get this tested and fixed in this PR

Contributor

Seems to have worked!

Contributor

@d-v-b The only thing is that you need to specify VLenUtf8 as a filter for the v2 file format case

@ilan-gold (Contributor) commented Jun 13, 2025

https://github.com/scverse/anndata/pull/1995/files Everything that isn't in the unit tests is what we had to change to work with this PR from before. And structured dtypes are a go for us with v3, things seems to be working.

We still are having some backwards compat issues - I think I had previously posted about them around v2 file format fill values for strings being allowed to be 0. I'm adding the test data here:

```python
import zarr

store = zarr.open(zarr.storage.ZipStore('/path_to/adata.zarr.zip', read_only=True), mode="r")
store["obs"]["__categories/cat_ordered"]
```

Abbreviated traceback:

```pytb
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[27], line 1
----> 1 store["obs"]["__categories/cat_ordered"]

File .../zarr/core/group.py:1860, in Group.__getitem__(self, path)
-> 1860     obj = self._sync(self._async_group.getitem(path))

File .../zarr/core/sync.py:208, in SyncMixin._sync(self, coroutine)
--> 208     return sync(coroutine, timeout=config.get("async.timeout"))

File .../zarr/core/sync.py:163, in sync(coro, loop, timeout)
--> 163     raise return_result

File .../zarr/core/sync.py:119, in _runner(coro)
--> 119     return await coro

File .../zarr/core/group.py:692, in AsyncGroup.getitem(self, key)
--> 692     return await get_node(store=store_path.store, path=store_path.path, zarr_format=self.metadata.zarr_format)

File .../zarr/core/group.py:3621, in get_node(store, path, zarr_format)
-> 3621     return await _get_node_v2(store=store, path=path)

File .../zarr/core/group.py:3576, in _get_node_v2(store, path)
-> 3576     metadata = await _read_metadata_v2(store=store, path=path)

File .../zarr/core/group.py:3468, in _read_metadata_v2(store, path)
-> 3468     return _build_metadata_v2(zmeta, zattrs)

File .../zarr/core/group.py:3524, in _build_metadata_v2(zarr_json, attrs_json)
-> 3524     return ArrayV2Metadata.from_dict(zarr_json | {"attributes": attrs_json})

File .../zarr/core/metadata/v2.py:163, in ArrayV2Metadata.from_dict(cls, data)
--> 163     fill_value = dtype.from_json_scalar(fill_value_encoded, zarr_format=2)

File .../zarr/core/dtype/npy/string.py:281, in VariableLengthUTF8.from_json_scalar(self, data, zarr_format)
    280 if not check_json_str(data):
--> 281     raise TypeError(f"Invalid type: {data}. Expected a string.")

TypeError: Invalid type: 0. Expected a string.
```

Test data: adata.zarr.zip

Contributor Author

this should now be fixed, could you try it again?

Contributor

yes will point at main now @d-v-b thanks!



```python
# TODO: find a better name for this function
def get_data_type_from_native_dtype(dtype: npt.DTypeLike) -> ZDType[TBaseDType, TBaseScalar]:
```
Contributor

FWIW, I did a bit of manual QC on this function and it worked exactly as I expected it to in every scenario. 🙌

@d-v-b (Contributor Author) commented Jun 11, 2025

thanks @rabernat! I have a final cleanup in the works where I simplify the ZDType base class to make the from_* classmethods more type safe, and after that I will update and hit the merge button.

@ngoldbaum

Hi! I'm a NumPy maintainer who has worked a bit on the new user DType system we released in NumPy 2.0.

@nenb let me know that this PR is a thing and it's getting merged soon.

One of the main things we (the NumPy developers) skipped before releasing user DTypes was adding a way to serialize user DTypes to npy files. That's still an open question.

I wonder if instead of working on improving the npy file format - which is simple on purpose and beholden to not breaking user parser implementations - we can just point people at zarr instead. Maybe even think about adding zarr-python as an optional NumPy dependency to handle more complicated serialization than what npy can handle.

This is all just me offering my personal opinion, not an opinion of anyone else involved with NumPy. Nick told me he's going on vacation for a while but he's interested in chatting more. I'm just commenting publicly because it's exciting to me 😃.

@d-v-b (Contributor Author) commented Jun 15, 2025

Thanks for the interest @ngoldbaum! I think it would be super interesting to evaluate Zarr as a "standard" serialization scheme for NumPy arrays.

One of the main things we (the NumPy developers) skipped before releasing user DTypes was adding a way to serialize user DTypes to npy files. That's still an open question.

This is something Zarr has to address for every dtype, but we operate under some constraints that make things simpler for us: all dtypes have to serialize to JSON, and as of Zarr V3 the structure of the JSON form of a dtype is well-defined (either a string or a dict with restricted keys). But I imagine NumPy might reasonably not want to commit to JSON, which could complicate the serialization story a bit.
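For illustration only, here is what the two JSON shapes described above might look like; the parametric dtype's name and configuration keys below are invented for the example, not taken from the spec:

```python
import json

# Non-parametric dtypes serialize to a bare string.
simple = "int8"

# Parametric dtypes serialize to a dict with restricted keys; the name
# and configuration fields here are illustrative, not normative.
parametric = {"name": "fixed_length_utf32", "configuration": {"length": 3}}

# Both shapes round-trip through JSON losslessly.
assert json.loads(json.dumps(simple)) == simple
assert json.loads(json.dumps(parametric)) == parametric
```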

@d-v-b (Contributor Author) commented Jun 15, 2025

I did a final pass of type safety stuff, and I made a substantial change to how zarr v2 data types are serialized to and from JSON.

As a reminder, Zarr-Python 2 supported multiple distinct array data types that all shared the same dtype field ("|O") in array metadata. When creating an array with dtype "|O", you had to specify the "object codec" (a codec that could serialize arbitrary Python objects) explicitly. Thus, the object codec is effectively part of the Zarr V2 dtype model.

I have formalized this by making `ZDType.to_json(zarr_format=2)` return a dict of the form `{"name": <the value you would see in .zarray under dtype>, "object_codec_id": <name of the object codec> | None}`; similarly, `ZDType.from_json(data, zarr_format=2)` expects a dict with the same form. This keeps the ZDType API simple while capturing the essential structure of a Zarr v2 data type in a single piece of data.
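A small sketch of that v2 JSON form (the dict literals follow the shape described above; "vlen-utf8" is numcodecs' codec ID for VLenUTF8):

```python
# The "|O" dtype plus an object codec identifies a v2 variable-length
# string; for most dtypes object_codec_id is simply None.
vlen_string_v2 = {"name": "|O", "object_codec_id": "vlen-utf8"}
int8_v2 = {"name": "|i1", "object_codec_id": None}

assert vlen_string_v2["object_codec_id"] == "vlen-utf8"
assert int8_v2["object_codec_id"] is None
```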

Unfortunately, that piece of data doesn't look exactly like what you would see in .zarray, and for most data types the object_codec_id field is None and conveys very little information. If this feels clunky, that's an honest reflection of the quite clunky data type model used by Zarr V2 :).

This is an internal change that doesn't affect the basic design, so I don't think we need new reviews. Unless any new objections surface, I will merge this sometime tomorrow, and we can get started with the remaining tasks (namely, docs).

@d-v-b merged commit 6798466 into zarr-developers:main on Jun 16, 2025
30 checks passed
@TomAugspurger (Contributor)

Nice work @d-v-b!

@jhamman (Member) commented Jun 16, 2025

Huge props to @d-v-b for pulling this across the finish line. This was a huge and highly-impactful lift. 🙌 🙌 🙌 🙌

For those that are following this thread, expect this to come out in Zarr 3.1.


@ngoldbaum - thanks for all your work on NumPy. We'd love to open the conversation about supporting writing to Zarr directly from NumPy. Can you advise on the best place to do that?

@ilan-gold (Contributor)

@d-v-b Not sure if it's a bug or not, but here is something interesting.

Run this using, say, zarr 3.0.8:

```python
import zarr, numcodecs

z = zarr.open("foo.zarr", zarr_format=2)
z.create_array("bar", (10,), dtype=object, filters=[numcodecs.VLenUTF8()], fill_value="")
z["bar"][...]
# array(['', '', '', '', '', '', '', '', '', ''], dtype=object)
```

Then after this PR:

```python
import zarr, numcodecs

z = zarr.open("foo.zarr")
z["bar"][...]
# array(['', '', '', '', '', '', '', '', '', ''], dtype=StringDType())
```

i.e., the dtype is now a string dtype. For us it was a simple fix, but not sure if it is (a) intentional or (b) indicative of some other issue.

@d-v-b (Contributor Author) commented Jun 16, 2025

> @d-v-b Not sure if it's a bug or not, but here is something interesting.
>
> Run this using, say, zarr 3.0.8:
>
> ```python
> import zarr, numcodecs
>
> z = zarr.open("foo.zarr", zarr_format=2)
> z.create_array("bar", (10,), dtype=object, filters=[numcodecs.VLenUTF8()], fill_value="")
> z["bar"][...]
> # array(['', '', '', '', '', '', '', '', '', ''], dtype=object)
> ```
>
> Then after this PR:
>
> ```python
> import zarr, numcodecs
>
> z = zarr.open("foo.zarr")
> z["bar"][...]
> # array(['', '', '', '', '', '', '', '', '', ''], dtype=StringDType())
> ```
>
> i.e., the dtype is now a string dtype. For us it was a simple fix, but not sure if it is (a) intentional or (b) indicative of some other issue.

NumPy 2.0 introduced a new data type specifically for variable-length strings; that's the StringDType() you are seeing in the numpy array. I suspect that prior to my PR, zarr-python was not using StringDType when reading variable-length strings in Zarr v2. But it should have been, IMO, so I think the current behavior is an improvement.

@ilan-gold (Contributor)

Agreed. The only reason it came up was that we were implicitly relying on the nullability of the object dtype for strings. So we'll transition away from that now, I think.

@d-v-b (Contributor Author) commented Jun 16, 2025

I think the numpy string dtype has nullability semantics; that might be worth investigating.

@dstansby (Contributor)

Was this intentionally merged into main, and not the 3.1.0 branch?

@d-v-b (Contributor Author) commented Jun 17, 2025

> Was this intentionally merged into main, and not the 3.1.0 branch?

Yes, but we can revisit that decision. I was under the impression that we were going to merge the 3.1 prs directly into main, but if we want to do a 3.0.9 release first, then we can revert this merge.

@dstansby (Contributor)

👍 - I'll delete the 3.1.0 branch then, and create a 3.0.x branch from before this was merged.
