(fix): ndim accessible as np.ndim on PandasExtensionArray #10414

Merged (8 commits) on Jun 11, 2025

Conversation

ilan-gold
Contributor

@ilan-gold ilan-gold commented Jun 11, 2025

Reproducer for the underlying bug:

import pandas as pd
from xarray.core.extension_array import PandasExtensionArray
from xarray.core.indexing import LazilyIndexedArray

cat = PandasExtensionArray(pd.Categorical(["a", "b"] * 5))
lazy = LazilyIndexedArray(cat)
assert (lazy.get_duck_array().array == cat).all()

The failure surfaced downstream in anndata with this traceback:

venv_13/lib/python3.13/site-packages/legacy_api_wrap/__init__.py:82: in fn_compatible
    return fn(*args_all, **kw)
src/anndata/_core/anndata.py:1433: in to_memory
    new[attr_name] = to_memory(attr, copy=copy)
/opt/homebrew/Cellar/python@3.13/3.13.0_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/functools.py:929: in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
src/anndata/_core/file_backing.py:168: in _
    return x.to_memory(copy=copy)
src/anndata/_core/xarray.py:203: in to_memory
    df = self.ds.to_dataframe()
venv_13/lib/python3.13/site-packages/xarray/core/dataset.py:7164: in to_dataframe
    return self._to_dataframe(ordered_dims=ordered_dims)
venv_13/lib/python3.13/site-packages/xarray/core/dataset.py:7086: in _to_dataframe
    if not is_extension_array_dtype(self.variables[k].data)
venv_13/lib/python3.13/site-packages/xarray/core/variable.py:416: in data
    duck_array = self._data.get_duck_array()
venv_13/lib/python3.13/site-packages/xarray/core/indexing.py:662: in get_duck_array
    return _wrap_numpy_scalars(array)
venv_13/lib/python3.13/site-packages/xarray/core/indexing.py:772: in _wrap_numpy_scalars
    if np.ndim(array) == 0 and (
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = PandasExtensionArray(array=['k', 'v', 'l', 'J', 't', ..., 'H', 'u', 'd', 'b', 'h']
Length: 1000
Categories (52, object): ['A', 'B', 'C', 'D', ..., 'w', 'x', 'y', 'z']), func = <function ndim at 0x101dfbe30>, types = (<class 'xarray.core.extension_array.PandasExtensionArray'>,)
args = (['k', 'v', 'l', 'J', 't', ..., 'H', 'u', 'd', 'b', 'h']
Length: 1000
Categories (52, object): ['A', 'B', 'C', 'D', ..., 'w', 'x', 'y', 'z'],), kwargs = {}

    def __array_function__(self, func, types, args, kwargs):
        def replace_duck_with_extension_array(args) -> list:
            args_as_list = list(args)
            for index, value in enumerate(args_as_list):
                if isinstance(value, PandasExtensionArray):
                    args_as_list[index] = value.array
                elif isinstance(
                    value, tuple
                ):  # should handle more than just tuple? iterable?
                    args_as_list[index] = tuple(
                        replace_duck_with_extension_array(value)
                    )
                elif isinstance(value, list):
                    args_as_list[index] = replace_duck_with_extension_array(value)
            return args_as_list
    
        args = tuple(replace_duck_with_extension_array(args))
        if func not in HANDLED_EXTENSION_ARRAY_FUNCTIONS:
>           raise KeyError("Function not registered for pandas extension arrays.")
E           KeyError: 'Function not registered for pandas extension arrays.'

Since #8821, ndim must be accessible via np.ndim, and since https://github.com/pydata/xarray/pull/10317/files#diff-c803294f5216cbbdffa30f0b0c9f16a7e39855d4dd309c88d654bc317a78adc0L103-L104 we no longer fall back implicitly on whatever pandas offers. This fix therefore explicitly adds ndim to the NumPy functions registered for PandasExtensionArray.
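
For readers unfamiliar with the mechanism, here is a minimal sketch of why np.ndim fails on a wrapper that only handles registered functions, and how registering it resolves the error. ToyDuckArray, HANDLED_FUNCTIONS and implements are hypothetical names for illustration only (loosely modelled on the HANDLED_EXTENSION_ARRAY_FUNCTIONS registry visible in the traceback); this is not the xarray code.

```python
import numpy as np

# Hypothetical registry mapping NumPy functions to custom implementations.
HANDLED_FUNCTIONS = {}


def implements(numpy_function):
    """Register an implementation of a NumPy function for ToyDuckArray."""

    def decorator(func):
        HANDLED_FUNCTIONS[numpy_function] = func
        return func

    return decorator


class ToyDuckArray:
    """Hypothetical 1-D wrapper, standing in for something like PandasExtensionArray."""

    def __init__(self, values):
        self.values = list(values)

    @property
    def ndim(self):
        return 1

    def __array_function__(self, func, types, args, kwargs):
        # No implicit fallback: unregistered functions raise, as in the traceback.
        if func not in HANDLED_FUNCTIONS:
            raise KeyError("Function not registered for toy duck arrays.")
        return HANDLED_FUNCTIONS[func](*args, **kwargs)


arr = ToyDuckArray(["a", "b", "a"])

# Before registration, np.ndim dispatches to __array_function__ and fails.
try:
    np.ndim(arr)
    unregistered_failed = False
except KeyError:
    unregistered_failed = True


# After registration, the same call is routed to the custom implementation.
@implements(np.ndim)
def _ndim(a):
    return a.ndim


registered_ndim = np.ndim(arr)
```

The sketch passes the wrapper straight through to the registered function; the real __array_function__ in the traceback first unwraps PandasExtensionArray arguments to the underlying pandas arrays.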

@ilan-gold ilan-gold changed the title (fix): ndim accessible as np.ndim from PandasExtensionArray (fix): ndim accessible as np.ndim on PandasExtensionArray Jun 11, 2025
@ilan-gold
Contributor Author

The test failures look unrelated.

@dcherian
Contributor

How about we add ndim to duck_array_ops.py, try array.ndim first and then fall back to np.ndim to handle the scalar case?
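
A rough sketch of what that suggestion could look like, assuming a standalone helper; the ndim function and StrictDuckArray class below are illustrative and are not the code that was merged into duck_array_ops.py. Reading the attribute first means a duck array never goes through NumPy dispatch, while Python scalars and lists still get an answer from np.ndim.

```python
import numpy as np


def ndim(array):
    """Return the number of dimensions, preferring the object's own attribute."""
    try:
        # Duck arrays expose ndim directly, so no NumPy dispatch is triggered.
        return array.ndim
    except AttributeError:
        # Python scalars and lists have no ndim attribute; np.ndim handles them.
        return np.ndim(array)


class StrictDuckArray:
    """Hypothetical duck array that refuses every NumPy function dispatch."""

    ndim = 1

    def __array_function__(self, func, types, args, kwargs):
        raise KeyError("Function not registered.")


# Duck array, NumPy array, Python scalar, Python list.
results = (ndim(StrictDuckArray()), ndim(np.zeros((2, 3))), ndim(3.0), ndim([1, 2]))
```

With this shape, the helper works even for a wrapper whose __array_function__ rejects np.ndim, because the attribute lookup succeeds before np.ndim is ever called.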

@github-actions github-actions bot added topic-indexing topic-arrays related to flexible array support labels Jun 11, 2025
@ilan-gold
Contributor Author

@dcherian Let me know if this is what you had in mind

@dcherian dcherian enabled auto-merge (squash) June 11, 2025 18:50
@dcherian dcherian merged commit f95bd2d into pydata:main Jun 11, 2025
30 of 31 checks passed
2 participants