slicing DataArray with RangeIndex coordinate can put coordinate in inconsistent state #10441

Open
@anntzer

What happened?

Slicing a DataArray with RangeIndex coordinate can put that coordinate in an internally inconsistent state.

What did you expect to happen?

No internally inconsistent state.

Minimal Complete Verifiable Example

import numpy as np, xarray as xr, xarray.indexes

n = 30
step = 1
da = xr.DataArray(np.zeros(n), dims=["x"])
da = da.assign_coords(
    xr.Coordinates.from_xindex(
        xr.indexes.RangeIndex.linspace(0, (n - 1) * step, n, dim="x")))
sub = da.isel(x=slice(4, None, 3))

print(da)
print(da.shape, da.x.shape)  # both have shape (30,)
da.expand_dims({"y": [0]}, 0)  # ok

print()

print(sub)
print(sub.shape, sub.x.shape)  # sub has shape (9,) but sub.x has shape (8,)
sub.expand_dims({"y": [0]}, 0)  # crashes, likely due to internally inconsistent state

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

<xarray.DataArray (x: 30)> Size: 240B
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
Coordinates:
  * x        (x) float64 240B 0.0 1.0 2.0 3.0 4.0 ... 25.0 26.0 27.0 28.0 29.0
Indexes:
    x        RangeIndex (start=0, stop=30, step=1)
(30,) (30,)

<xarray.DataArray (x: 9)> Size: 72B
array([0., 0., 0., 0., 0., 0., 0., 0., 0.])
Coordinates:
  * x        (x) float64 64B 4.0 7.0 10.0 13.0 16.0 19.0 22.0 25.0
Indexes:
    x        RangeIndex (start=4, stop=28, step=3)
(9,) (8,)
Traceback (most recent call last):
  File "/private/tmp/test.py", line 19, in <module>
    sub.expand_dims({"y": [0]}, 0)  # crashes
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
  File "/path/to/python3.13/site-packages/xarray/core/dataarray.py", line 2707, in expand_dims
    ds = self._to_temp_dataset().expand_dims(
         ~~~~~~~~~~~~~~~~~~~~~^^
  File "/path/to/python3.13/site-packages/xarray/core/dataarray.py", line 581, in _to_temp_dataset
    return self._to_dataset_whole(name=_THIS_ARRAY, shallow_copy=False)
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/python3.13/site-packages/xarray/core/dataarray.py", line 648, in _to_dataset_whole
    return Dataset._construct_direct(variables, coord_names, indexes=indexes)
           ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/python3.13/site-packages/xarray/core/dataset.py", line 776, in _construct_direct
    dims = calculate_dimensions(variables)
  File "/path/to/python3.13/site-packages/xarray/core/variable.py", line 3044, in calculate_dimensions
    raise ValueError(
    ...<2 lines>...
    )
ValueError: conflicting sizes for dimension 'x': length 9 on <this-array> and length 8 on {'x': 'x'}

Anything else we need to know?

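From a quick look, the bug is in this line of xarray/indexes/range_index.py:

new_size = (sl.stop - sl.start) // sl.step

The formula floors the division, so it undercounts by one whenever sl.step does not evenly divide sl.stop - sl.start (here, with the slice normalized to 4:30:3, (30 - 4) // 3 == 8 instead of 9). The correct formula can be copied e.g. from https://github.com/python/cpython/blob/569fc6870f048cb75469ae3cacb6ebcf5172a10e/Objects/rangeobject.c#L950-L976
The patch for xarray would be: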
diff --git i/xarray/indexes/range_index.py w/xarray/indexes/range_index.py
index 2b9a5e50..04459e79 100644
--- i/xarray/indexes/range_index.py
+++ w/xarray/indexes/range_index.py
@@ -84,7 +84,7 @@ class RangeCoordinateTransform(CoordinateTransform):
         # TODO: support reverse transform (i.e., start > stop)?
         assert sl.start < sl.stop

-        new_size = (sl.stop - sl.start) // sl.step
+        new_size = (sl.stop - sl.start - 1) // sl.step + 1
         new_start = self.start + sl.start * self.step
         new_stop = new_start + new_size * sl.step * self.step

which appears to fix the problem for me.
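
As a quick sanity check of the arithmetic only (it does not touch xarray), the two formulas can be compared against Python's own len(range(...)). The helper names old_size and new_size_fixed are made up for this sketch, and it assumes a normalized slice with a positive step and start < stop, matching the assert in the patched code.

```python
def old_size(start: int, stop: int, step: int) -> int:
    """Current formula: floor division, drops a trailing partial step."""
    return (stop - start) // step


def new_size_fixed(start: int, stop: int, step: int) -> int:
    """Proposed formula: ceiling division for step > 0 and start < stop."""
    return (stop - start - 1) // step + 1


n = 30
# Normalize the slice from the example above against the dimension length.
start, stop, step = slice(4, None, 3).indices(n)
expected = len(range(start, stop, step))

print((start, stop, step))                 # normalized slice bounds
print(old_size(start, stop, step))         # size computed by the current code
print(new_size_fixed(start, stop, step))   # size computed by the patch
print(expected)                            # reference length from range()

# Brute-force comparison against range() for small inputs.
mismatches = [
    (a, b, s)
    for size in range(1, 25)
    for a in range(size)
    for b in range(a + 1, size + 1)
    for s in range(1, size + 1)
    if new_size_fixed(a, b, s) != len(range(a, b, s))
]
print(len(mismatches))
```

For the slice in the example, the old formula gives the 8 seen on sub.x while range() and the patched formula agree with the data's length of 9. Negative steps are out of scope here, as in the patched function.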

Environment

INSTALLED VERSIONS

commit: None
python: 3.13.0 | packaged by conda-forge | (main, Oct 17 2024, 12:32:35) [Clang 17.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 24.5.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.6
libnetcdf: None

xarray: 2025.6.1
pandas: 2.3.0
numpy: 2.3.1
scipy: 1.16.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.13.0
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.10.0
distributed: None
matplotlib: 3.11.0.dev985+gbebb26384f
cartopy: None
seaborn: 0.13.2
numbagg: None
fsspec: 2024.9.0
cupy: None
pint: 0.24.4
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.2.0
pip: 24.3.1
conda: None
pytest: 8.3.5
mypy: 1.15.0
IPython: 9.3.0
sphinx: 8.2.3
