
High memory usage on proximity notebook #79

Closed
@TomAugspurger

Description


The cell

extent_data = data.sel(band="extent")

extent_proximity_default = proximity(extent_data).compute()

is currently failing on staging because the workers are using too much memory. The notebook output includes

distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
distributed.worker - ERROR - failed during get data with tcp://127.0.0.1:39389 -> tcp://127.0.0.1:36541
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.8/site-packages/distributed/comm/tcp.py", line 198, in read
    frames_nbytes = await stream.read_bytes(fmt_size)
tornado.iostream.StreamClosedError: Stream is closed

Here's a reproducer that needs just xrspatial, dask, and xarray:

import dask.array as da
import xarray as xr
from xrspatial.proximity import proximity

a = xr.DataArray(da.ones((5405, 5766), dtype="float64", chunks=(3000, 3000)), dims=("y", "x"))

proximity(a).compute()
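For scale, the full array in the reproducer is only a few hundred MB, so a single in-memory copy shouldn't be anywhere near a worker's budget; the blowup presumably comes from intermediates materialized during the computation (an assumption on my part, not something I've profiled). A quick back-of-the-envelope check:

```python
# Rough memory arithmetic for the reproducer array.
rows, cols = 5405, 5766
bytes_per_float64 = 8

full_array_mb = rows * cols * bytes_per_float64 / 1e6
print(round(full_array_mb))  # one float64 copy of the full array, in MB
```

That's roughly 249 MB per full-resolution copy, so hitting a 95% memory budget suggests many copies (or a large per-chunk intermediate) are alive at once.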

cc @thuydotm, does this look like an issue in xrspatial? Or do you think it might be upstream in dask?
