Skip to content

polygonize function on cpu for numpy-backed xarray DataArrays #585

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 15, 2021

Conversation

ianthomas23
Copy link
Contributor

@ianthomas23 ianthomas23 commented Nov 25, 2021

This PR adds a cpu-based polygonize function for numpy-backed DataArrays.

The algorithm is slightly different to GDAL's in that it is simpler and has fewer memory reallocations, at the cost of requiring more RAM. It is placed in a new xrspatial/experimental directory as the API has not been finalised and in particular there are some optional dependencies (e.g, geopandas) that we don't want to be required dependencies.

The API follows that of rasterio.features.shapes but there are 4 possible return types that are specified using the return_type kwarg. Possibilities are:

  • numpy (the default) - return polygons as numpy arrays, which is the fastest option.
  • spatialpandas - return a spatialpandas.GeoDataFrame.
  • geopandas - return a geopandas.GeoDataFrame. This option is only available if geopandas and shapely optional dependencies have been installed.
  • awkward - return polygons as an awkward array (https://awkward-array.readthedocs.io/en/latest/index.html). This option is only available if awkward is installed.

Here is some example code to run the same raster and mask arrays through this algorithm and that of rasterio/GDAL, and display the results side-by-side using Matplotlib.

import matplotlib.path as mpath
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import numpy as np
import rasterio.features
from shapely.geometry import shape
import xarray as xr
from xrspatial.experimental import polygonize

shape = (30, 40)
rng = np.random.default_rng(7423)
raster = rng.integers(low=0, high=3, size=shape, dtype=np.int32)

mask = rng.uniform(0, 1, size=shape) < 0.8

# xarray-spatial polygonize.
xrsp_values, xrsp_polygons = polygonize(xr.DataArray(raster), mask=xr.DataArray(mask))

# rasterio/gdal polygonize.
rasterio_values = []
rasterio_polygons = []
for shape, z in rasterio.features.shapes(raster, mask=mask):
    rasterio_values.append(z)
    rasterio_polygons.append(shape['coordinates'])

def polygon_to_mpl_path(poly, is_rasterio):
    npts = sum([len(boundary) for boundary in poly])
    verts = np.empty((npts, 2))
    codes = np.full(npts, 2)  # LINETO
    i = 0
    for j, boundary in enumerate(poly):
        n = len(boundary)
        # rasterio/gdal boundaries do not always have correct orientation.
        if is_rasterio and j == 0:
            verts[i:i+n] = boundary[::-1]
        else:
            verts[i:i+n] = boundary
        codes[i] = 1  # MOVETO
        i += n
    codes[i-1] = 79  # CLOSEPOLY
    return mpath.Path(verts, codes)

# Map of value -> color index to color by value.
color_map = {}
for i, val in enumerate(np.unique(xrsp_values)):
    color_map[val] = i

fig, axes = plt.subplots(ncols=2, figsize=(12, 6), sharex=True, sharey=True)
for j, (ax, values, polygons, title) in enumerate(zip(
    axes, [xrsp_values, rasterio_values], [xrsp_polygons, rasterio_polygons],
    ['xarray-spatial', 'rasterio'])):

    for value, polygon in zip(values, polygons):
        i = color_map[value]
        c = f"C{i}"
        path = polygon_to_mpl_path(polygon, j==1)
        patch = mpatches.PathPatch(path, facecolor=c, edgecolor=c, alpha=0.5)
        ax.add_patch(patch)

    ax.set_title(title)
    ax.autoscale_view(tight=True)

plt.show()

Example output:
ex

Closes #471

@ianthomas23
Copy link
Contributor Author

I have made spatialpandas an optional dependency. It was listed as required in setup.py but not in requirements.txt which is what CI uses. It is only used in one example as well as polygonize. I have created issue #587 to deal with the duplication of requirements.

Now the only return_type of polygonize that is always available is numpy. The other 3 options require optional dependencies.

@ianthomas23
Copy link
Contributor Author

Note that this PR includes a function to calculate the unique regions of a raster, i.e. the area connectivity. There is already a function that does this in zonal.py, but the algorithm here also considers an optional mask. I expect that we will combine the two functions into one in the future.

@ianthomas23
Copy link
Contributor Author

Note to self: In the example code above the rendering of rasterio polygons isn't always correct as the winding orientation is not consistent. So if we ever need to plot rasterio output for comparison this needs to be improved.

Copy link
Contributor

@thuydotm thuydotm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor comments, otherwise look good to me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Port polygonize to Numba
2 participants