Method to fill holes for categorical data

**Is your feature request related to a problem? Please describe.**

We have 3D segmentation masks where every class has its own label / ID.
For every class we would like to fill holes in the segmentation.

For an example, the following matrix:

```python
[
  [
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 0, 3, 0, 0, 4, 0 ],
    [ 3, 3, 3, 4, 0, 4 ],
    [ 0, 3, 0, 0, 4, 0 ],
  ],
  [
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 0, 1, 2, 0, 0 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 0, 3, 0, 0, 4, 0 ],
    [ 3, 0, 3, 4, 0, 4 ],
    [ 0, 3, 0, 0, 4, 0 ],
  ],
  [
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 0, 3, 0, 0, 4, 0 ],
    [ 3, 3, 3, 4, 4, 4 ],
    [ 0, 3, 0, 0, 4, 0 ],
  ],
]
```

Should result in

```python
[
  [
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 0, 3, 0, 0, 4, 0 ],
    [ 3, 3, 3, 4, 0, 4 ],
    [ 0, 3, 0, 0, 4, 0 ],
  ],
  [
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 0, 0 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 0, 3, 0, 0, 4, 0 ],
    [ 3, 3, 3, 4, 0, 4 ],
    [ 0, 3, 0, 0, 4, 0 ],
  ],
  [
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 1, 1, 1, 2, 2, 2 ],
    [ 0, 3, 0, 0, 4, 0 ],
    [ 3, 3, 3, 4, 4, 4 ],
    [ 0, 3, 0, 0, 4, 0 ],
  ],
]
```

The only filled holes are the 1 and 3 in the middle slice.
The 2 shape is open to the side and the 4 is open to the back.
The 0 between the classes should stay untouched.

**Describe the solution you'd like**

scipy already implements the method `binary_fill_holes` which fills holes for binary images. I wish that scipy would also provide a method for categorical data (non binary).

**Describe alternatives you've considered**

I implemented 7 versions using the existing [scipy.ndimage.morphology.binary_fill_holes](https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.binary_fill_holes.html?highlight=fill_holes#scipy.ndimage.binary_fill_holes) function (or its implementation) and `numpy`. Here the two best versions so far:

```python
import numpy as np
from scipy.ndimage.morphology import binary_fill_holes

def fill_holes6(img: np.ndarray, applied_labels: np.ndarray) -> np.ndarray:
    output = np.zeros_like(img)
    for i in applied_labels:
        output[binary_fill_holes(img == i)] = i

    return output

def fill_holes7(img: np.ndarray, applied_labels: np.ndarray) -> np.ndarray:
    output = np.zeros(img.shape, dtype=int)
    for i in applied_labels:
        tmp = np.zeros(img.shape, dtype=bool)
        binary_dilation(tmp, structure=None, iterations=-1, mask=img != i, origin=0, border_value=1, output=tmp)
        output[np.logical_not(tmp)] = i
        
    return output
```

I measured the performance the following way (matching my real world data distribution):

```python
import time
import pandas as pd

def measure(funs, t):
    res = []
    for _ in range(t):
        ra = np.random.randint(10, 40)
        sh = np.random.randint(200, 400, 3)
        img = np.random.randint(0, ra, sh)

        applied_labels = np.unique(img)[1:]

        fun_res = []
        for fun in funs:
            start = time.time()
            fun(img, applied_labels)
            end = time.time()
            fun_res.append(end - start)
        res.append(fun_res)
    return np.min(res, axis=0), np.max(res, axis=0), np.mean(res, axis=0), np.std(res, axis=0)

print(measure([fill_holes6, fill_holes7], t=10))
```

For my first implementations I got the following execution times (t=100):

|  | fill_holes1 | fill_holes2 | fill_holes3 |
| -- | ------------- | ------------- | ------------- |
| min  | 6.4s | 6.9s | 6.2s |
| max  | 83.7s | 96.0s | 80.4s  |
| mean  | 32.9s | 37.3s | 31.6s  |
| std  | 17.3s | 20.1s | 16.5 |

This is very slow.
The last implementation fill_holes7 is only 1.27 times faster than fill_holes3.
I really hope there is a more performant way of doing this.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Method to fill holes for categorical data #14504

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

	fill_holes1	fill_holes2	fill_holes3
min	6.4s	6.9s	6.2s
max	83.7s	96.0s	80.4s
mean	32.9s	37.3s	31.6s
std	17.3s	20.1s	16.5

Uh oh!

Method to fill holes for categorical data #14504

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions