Description
Is your feature request related to a problem? Please describe.
We have 3D segmentation masks where every class has its own label / ID.
For every class we would like to fill holes in the segmentation.
For an example, the following matrix:
[
[
[ 1, 1, 1, 2, 2, 2 ],
[ 1, 1, 1, 2, 2, 2 ],
[ 1, 1, 1, 2, 2, 2 ],
[ 0, 3, 0, 0, 4, 0 ],
[ 3, 3, 3, 4, 0, 4 ],
[ 0, 3, 0, 0, 4, 0 ],
],
[
[ 1, 1, 1, 2, 2, 2 ],
[ 1, 0, 1, 2, 0, 0 ],
[ 1, 1, 1, 2, 2, 2 ],
[ 0, 3, 0, 0, 4, 0 ],
[ 3, 0, 3, 4, 0, 4 ],
[ 0, 3, 0, 0, 4, 0 ],
],
[
[ 1, 1, 1, 2, 2, 2 ],
[ 1, 1, 1, 2, 2, 2 ],
[ 1, 1, 1, 2, 2, 2 ],
[ 0, 3, 0, 0, 4, 0 ],
[ 3, 3, 3, 4, 4, 4 ],
[ 0, 3, 0, 0, 4, 0 ],
],
]
Should result in
[
[
[ 1, 1, 1, 2, 2, 2 ],
[ 1, 1, 1, 2, 2, 2 ],
[ 1, 1, 1, 2, 2, 2 ],
[ 0, 3, 0, 0, 4, 0 ],
[ 3, 3, 3, 4, 0, 4 ],
[ 0, 3, 0, 0, 4, 0 ],
],
[
[ 1, 1, 1, 2, 2, 2 ],
[ 1, 1, 1, 2, 0, 0 ],
[ 1, 1, 1, 2, 2, 2 ],
[ 0, 3, 0, 0, 4, 0 ],
[ 3, 3, 3, 4, 0, 4 ],
[ 0, 3, 0, 0, 4, 0 ],
],
[
[ 1, 1, 1, 2, 2, 2 ],
[ 1, 1, 1, 2, 2, 2 ],
[ 1, 1, 1, 2, 2, 2 ],
[ 0, 3, 0, 0, 4, 0 ],
[ 3, 3, 3, 4, 4, 4 ],
[ 0, 3, 0, 0, 4, 0 ],
],
]
The only filled holes are the 1 and 3 in the middle slice.
The 2 shape is open to the side and the 4 is open to the back.
The 0 between the classes should stay untouched.
Describe the solution you'd like
scipy already implements the method binary_fill_holes
which fills holes for binary images. I wish that scipy would also provide a method for categorical data (non binary).
Describe alternatives you've considered
I implemented 7 versions using the existing scipy.ndimage.morphology.binary_fill_holes function (or its implementation) and numpy
. Here the two best versions so far:
import numpy as np
from scipy.ndimage.morphology import binary_fill_holes
def fill_holes6(img: np.ndarray, applied_labels: np.ndarray) -> np.ndarray:
output = np.zeros_like(img)
for i in applied_labels:
output[binary_fill_holes(img == i)] = i
return output
def fill_holes7(img: np.ndarray, applied_labels: np.ndarray) -> np.ndarray:
output = np.zeros(img.shape, dtype=int)
for i in applied_labels:
tmp = np.zeros(img.shape, dtype=bool)
binary_dilation(tmp, structure=None, iterations=-1, mask=img != i, origin=0, border_value=1, output=tmp)
output[np.logical_not(tmp)] = i
return output
I measured the performance the following way (matching my real world data distribution):
import time
import pandas as pd
def measure(funs, t):
res = []
for _ in range(t):
ra = np.random.randint(10, 40)
sh = np.random.randint(200, 400, 3)
img = np.random.randint(0, ra, sh)
applied_labels = np.unique(img)[1:]
fun_res = []
for fun in funs:
start = time.time()
fun(img, applied_labels)
end = time.time()
fun_res.append(end - start)
res.append(fun_res)
return np.min(res, axis=0), np.max(res, axis=0), np.mean(res, axis=0), np.std(res, axis=0)
print(measure([fill_holes6, fill_holes7], t=10))
For my first implementations I got the following execution times (t=100):
fill_holes1 | fill_holes2 | fill_holes3 | |
---|---|---|---|
min | 6.4s | 6.9s | 6.2s |
max | 83.7s | 96.0s | 80.4s |
mean | 32.9s | 37.3s | 31.6s |
std | 17.3s | 20.1s | 16.5 |
This is very slow.
The last implementation fill_holes7 is only 1.27 times faster than fill_holes3.
I really hope there is a more performant way of doing this.