Dong Bao, Jun Zhou, Gervase Tuxworth, Jue Zhang, Yongsheng Gao
Install the following packages.
- python >= 3.10
- pytorch >= 2.0
- faiss-gpu >= 1.7.4
- torchvision >= 0.15.2
- torchmetrics >= 1.4.0
- opencv >= 4.6.0
- pydensecrf = 1.0rc3
- scikit-learn >= 1.1.3
- scikit-image >= 0.21.0
- einops >= 0.3.2
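A quick way to confirm the environment is set up is to import the packages and print their versions. Below is a minimal sketch; the Python import names (e.g., faiss for faiss-gpu, cv2 for opencv) are the usual ones and assumed here.

```python
# Quick environment check: import the core dependencies and print their versions.
import sys
import torch, torchvision, torchmetrics, faiss, cv2, sklearn, skimage, einops

print("python      :", sys.version.split()[0])
print("pytorch     :", torch.__version__, "| cuda available:", torch.cuda.is_available())
print("torchvision :", torchvision.__version__)
print("torchmetrics:", torchmetrics.__version__)
print("faiss       :", faiss.__version__)
print("opencv      :", cv2.__version__)
print("scikit-learn:", sklearn.__version__)
print("scikit-image:", skimage.__version__)
print("einops      :", einops.__version__)
```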
Please download the data and organize it following the structures below for COCO-Stuff and Pascal VOC (a small layout-check sketch follows the trees).
COCO-Stuff dataset root
├── stuffthingmaps_trainval2017
│   ├── train2017
│   │   ├── *.png
│   │   └── ...
│   └── val2017
│       ├── *.png
│       └── ...
├── train2017
│   ├── *.jpg
│   └── ...
├── val2017
│   ├── *.jpg
│   └── ...
├── Coco164kFull_Stuff_Coarse.txt
├── Coco164kFull_Stuff_Coarse_7.txt
└── cocostuff10k.txt
Pascal VOC dataset root
├── SegmentationClass
│   ├── *.png
│   └── ...
├── SegmentationClassAug    # segmentation masks from the trainaug extension
│   ├── *.png
│   └── ...
├── JPEGImages
│   ├── *.jpg
│   └── ...
└── ImageSets
    └── Segmentation
        ├── train.txt
        ├── trainaug.txt
        └── val.txt
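The following minimal sketch checks that the expected folders and split files are in place; the two root paths are placeholders and should point at your local dataset roots.

```python
# Verify the dataset layouts described above. The root paths are placeholders.
from pathlib import Path

COCO_ROOT = Path("/path/to/cocostuff")   # placeholder
PVOC_ROOT = Path("/path/to/pascal_voc")  # placeholder

coco_expected = [
    "stuffthingmaps_trainval2017/train2017",
    "stuffthingmaps_trainval2017/val2017",
    "train2017",
    "val2017",
    "Coco164kFull_Stuff_Coarse.txt",
    "Coco164kFull_Stuff_Coarse_7.txt",
    "cocostuff10k.txt",
]
pvoc_expected = [
    "SegmentationClass",
    "SegmentationClassAug",
    "JPEGImages",
    "ImageSets/Segmentation/train.txt",
    "ImageSets/Segmentation/trainaug.txt",
    "ImageSets/Segmentation/val.txt",
]

for root, expected in [(COCO_ROOT, coco_expected), (PVOC_ROOT, pvoc_expected)]:
    for rel in expected:
        status = "ok" if (root / rel).exists() else "MISSING"
        print(f"[{status}] {root / rel}")
```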
We release the weights of the trained HCL models. The backbone of HCL is PM-ViT, which is kept frozen during model training. For PM-ViT-S/16, we load the DINO-pretrained ViT weights "dino_deitsmall16_pretrain.pth" (you can download them either from the link in the table below or from the DINO GitHub repo). For PM-ViT-S/8, we load the DINO-pretrained ViT weights "dino_deitsmall8_pretrain.pth". Seghead and linear classifier weights are also provided.
| Dataset | Backbone | Pretrained ViT | Seghead | Linear Classifier |
|---|---|---|---|---|
| PVOC | PM-ViT-S/16 | link | link | link |
| PVOC | PM-ViT-S/8 | link | link | link |
| COCO-Stuff | PM-ViT-S/16 | link | link | |
| COCO-Stuff | PM-ViT-S/8 | link | link | |
Create a folder "weights" in the root folder with the following structure:
weights
├── linear_classifier_weights
├── pretrain
└── seghead_weights
Then download the checkpoints: put the DINO-pretrained weights in the "pretrain" folder, the Seghead weights in the "seghead_weights" folder, and the linear classifier weights in the "linear_classifier_weights" folder.
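The downloaded state dicts can be inspected with plain PyTorch before running the scripts. A minimal sketch is below; the seghead and linear classifier file names are illustrative placeholders, so replace them with the actual files you downloaded.

```python
# Inspect the downloaded checkpoints. Seghead/linear file names are examples only.
import torch

ckpts = {
    "pretrain": "weights/pretrain/dino_deitsmall16_pretrain.pth",
    "seghead": "weights/seghead_weights/seghead_checkpoint.pth",          # illustrative name
    "linear": "weights/linear_classifier_weights/linear_checkpoint.pth",  # illustrative name
}

for name, path in ckpts.items():
    state = torch.load(path, map_location="cpu")
    # Some checkpoints wrap the weights in a dict, e.g. under "state_dict".
    if isinstance(state, dict) and "state_dict" in state:
        state = state["state_dict"]
    print(f"{name}: {len(state)} entries, e.g. {list(state)[:3]}")
```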
To train HCL, open main_hcl.py and adjust the corresponding hyperparameters, then run:
python main_hcl.py --epochs 10 --batch-size 64 --dist-url 'tcp://0.0.0.0:10001' --multiprocessing-distributed --world-size 1 --rank 0
To evaluate the linear classifier, open "linear_eval.py", select a configuration from "eval_config", and set "selected_config" accordingly, then run:
python linear_eval.py --batch-size 16 --gpu 0
To evaluate overclustering performance, open "overclustering_eval.py", select a configuration from "eval_config", and set "selected_config" accordingly, then run:
python overclustering_eval.py --batch-size 16 --gpu 0
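For reference, overclustering evaluation is commonly done by mapping each predicted cluster to the ground-truth class it overlaps most and then computing mIoU. The sketch below illustrates that generic protocol; it is not the code in "overclustering_eval.py", and all names are assumptions.

```python
# Generic many-to-one overclustering evaluation sketch (illustrative only).
import numpy as np

def overclustering_miou(pred, gt, n_clusters, n_classes):
    """pred, gt: flat integer label arrays of equal length."""
    conf = np.zeros((n_clusters, n_classes), dtype=np.int64)
    np.add.at(conf, (pred, gt), 1)      # cluster-vs-class co-occurrence counts
    mapping = conf.argmax(axis=1)       # each cluster -> its majority GT class
    remapped = mapping[pred]
    ious = []
    for cls in range(n_classes):
        inter = np.sum((remapped == cls) & (gt == cls))
        union = np.sum((remapped == cls) | (gt == cls))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```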
We propose the Parallel Multi-level Vision Transformer (PM-ViT), a specially designed backbone that captures multi-level object granularities and aggregates hierarchical contextual information into unified object component tokens.
We propose Hierarchical Context Learning (HCL) of object components for unsupervised semantic segmentation (USS), which focuses on learning discriminative spatial token embeddings by enhancing semantic consistency through hierarchical context. At the core of HCL is PM-ViT, a specially designed backbone that integrates multi-level hierarchical contextual information into unified token representations. To uncover the intrinsic semantic structures of objects, we introduce Momentum-based Global Foreground-Background Clustering (MoGoClustering). Leveraging DINO's foreground extraction capability, MoGoClustering clusters foreground and background object components into coherent semantic groups; it initializes cluster centroids and iteratively refines them during optimization to achieve robust semantic grouping. Furthermore, coupled with a dense prediction loss, we design a Foreground-Background-Aware (FBA) contrastive loss based on MoGoClustering to ensure that the learned dense representations are compact and consistent across views.
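The full procedure is described in the paper. Purely as an illustration of the momentum-based centroid refinement idea, a rough sketch is given below; the cosine-similarity assignment, tensor shapes, and momentum value are assumptions, not the repository's implementation.

```python
# Rough illustration of momentum-based centroid refinement (NOT the repo's code).
import torch
import torch.nn.functional as F

def momentum_update_centroids(centroids, tokens, momentum=0.99):
    """Assign each token to its nearest centroid (cosine similarity) and
    update the matched centroids with an exponential moving average."""
    tokens = F.normalize(tokens, dim=-1)        # (N, D) token embeddings
    centroids = F.normalize(centroids, dim=-1)  # (K, D) cluster centroids
    assign = (tokens @ centroids.t()).argmax(dim=1)  # (N,) cluster IDs
    new_centroids = centroids.clone()
    for k in assign.unique():
        mean_k = tokens[assign == k].mean(dim=0)
        new_centroids[k] = momentum * centroids[k] + (1 - momentum) * mean_k
    return F.normalize(new_centroids, dim=-1), assign

# Example: 512 foreground tokens of dimension 384 clustered into 32 groups.
fg_tokens = torch.randn(512, 384)
fg_centroids = torch.randn(32, 384)
fg_centroids, cluster_ids = momentum_update_centroids(fg_centroids, fg_tokens)
```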
We evaluate HCL on the PVOC and COCO-Stuff datasets.
We compare the unsupervised foreground extraction results (in green) of HCL with DINO cls token attention maps (in red).
Object component representation visualization on the PVOC dataset using PM-ViT-S/16. The locations marked with a cross on the image are the query tokens; e.g., there is a cross on the bus wheel in the top-left image. Each query token is assigned a cluster ID from C_fg or C_bg; other tokens with the same cluster ID from other images are then visualized and presented to the right of the query images. Eight query tokens with different cluster IDs are included: 1) left 1: bus wheel; 2) left 2: car glass; 3) left 3: car wheel; 4) left 4: human upper face; 5) right 1: human mouth and jaw; 6) right 2: human hand; 7) right 3: cat ear; 8) right 4: dog nose and mouth.
If you find this project useful, please consider starring the repo or citing the paper. Cheers!
@article{bao2025hierarchical,
title={Hierarchical Context Learning of object components for unsupervised semantic segmentation},
author={Bao, Dong and Zhou, Jun and Tuxworth, Gervase and Zhang, Jue and Gao, Yongsheng},
journal={Pattern Recognition},
pages={111713},
year={2025},
publisher={Elsevier}
}