
Official PyTorch code for "Hierarchical Context Learning of object components for unsupervised semantic segmentation" (Pattern Recognition 2025)


Hierarchical Context Learning of Object Components for Unsupervised Semantic Segmentation

Dong Bao, Jun Zhou, Gervase Tuxworth, Jue Zhang, Yongsheng Gao

[Paper]


⚡ Requirements

Install the following packages.

- python >= 3.10
- pytorch >= 2.0
- faiss-gpu >= 1.7.4
- torchvision >= 0.15.2
- torchmetrics >= 1.4.0
- opencv >= 4.6.0
- pydensecrf == 1.0rc3
- scikit-learn >= 1.1.3
- scikit-image >= 0.21.0
- einops >= 0.3.2
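The list above can be installed along these lines. This is an illustrative setup only: the repo does not pin exact install commands, and the PyPI/conda package names (e.g. `opencv-python` for opencv, `faiss-gpu` availability per platform) are assumptions that may need adjusting for your environment.

```shell
# Hypothetical environment setup -- adjust package sources as needed.
conda create -n hcl python=3.10 -y
conda activate hcl
pip install "torch>=2.0" "torchvision>=0.15.2" "torchmetrics>=1.4.0"
pip install faiss-gpu opencv-python "pydensecrf==1.0rc3"
pip install "scikit-learn>=1.1.3" "scikit-image>=0.21.0" "einops>=0.3.2"
```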

🐶 Datasets

Please download the datasets and organize them according to the structure below.

COCO-Stuff

dataset root
├── stuffthingmaps_trainval2017
│   ├── train2017
│   │   ├── *.png
│   │   └── ...
│   └── val2017
│       ├── *.png
│       └── ...
├── train2017
│   ├── *.jpg
│   └── ...
├── val2017
│   ├── *.jpg
│   └── ...
├── Coco164kFull_Stuff_Coarse.txt
├── Coco164kFull_Stuff_Coarse_7.txt
└── cocostuff10k.txt
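Before training, it can help to sanity-check that the dataset root matches the layout above. The helper below is not part of the repo; it simply lists which required entries are missing.

```python
from pathlib import Path

# Entries required under the COCO-Stuff dataset root, per the layout above.
REQUIRED = [
    "stuffthingmaps_trainval2017/train2017",
    "stuffthingmaps_trainval2017/val2017",
    "train2017",
    "val2017",
    "Coco164kFull_Stuff_Coarse.txt",
    "Coco164kFull_Stuff_Coarse_7.txt",
    "cocostuff10k.txt",
]

def check_cocostuff_root(root):
    """Return the list of required entries missing under `root`."""
    root = Path(root)
    return [rel for rel in REQUIRED if not (root / rel).exists()]
```

An empty return value means the root is laid out as expected.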

Pascal VOC

dataset root
├── SegmentationClass
│   ├── *.png
│   └── ...
├── SegmentationClassAug  # contains segmentation masks from the trainaug extension
│   ├── *.png
│   └── ...
├── JPEGImages
│   ├── *.jpg
│   └── ...
└── ImageSets
    └── Segmentation
        ├── train.txt
        ├── trainaug.txt
        └── val.txt
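Given this layout, a split file resolves to (image, mask) path pairs as sketched below. The helper is hypothetical (not from the repo); it assumes masks for the "trainaug" split live in `SegmentationClassAug` and the others in `SegmentationClass`, matching the tree above.

```python
from pathlib import Path

def load_split(root, split="train", aug=False):
    """Resolve a Pascal VOC split file into (image_path, mask_path) pairs."""
    root = Path(root)
    mask_dir = "SegmentationClassAug" if aug else "SegmentationClass"
    ids = (root / "ImageSets" / "Segmentation" / f"{split}.txt").read_text().split()
    return [(root / "JPEGImages" / f"{i}.jpg", root / mask_dir / f"{i}.png")
            for i in ids]
```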

🍞 Checkpoints

We release the weights of the trained HCL models. The backbone of HCL is PM-ViT, which is frozen during model training. For PM-ViT-S/16, we load the DINO-pretrained ViT weights "dino_deitsmall16_pretrain.pth" (you can download them either from the link in the table below or from the DINO git repo). For PM-ViT-S/8, we load the DINO-pretrained ViT weights "dino_deitsmall8_pretrain.pth". Seghead and linear classifier weights are also provided.

| Dataset | Backbone | Pretrained ViT | Seghead | Linear Classifier |
| --- | --- | --- | --- | --- |
| PVOC | PM-ViT-S/16 | link | link | link |
| PVOC | PM-ViT-S/8 | link | link | link |
| COCO-Stuff | PM-ViT-S/16 | | link | link |
| COCO-Stuff | PM-ViT-S/8 | | link | link |

Create a folder "weights" in the root folder with the following structure:

weights
├── linear_classifier_weights
├── pretrain
└── seghead_weights

Then download the checkpoints: put the DINO-pretrained weights in the "pretrain" folder, the Seghead weights in the "seghead_weights" folder, and the linear classifier weights in the "linear_classifier_weights" folder.
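The folder layout can be created with a few lines of Python. This is a convenience sketch, not part of the repo; the checkpoint files themselves still need to be downloaded manually from the links in the table.

```python
from pathlib import Path

def make_weights_dirs(root="."):
    """Create the weights/{pretrain,seghead_weights,linear_classifier_weights} layout."""
    for sub in ("pretrain", "seghead_weights", "linear_classifier_weights"):
        (Path(root) / "weights" / sub).mkdir(parents=True, exist_ok=True)
```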

🏃 Training

To train HCL, open main_hcl.py and adjust the corresponding hyperparameters, then run:

python main_hcl.py --epochs 10 --batch-size 64 --dist-url 'tcp://0.0.0.0:10001' --multiprocessing-distributed --world-size 1 --rank 0

🐨 Evaluation

Linear Classifier Evaluation

To evaluate the linear classifier, open "linear_eval.py", select a configuration from "eval_config", and set "selected_config" accordingly. Then run:

python linear_eval.py --batch-size 16 --gpu 0

Overclustering Evaluation

To evaluate overclustering performance, open "overclustering_eval.py", select a configuration from "eval_config", and set "selected_config" accordingly. Then run:

python overclustering_eval.py --batch-size 16 --gpu 0

Understanding HCL

PM-ViT

Parallel Multi-level Vision Transformer (PM-ViT) is a specially designed backbone that captures multiple levels of object granularity and aggregates hierarchical contextual information into unified object component tokens.

HCL Architecture

Hierarchical Context Learning (HCL) of object components for USS focuses on learning discriminative spatial token embeddings by enhancing semantic consistency through hierarchical context. At the core of HCL is PM-ViT, a specially designed backbone that integrates multi-level hierarchical contextual information into unified token representations. To uncover the intrinsic semantic structures of objects, we introduce Momentum-based Global Foreground-Background Clustering (MoGoClustering). Leveraging DINO’s foreground extraction capability, MoGoClustering clusters foreground and background object components into coherent semantic groups. It initializes cluster centroids and iteratively refines them during the optimization process to achieve robust semantic grouping. Furthermore, coupled with a dense prediction loss, we design a Foreground-Background-Aware (FBA) contrastive loss based on MoGoClustering to ensure that the learned dense representations are compact and consistent across views.
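To make the momentum-based centroid refinement concrete, here is a minimal sketch of one plausible update rule: each centroid drifts toward the mean of its currently assigned token features under a momentum coefficient `m`, then is re-normalized. The function, its signature, and the momentum value are illustrative assumptions; the actual MoGoClustering update is defined in the paper and code.

```python
import numpy as np

def momentum_update(centroids, features, assignments, m=0.99):
    """Illustrative momentum-based centroid refinement (NOT the repo's exact code).

    centroids:   (K, D) current cluster centroids
    features:    (N, D) token features from the current batch
    assignments: (N,)   cluster ID of each token
    """
    new_centroids = centroids.copy()
    for k in range(centroids.shape[0]):
        mask = assignments == k
        if mask.any():
            batch_mean = features[mask].mean(axis=0)
            # Slow drift toward the batch mean, then project back to the unit sphere.
            new_centroids[k] = m * centroids[k] + (1.0 - m) * batch_mean
            new_centroids[k] /= np.linalg.norm(new_centroids[k]) + 1e-8
    return new_centroids
```

Clusters with no assigned tokens in a batch keep their centroid unchanged, which keeps rare semantic groups stable across iterations.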

Evaluation Results

We evaluate HCL on the PVOC and COCO-Stuff datasets.

Unsupervised Foreground Extraction

We compare the unsupervised foreground extraction results of HCL (in green) with DINO cls token attention maps (in red).

Learned Object Component Visualization

Object component representation visualization on the PVOC dataset using PM-ViT-S/16. The locations marked with a cross on the image are the query tokens, e.g., there is a cross on the bus wheel in the top left image. The query token is assigned a cluster ID from Cfg or Cbg; other tokens with the same cluster ID from other images are then visualized on the right side of the query images. Eight query tokens with different cluster IDs are included: 1) left 1: bus wheel; 2) left 2: car glass; 3) left 3: car wheel; 4) left 4: human upper face; 5) right 1: human mouth and jaw; 6) right 2: human hand; 7) right 3: cat ear; 8) right 4: dog nose and mouth.

Citation

If you find this project useful, please consider starring the repo or citing our paper, cheers.

@article{bao2025hierarchical,
  title={Hierarchical Context Learning of object components for unsupervised semantic segmentation},
  author={Bao, Dong and Zhou, Jun and Tuxworth, Gervase and Zhang, Jue and Gao, Yongsheng},
  journal={Pattern Recognition},
  pages={111713},
  year={2025},
  publisher={Elsevier}
}
