sparsenet-panda
A course project for THU machine learning 2025
Science Score: 44.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.8%) to scientific vocabulary
Repository
A course project for THU machine learning 2025
Basic Info
- Host: GitHub
- Owner: lx88882222
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 152 MB
Statistics
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
SparseNet: Exploring the Trade-off between Convolution and Attention in Gigapixel Object Detection
This repository contains the official implementation for the final project of the Machine Learning course (Spring 2025) at Tsinghua University. This project investigates the architectural trade-offs for efficient and accurate object detection on the gigapixel-level, highly sparse PANDA dataset.
This work is built upon two excellent open-source projects:
- SparseFormer (liwenxi/SparseFormer)
- LSNet (THU-MIG/lsnet)
Our code starts from the SparseFormer baseline and innovatively integrates the highly efficient LS-Block from LSNet, leading to a valuable discovery about designing networks for this unique visual challenge.
🚀 Core Idea & Story
The central theme of this project is not just to achieve high performance, but to understand why certain architectural choices succeed or fail in the extreme environment of gigapixel object detection.
Our story unfolds in three acts:
Act I: Building a Strong ConvNet Baseline (SparseNet): We first question whether the complex self-attention mechanism in SparseFormer is truly necessary. We replace its core attention blocks with standard convolutional residual blocks, creating our baseline, SparseNet. This model surprisingly achieves a strong 0.70 AP50, proving that a pure ConvNet is a viable contender.

Act II: The Quest for Ultimate Efficiency (SparseNet-LS): Inspired by the "See Large, Focus Small" principle of LSNet, we hypothesize that we can significantly reduce GFLOPs by swapping our local convolution blocks for the hyper-efficient LS-Blocks. This leads to the SparseNet-LS variant.

Act III: An Insightful Discovery: The experiment yields a fascinating result. While SparseNet-LS successfully lowers GFLOPs, its AP50 drops to 0.58. This is not a failure but a key insight: for the sparse, high-variance PANDA dataset, the dynamic, adaptive modeling capability of self-attention (or a sufficiently expressive ConvNet block) is more critical than the sheer computational efficiency of generalized lightweight modules like the LS-Block.
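The baseline swap in Act I can be sketched in a few lines. Below is a minimal numpy illustration of a standard convolutional residual block of the kind that replaces the attention blocks; the naive 3x3 convolution, shapes, and function names are illustrative only and do not reproduce the actual SparseNet implementation:

```python
import numpy as np

def conv3x3(x, w):
    # Naive "same"-padded 3x3 convolution.
    # x: (C_in, H, W) feature map, w: (C_out, C_in, 3, 3) kernel.
    c_out = w.shape[0]
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((c_out, h, wd))
    for o in range(c_out):
        for i in range(x.shape[0]):
            for di in range(3):
                for dj in range(3):
                    out[o] += w[o, i, di, dj] * xp[i, di:di + h, dj:dj + wd]
    return out

def residual_block(x, w1, w2):
    # Standard residual block: y = x + conv(relu(conv(x))).
    h = np.maximum(conv3x3(x, w1), 0.0)
    return x + conv3x3(h, w2)
```

The key property is the identity shortcut: the block only has to learn a residual correction on top of its input, which is what makes stacking many such blocks trainable.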
🛠️ Installation
1. Clone the repository:
```bash
git clone https://github.com/lx88882222/SparseNet-PANDA.git
cd SparseNet-PANDA
```
2. Create and activate a Conda environment:
```bash
conda create -n sparsenet python=3.8 -y
conda activate sparsenet
```
3. Install dependencies. This project is built upon MMDetection; install the necessary dependencies using the provided requirements.txt and by following the official MMDetection installation guide:
```bash
pip install -r requirements.txt
# You might need to install PyTorch and MMCV manually to match your CUDA version
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -U openmim
mim install mmcv-full
mim install mmdet
```
🔬 Usage
Data Preparation
Please download the PANDA dataset from the official website and structure it as required by MMDetection.
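MMDetection conventionally consumes COCO-style annotations. The snippet below is a hedged sketch of that annotation format, writing and sanity-checking a minimal stub; the file name, image dimensions, and category are placeholders, not the actual PANDA layout:

```python
import json
import os
import tempfile

# Minimal COCO-style annotation stub (field names follow the COCO spec;
# image sizes and categories here are placeholders, not real PANDA values).
ann = {
    "images": [{"id": 1, "file_name": "scene_01.jpg",
                "width": 26000, "height": 15000}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1,
                     "bbox": [100, 200, 50, 120], "area": 6000, "iscrowd": 0}],
    "categories": [{"id": 1, "name": "person"}],
}

path = os.path.join(tempfile.mkdtemp(), "train.json")
with open(path, "w") as f:
    json.dump(ann, f)

# Quick sanity check: every annotation must reference a known image.
image_ids = {im["id"] for im in ann["images"]}
assert all(a["image_id"] in image_ids for a in ann["annotations"])
```

A check like this is worth running on converted PANDA annotations before training, since dangling image_id references fail only deep inside the data loader.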
Inference with Our Best Model
You can easily reproduce the results of our best-performing model (SparseNet). The champion model weights and its corresponding configuration file are located in the /checkpoints directory.
To run distributed testing on N GPUs:
```bash
# Make sure you are in the project root directory
./tools/dist_test.sh \
    checkpoints/config.py \
    checkpoints/best_model.pth \
    N
```
Training From Scratch
To train the models yourself, you can use the following commands for multi-GPU training.
Train SparseNet (our baseline):
```bash
# This command trains the baseline model that achieved 0.70 AP50.
# Replace N with the number of GPUs you want to use.
./tools/dist_train.sh \
    [PATH_TO_YOUR_SPARSENET_CONFIG] \
    N \
    --work-dir ./work_dirs/my_sparsenet_experiment
```
Train SparseNet-LS (our experiment):
```bash
# This command trains the experimental model with LS-Blocks.
# Replace N with the number of GPUs you want to use.
./tools/dist_train.sh \
    [PATH_TO_YOUR_SPARSENET_LS_CONFIG] \
    N \
    --work-dir ./work_dirs/my_sparsenet_ls_experiment
```
📈 Key Results
Our core findings are summarized in the table below, highlighting the trade-off between accuracy and efficiency.
| Model        | Core Local Module            | AP50 | GFLOPs (Relative) | Key Takeaway                               |
|--------------|------------------------------|:----:|:-----------------:|--------------------------------------------|
| SparseNet    | Standard conv residual block | 0.70 | High              | Proves ConvNets are strong for this task.  |
| SparseNet-LS | LS-Block from LSNet          | 0.58 | Low               | Shows that efficiency alone is not enough. |
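The direction of the efficiency gap follows from simple multiply-accumulate counting. The sketch below compares a standard two-conv residual block against an assumed LS-style factorization (large-kernel depthwise "see" plus 1x1 pointwise "focus"); the kernel sizes and channel counts are illustrative assumptions, not the actual LS-Block definition or our measured GFLOPs:

```python
def conv_flops(c_in, c_out, k, h, w):
    # Multiply-accumulates for a dense k x k convolution on an h x w map.
    return c_in * c_out * k * k * h * w

def dwconv_flops(c, k, h, w):
    # Depthwise convolution: each channel is convolved independently.
    return c * k * k * h * w

C, H, W = 256, 64, 64  # illustrative channel count and feature-map size

# Standard residual block: two dense 3x3 convolutions.
std = 2 * conv_flops(C, C, 3, H, W)

# Assumed LS-style block: 7x7 depthwise perception + 1x1 pointwise aggregation.
ls = dwconv_flops(C, 7, H, W) + conv_flops(C, C, 1, H, W)

print(f"dense/LS cost ratio: {std / ls:.1f}x")
```

Under these assumptions the dense block costs roughly an order of magnitude more, which is exactly why the LS swap lowers GFLOPs; the table shows that this saving alone does not buy accuracy on PANDA.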
🎓 Conclusion & Contribution
This project provides a comprehensive study on vision architectures for gigapixel detection. Our main contributions are:
1. We built and validated a strong pure-convolutional baseline, SparseNet, demonstrating its effectiveness.
2. We conducted a novel experiment by integrating the LS-Block into our baseline, quantitatively revealing a critical accuracy-efficiency trade-off specific to HRW datasets.
3. Our findings suggest that for sparse detection tasks, the architectural capacity for dynamic, fine-grained feature extraction is paramount, offering valuable insights for future network design in this domain.
Acknowledgements
This work would not have been possible without the excellent codebases provided by the authors of SparseFormer and LSNet. We sincerely thank them for their open-source contributions to the community.
* The core code has been released. More documentation will be added in the future. Feel free to open an issue!
Owner
- Name: Li Xiang
- Login: lx88882222
- Kind: user
- Location: Tsinghua University
- Company: Tsinghua University
- Repositories: 1
- Profile: https://github.com/lx88882222
A student at Tsinghua University.
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: "MMDetection Contributors"
title: "OpenMMLab Detection Toolbox and Benchmark"
date-released: 2018-08-22
url: "https://github.com/open-mmlab/mmdetection"
license: Apache-2.0
```
GitHub Events
Total
- Watch event: 1
- Push event: 2
Last Year
- Watch event: 1
- Push event: 2
Dependencies
- pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
- pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
- albumentations >=0.3.2
- cython *
- numpy *
- docutils ==0.16.0
- myst-parser *
- sphinx ==4.0.2
- sphinx-copybutton *
- sphinx_markdown_tables *
- sphinx_rtd_theme ==0.5.2
- mmcv >=2.0.0rc1,<2.1.0
- mmengine >=0.1.0,<1.0.0
- cityscapesscripts *
- imagecorruptions *
- scikit-learn *
- mmcv >=2.0.0rc1,<2.1.0
- mmengine >=0.1.0,<1.0.0
- scipy *
- torch *
- torchvision *
- matplotlib *
- numpy *
- pycocotools *
- scipy *
- six *
- terminaltables *
- asynctest * test
- cityscapesscripts * test
- codecov * test
- flake8 * test
- imagecorruptions * test
- instaboostfast * test
- interrogate * test
- isort ==4.3.21 test
- kwarray * test
- memory_profiler * test
- onnx ==1.7.0 test
- onnxruntime >=1.8.0 test
- parameterized * test
- protobuf <=3.20.1 test
- psutil * test
- pytest * test
- ubelt * test
- xdoctest >=0.10.0 test
- yapf * test
- Pillow ==9.4.0
- Pillow ==11.2.1
- Requests ==2.32.4
- albumentations ==2.0.8
- cityscapesScripts ==2.2.1
- cityscapesScripts ==2.2.4
- ffmpegcv ==0.3.18
- imagecorruptions ==1.1.2
- imageio ==2.37.0
- imageio ==2.25.0
- instaboostfast ==0.1.2
- matplotlib ==3.5.3
- memory_profiler ==0.61.0
- mmcv ==2.2.0
- mmengine ==0.10.7
- model_archiver ==1.0.3
- numpy ==1.21.6
- opencv_python ==4.7.0.68
- opencv_python ==4.11.0.86
- parameterized ==0.9.0
- psutil ==7.0.0
- pycocotools ==2.0.8
- pycocotools ==2.0.6
- pytest ==8.4.0
- pytorch_sphinx_theme ==0.0.19
- scikit_image ==0.19.3
- scikit_learn ==1.7.0
- scipy ==1.7.3
- seaborn ==0.13.2
- setuptools ==67.1.0
- setuptools ==75.8.0
- six ==1.16.0
- six ==1.17.0
- terminaltables ==3.1.10
- timm ==1.0.15
- torch ==2.6.0
- torchvision ==0.21.0
- tqdm ==4.64.1
- tqdm ==4.67.1
- triton ==3.2.0
- ts ==0.5.1