sparsenet-panda
A course project for THU machine learning 2025
Science Score: 44.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.8%) to scientific vocabulary
Repository
A course project for THU machine learning 2025
Basic Info
- Host: GitHub
- Owner: lx88882222
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 152 MB
Statistics
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
SparseNet: Exploring the Trade-off between Convolution and Attention in Gigapixel Object Detection
This repository contains the official implementation for the final project of the Machine Learning course (Spring 2025) at Tsinghua University. This project investigates the architectural trade-offs for efficient and accurate object detection on the gigapixel-level, highly sparse PANDA dataset.
This work is built upon two excellent open-source projects:
- SparseFormer (liwenxi/SparseFormer)
- LSNet (THU-MIG/lsnet)
Our code starts from the SparseFormer baseline and innovatively integrates the highly efficient LS-Block from LSNet, leading to a valuable discovery about designing networks for this unique visual challenge.
🚀 Core Idea & Story
The central theme of this project is not just to achieve high performance, but to understand why certain architectural choices succeed or fail in the extreme environment of gigapixel object detection.
Our story unfolds in three acts:
Act I: Building a Strong ConvNet Baseline (SparseNet): We first question whether the complex self-attention mechanism in SparseFormer is truly necessary. We replace its core attention blocks with standard convolutional residual blocks, creating our baseline, SparseNet. This model surprisingly achieves a strong 0.70 AP50, proving that a pure ConvNet is a viable contender.

Act II: The Quest for Ultimate Efficiency (SparseNet-LS): Inspired by the "See Large, Focus Small" principle of LSNet, we hypothesize that we can significantly reduce GFLOPs by swapping our local convolution blocks for the hyper-efficient LS-Blocks. This leads to the SparseNet-LS variant.

Act III: An Insightful Discovery: The experiment yields a fascinating result. While SparseNet-LS successfully lowers GFLOPs, its AP50 drops to 0.58. This is not a failure but a key insight: for the sparse, high-variance PANDA dataset, the dynamic, adaptive modeling capability of self-attention (or a sufficiently expressive ConvNet block) is more critical than the sheer computational efficiency of generalized lightweight modules like the LS-Block.
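The baseline swap in Act I can be sketched in a few lines. Below is a minimal numpy illustration of a standard convolutional residual block of the kind that replaces the attention blocks; the naive 3x3 convolution, shapes, and function names are illustrative only and do not reproduce the actual SparseNet implementation:

```python
import numpy as np

def conv3x3(x, w):
    # Naive "same"-padded 3x3 convolution.
    # x: (C_in, H, W) feature map, w: (C_out, C_in, 3, 3) kernel.
    c_out = w.shape[0]
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((c_out, h, wd))
    for o in range(c_out):
        for i in range(x.shape[0]):
            for di in range(3):
                for dj in range(3):
                    out[o] += w[o, i, di, dj] * xp[i, di:di + h, dj:dj + wd]
    return out

def residual_block(x, w1, w2):
    # Standard residual block: y = x + conv(relu(conv(x))).
    h = np.maximum(conv3x3(x, w1), 0.0)
    return x + conv3x3(h, w2)
```

The key property is the identity shortcut: the block only has to learn a residual correction on top of its input, which is what makes stacking many such blocks trainable.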
🛠️ Installation
1. Clone the repository:
```bash
git clone https://github.com/lx88882222/SparseNet-PANDA.git
cd SparseNet-PANDA
```
2. Create and activate a Conda environment:
```bash
conda create -n sparsenet python=3.8 -y
conda activate sparsenet
```
3. Install dependencies. This project is built upon MMDetection; install the necessary dependencies using the provided requirements.txt and by following the official MMDetection installation guide:
```bash
pip install -r requirements.txt
# You might need to install PyTorch and MMCV manually to match your CUDA version
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -U openmim
mim install mmcv-full
mim install mmdet
```
🔬 Usage
Data Preparation
Please download the PANDA dataset from the official website and structure it as required by MMDetection.
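MMDetection conventionally consumes COCO-style annotations. The snippet below is a hedged sketch of that annotation format, writing and sanity-checking a minimal stub; the file name, image dimensions, and category are placeholders, not the actual PANDA layout:

```python
import json
import os
import tempfile

# Minimal COCO-style annotation stub (field names follow the COCO spec;
# image sizes and categories here are placeholders, not real PANDA values).
ann = {
    "images": [{"id": 1, "file_name": "scene_01.jpg",
                "width": 26000, "height": 15000}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1,
                     "bbox": [100, 200, 50, 120], "area": 6000, "iscrowd": 0}],
    "categories": [{"id": 1, "name": "person"}],
}

path = os.path.join(tempfile.mkdtemp(), "train.json")
with open(path, "w") as f:
    json.dump(ann, f)

# Quick sanity check: every annotation must reference a known image.
image_ids = {im["id"] for im in ann["images"]}
assert all(a["image_id"] in image_ids for a in ann["annotations"])
```

A check like this is worth running on converted PANDA annotations before training, since dangling image_id references fail only deep inside the data loader.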
Inference with Our Best Model
You can easily reproduce the results of our best-performing model (SparseNet). The champion model weights and its corresponding configuration file are located in the /checkpoints directory.
To run distributed testing on N GPUs:
```bash
# Make sure you are in the project root directory
./tools/dist_test.sh \
    checkpoints/config.py \
    checkpoints/best_model.pth \
    N
```
Training From Scratch
To train the models yourself, you can use the following commands for multi-GPU training.
Train SparseNet (our baseline):
```bash
# This command trains the baseline model that achieved 0.70 AP50.
# Replace N with the number of GPUs you want to use.
./tools/dist_train.sh \
    [PATH_TO_YOUR_SPARSENET_CONFIG] \
    N \
    --work-dir ./work_dirs/my_sparsenet_experiment
```
Train SparseNet-LS (our experiment):
```bash
# This command trains the experimental model with LS-Blocks.
# Replace N with the number of GPUs you want to use.
./tools/dist_train.sh \
    [PATH_TO_YOUR_SPARSENET_LS_CONFIG] \
    N \
    --work-dir ./work_dirs/my_sparsenet_ls_experiment
```
📈 Key Results
Our core findings are summarized in the table below, highlighting the trade-off between accuracy and efficiency.
| Model        | Core Local Module            | AP50 | GFLOPs (Relative) | Key Takeaway                               |
|--------------|------------------------------|:----:|:-----------------:|--------------------------------------------|
| SparseNet    | Standard conv residual block | 0.70 | High              | Proves ConvNets are strong for this task.  |
| SparseNet-LS | LS-Block from LSNet          | 0.58 | Low               | Shows that efficiency alone is not enough. |
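The direction of the efficiency gap follows from simple multiply-accumulate counting. The sketch below compares a standard two-conv residual block against an assumed LS-style factorization (large-kernel depthwise "see" plus 1x1 pointwise "focus"); the kernel sizes and channel counts are illustrative assumptions, not the actual LS-Block definition or our measured GFLOPs:

```python
def conv_flops(c_in, c_out, k, h, w):
    # Multiply-accumulates for a dense k x k convolution on an h x w map.
    return c_in * c_out * k * k * h * w

def dwconv_flops(c, k, h, w):
    # Depthwise convolution: each channel is convolved independently.
    return c * k * k * h * w

C, H, W = 256, 64, 64  # illustrative channel count and feature-map size

# Standard residual block: two dense 3x3 convolutions.
std = 2 * conv_flops(C, C, 3, H, W)

# Assumed LS-style block: 7x7 depthwise perception + 1x1 pointwise aggregation.
ls = dwconv_flops(C, 7, H, W) + conv_flops(C, C, 1, H, W)

print(f"dense/LS cost ratio: {std / ls:.1f}x")
```

Under these assumptions the dense block costs roughly an order of magnitude more, which is exactly why the LS swap lowers GFLOPs; the table shows that this saving alone does not buy accuracy on PANDA.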
🎓 Conclusion & Contribution
This project provides a comprehensive study on vision architectures for gigapixel detection. Our main contributions are:
1. We built and validated a strong pure-convolutional baseline, SparseNet, demonstrating its effectiveness.
2. We conducted a novel experiment by integrating the LS-Block into our baseline, quantitatively revealing a critical accuracy-efficiency trade-off specific to HRW datasets.
3. Our findings suggest that for sparse detection tasks, the architectural capacity for dynamic, fine-grained feature extraction is paramount, offering valuable insights for future network design in this domain.
Acknowledgements
This work would not have been possible without the excellent codebases provided by the authors of SparseFormer and LSNet. We sincerely thank them for their open-source contributions to the community.
* The core code has been released. More documentation will be added in the future. Feel free to open an issue!
Owner
- Name: Li Xiang
- Login: lx88882222
- Kind: user
- Location: Tsinghua University
- Company: Tsinghua University
- Repositories: 1
- Profile: https://github.com/lx88882222
A student at Tsinghua University.
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: "MMDetection Contributors"
title: "OpenMMLab Detection Toolbox and Benchmark"
date-released: 2018-08-22
url: "https://github.com/open-mmlab/mmdetection"
license: Apache-2.0
```
GitHub Events
Total
- Watch event: 1
- Push event: 2
Last Year
- Watch event: 1
- Push event: 2
Dependencies
- pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
- pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
- albumentations >=0.3.2
- cython *
- numpy *
- docutils ==0.16.0
- myst-parser *
- sphinx ==4.0.2
- sphinx-copybutton *
- sphinx_markdown_tables *
- sphinx_rtd_theme ==0.5.2
- mmcv >=2.0.0rc1,<2.1.0
- mmengine >=0.1.0,<1.0.0
- cityscapesscripts *
- imagecorruptions *
- scikit-learn *
- mmcv >=2.0.0rc1,<2.1.0
- mmengine >=0.1.0,<1.0.0
- scipy *
- torch *
- torchvision *
- matplotlib *
- numpy *
- pycocotools *
- scipy *
- six *
- terminaltables *
- asynctest * test
- cityscapesscripts * test
- codecov * test
- flake8 * test
- imagecorruptions * test
- instaboostfast * test
- interrogate * test
- isort ==4.3.21 test
- kwarray * test
- memory_profiler * test
- onnx ==1.7.0 test
- onnxruntime >=1.8.0 test
- parameterized * test
- protobuf <=3.20.1 test
- psutil * test
- pytest * test
- ubelt * test
- xdoctest >=0.10.0 test
- yapf * test
- Pillow ==9.4.0
- Pillow ==11.2.1
- Requests ==2.32.4
- albumentations ==2.0.8
- cityscapesScripts ==2.2.1
- cityscapesScripts ==2.2.4
- ffmpegcv ==0.3.18
- imagecorruptions ==1.1.2
- imageio ==2.37.0
- imageio ==2.25.0
- instaboostfast ==0.1.2
- matplotlib ==3.5.3
- memory_profiler ==0.61.0
- mmcv ==2.2.0
- mmengine ==0.10.7
- model_archiver ==1.0.3
- numpy ==1.21.6
- opencv_python ==4.7.0.68
- opencv_python ==4.11.0.86
- parameterized ==0.9.0
- psutil ==7.0.0
- pycocotools ==2.0.8
- pycocotools ==2.0.6
- pytest ==8.4.0
- pytorch_sphinx_theme ==0.0.19
- scikit_image ==0.19.3
- scikit_learn ==1.7.0
- scipy ==1.7.3
- seaborn ==0.13.2
- setuptools ==67.1.0
- setuptools ==75.8.0
- six ==1.16.0
- six ==1.17.0
- terminaltables ==3.1.10
- timm ==1.0.15
- torch ==2.6.0
- torchvision ==0.21.0
- tqdm ==4.64.1
- tqdm ==4.67.1
- triton ==3.2.0
- ts ==0.5.1