sparsenet-panda

A course project for THU machine learning 2025

https://github.com/lx88882222/sparsenet-panda

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.8%) to scientific vocabulary
Last synced: 6 months ago

Repository

A course project for THU machine learning 2025

Basic Info
  • Host: GitHub
  • Owner: lx88882222
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 152 MB
Statistics
  • Stars: 1
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 9 months ago · Last pushed 9 months ago
Metadata Files
Readme License Citation

README.md

SparseNet: Exploring the Trade-off between Convolution and Attention in Gigapixel Object Detection

This repository contains the official implementation for the final project of the Machine Learning course (Spring 2025) at Tsinghua University. This project investigates the architectural trade-offs for efficient and accurate object detection on the gigapixel-level, highly sparse PANDA dataset.

This work is built upon two excellent open-source projects:

* SparseFormer (liwenxi/SparseFormer)
* LSNet (THU-MIG/lsnet)

Our code starts from the SparseFormer baseline and innovatively integrates the highly efficient LS-Block from LSNet, leading to a valuable discovery about designing networks for this unique visual challenge.


🚀 Core Idea & Story

The central theme of this project is not just to achieve high performance, but to understand why certain architectural choices succeed or fail in the extreme environment of gigapixel object detection.

Our story unfolds in three acts:

  1. Act I: Building a Strong ConvNet Baseline (SparseNet): We first question whether the complex self-attention mechanism in SparseFormer is truly necessary. We replace its core attention blocks with standard convolutional residual blocks, creating our baseline, SparseNet. This model surprisingly achieves a strong 0.70 AP50, proving that a pure ConvNet is a viable contender.

  2. Act II: The Quest for Ultimate Efficiency (SparseNet-LS): Inspired by the "See Large, Focus Small" principle of LSNet, we hypothesize that we can significantly reduce GFLOPs by swapping our local convolution blocks with the hyper-efficient LS-Blocks. This leads to the creation of the SparseNet-LS variant.

  3. Act III: An Insightful Discovery: The experiment yields a fascinating result. While SparseNet-LS successfully lowers GFLOPs, its AP50 drops to 0.58. This is not a failure, but a key insight: for the sparse and high-variance nature of the PANDA dataset, the dynamic, adaptive modeling capability of self-attention (or a sufficiently complex ConvNet block) is more critical than the sheer computational efficiency of generalized lightweight modules like the LS-Block.
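The efficiency motivation behind Act II can be made concrete with a back-of-the-envelope FLOP count. The sketch below is purely illustrative: the shapes are hypothetical and the formulas are the standard multiply-accumulate counts for dense vs. depthwise-separable convolution, the kind of factorization lightweight blocks such as the LS-Block build on, not the actual SparseNet or LS-Block configurations.

```python
# Back-of-the-envelope MAC comparison between a standard 3x3 convolution
# and a depthwise-separable layer. All shapes are hypothetical illustration
# values, not the actual SparseNet / LS-Block configurations.

def conv_flops(h, w, c_in, c_out, k=3):
    """Multiply-accumulates for a standard k x k convolution."""
    return h * w * c_in * c_out * k * k

def depthwise_separable_flops(h, w, c_in, c_out, k=3):
    """Depthwise k x k convolution followed by a 1x1 pointwise convolution."""
    return h * w * c_in * k * k + h * w * c_in * c_out

h = w = 64
c_in = c_out = 256

standard = conv_flops(h, w, c_in, c_out)
separable = depthwise_separable_flops(h, w, c_in, c_out)
print(f"standard:  {standard:,} MACs")
print(f"separable: {separable:,} MACs")
print(f"ratio:     {standard / separable:.1f}x")
```

The roughly order-of-magnitude gap is why Act II's hypothesis was plausible; Act III shows that on PANDA the saved computation comes at a real accuracy cost.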


🛠️ Installation

  1. Clone the repository:

     ```bash
     git clone https://github.com/lx88882222/SparseNet-PANDA.git
     cd SparseNet-PANDA
     ```

  2. Create and activate a Conda environment:

     ```bash
     conda create -n sparsenet python=3.8 -y
     conda activate sparsenet
     ```

  3. Install dependencies: This project is built upon MMDetection. Please install the necessary dependencies using the provided requirements.txt and by following the official MMDetection installation guide.

     ```bash
     pip install -r requirements.txt

     # You might need to install PyTorch and MMCV manually to match your CUDA version
     pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
     pip install -U openmim
     mim install mmcv-full
     mim install mmdet
     ```
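After installation, a quick import check can catch missing packages before launching training. The snippet below is a generic sanity check, not part of this repository; it only verifies that the packages resolve on the Python path, not that the mmcv build matches your CUDA/PyTorch versions.

```python
# Generic post-install sanity check (not part of this repository): verifies
# that the core packages can be found on the Python path. It does NOT check
# CUDA / mmcv ABI compatibility.
import importlib.util

packages = ("torch", "torchvision", "mmcv", "mmdet")
missing = [p for p in packages if importlib.util.find_spec(p) is None]

for pkg in packages:
    print(f"{pkg:12s} {'MISSING' if pkg in missing else 'ok'}")
```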


🔬 Usage

Data Preparation

Please download the PANDA dataset from the official website and structure it as required by MMDetection.
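As a rough guide, an MMDetection-style dataset section for COCO-format annotations looks like the fragment below. This is a hypothetical sketch assuming the PANDA annotations have been converted to COCO format; the actual paths, classes, and pipeline are defined by the config files shipped with this repository.

```python
# Hypothetical MMDetection-style dataset fragment; the real settings live in
# the repository's config files. Assumes PANDA converted to COCO format.
data_root = 'data/panda/'
train_dataset = dict(
    type='CocoDataset',
    data_root=data_root,
    ann_file='annotations/instances_train.json',
    data_prefix=dict(img='images/train/'),
)
```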

Inference with Our Best Model

You can easily reproduce the results of our best-performing model (SparseNet). The champion model weights and its corresponding configuration file are located in the /checkpoints directory.

To run distributed testing on N GPUs:

```bash
# Make sure you are in the project root directory
./tools/dist_test.sh \
    checkpoints/config.py \
    checkpoints/best_model.pth \
    N
```

Training From Scratch

To train the models yourself, you can use the following commands for multi-GPU training.

  • Train SparseNet (Our Baseline):

    ```bash
    # This command trains the baseline model that achieved 0.70 AP50
    # Replace N with the number of GPUs you want to use.
    ./tools/dist_train.sh \
        [PATH_TO_YOUR_SPARSENET_CONFIG] \
        N \
        --work-dir ./work_dirs/my_sparsenet_experiment
    ```

  • Train SparseNet-LS (Our Experiment):

    ```bash
    # This command trains the experimental model with LS-Blocks
    # Replace N with the number of GPUs you want to use.
    ./tools/dist_train.sh \
        [PATH_TO_YOUR_SPARSENET_LS_CONFIG] \
        N \
        --work-dir ./work_dirs/my_sparsenet_ls_experiment
    ```


📈 Key Results

Our core findings are summarized in the table below, highlighting the trade-off between accuracy and efficiency.

| Model        | Core Local Module            | AP50 | GFLOPs (Relative) | Key Takeaway                                |
| ------------ | ---------------------------- |:----:|:-----------------:|:------------------------------------------- |
| SparseNet    | Standard Conv Residual Block | 0.70 | High              | Proves ConvNets are strong for this task.   |
| SparseNet-LS | LS-Block from LSNet          | 0.58 | Low               | Shows that efficiency alone is not enough.  |
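For reference, the accuracy side of this trade-off works out as follows (simple arithmetic on the AP50 values reported above):

```python
# AP50 values copied from the results table above.
ap50_sparsenet = 0.70
ap50_sparsenet_ls = 0.58

absolute_drop = ap50_sparsenet - ap50_sparsenet_ls
relative_drop = absolute_drop / ap50_sparsenet
print(f"absolute AP50 drop: {absolute_drop:.2f}")
print(f"relative AP50 drop: {relative_drop:.0%}")
```

A roughly one-sixth relative loss in AP50 is the price paid for the lower GFLOPs of the LS-Block on this dataset.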


🎓 Conclusion & Contribution

This project provides a comprehensive study on vision architectures for gigapixel detection. Our main contributions are:

  1. We built and validated a strong pure-convolutional baseline, SparseNet, demonstrating its effectiveness.
  2. We conducted a novel experiment by integrating the LS-Block into our baseline, quantitatively revealing a critical accuracy-efficiency trade-off specific to high-resolution wide (HRW) datasets.
  3. Our findings suggest that for sparse detection tasks, the architectural capacity for dynamic, fine-grained feature extraction is paramount, offering valuable insights for future network design in this domain.

Acknowledgements

This work would not have been possible without the excellent codebases provided by the authors of SparseFormer and LSNet. We sincerely thank them for their open-source contributions to the community.

* The core code has been released. More documentation will follow. Feel free to open an issue!

Owner

  • Name: Li Xiang
  • Login: lx88882222
  • Kind: user
  • Location: Tsinghua University
  • Company: Tsinghua University

A student at Tsinghua University.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - name: "MMDetection Contributors"
title: "OpenMMLab Detection Toolbox and Benchmark"
date-released: 2018-08-22
url: "https://github.com/open-mmlab/mmdetection"
license: Apache-2.0

GitHub Events

Total
  • Watch event: 1
  • Push event: 2
Last Year
  • Watch event: 1
  • Push event: 2

Dependencies

docker/Dockerfile docker
  • pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
docker/serve/Dockerfile docker
  • pytorch/pytorch ${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel build
requirements/albu.txt pypi
  • albumentations >=0.3.2
requirements/build.txt pypi
  • cython *
  • numpy *
requirements/docs.txt pypi
  • docutils ==0.16.0
  • myst-parser *
  • sphinx ==4.0.2
  • sphinx-copybutton *
  • sphinx_markdown_tables *
  • sphinx_rtd_theme ==0.5.2
requirements/mminstall.txt pypi
  • mmcv >=2.0.0rc1,<2.1.0
  • mmengine >=0.1.0,<1.0.0
requirements/optional.txt pypi
  • cityscapesscripts *
  • imagecorruptions *
  • scikit-learn *
requirements/readthedocs.txt pypi
  • mmcv >=2.0.0rc1,<2.1.0
  • mmengine >=0.1.0,<1.0.0
  • scipy *
  • torch *
  • torchvision *
requirements/runtime.txt pypi
  • matplotlib *
  • numpy *
  • pycocotools *
  • scipy *
  • six *
  • terminaltables *
requirements/tests.txt pypi
  • asynctest * test
  • cityscapesscripts * test
  • codecov * test
  • flake8 * test
  • imagecorruptions * test
  • instaboostfast * test
  • interrogate * test
  • isort ==4.3.21 test
  • kwarray * test
  • memory_profiler * test
  • onnx ==1.7.0 test
  • onnxruntime >=1.8.0 test
  • parameterized * test
  • protobuf <=3.20.1 test
  • psutil * test
  • pytest * test
  • ubelt * test
  • xdoctest >=0.10.0 test
  • yapf * test
requirements.txt pypi
  • Pillow ==9.4.0
  • Pillow ==11.2.1
  • Requests ==2.32.4
  • albumentations ==2.0.8
  • cityscapesScripts ==2.2.1
  • cityscapesScripts ==2.2.4
  • ffmpegcv ==0.3.18
  • imagecorruptions ==1.1.2
  • imageio ==2.37.0
  • imageio ==2.25.0
  • instaboostfast ==0.1.2
  • matplotlib ==3.5.3
  • memory_profiler ==0.61.0
  • mmcv ==2.2.0
  • mmengine ==0.10.7
  • model_archiver ==1.0.3
  • numpy ==1.21.6
  • opencv_python ==4.7.0.68
  • opencv_python ==4.11.0.86
  • parameterized ==0.9.0
  • psutil ==7.0.0
  • pycocotools ==2.0.8
  • pycocotools ==2.0.6
  • pytest ==8.4.0
  • pytorch_sphinx_theme ==0.0.19
  • scikit_image ==0.19.3
  • scikit_learn ==1.7.0
  • scipy ==1.7.3
  • seaborn ==0.13.2
  • setuptools ==67.1.0
  • setuptools ==75.8.0
  • six ==1.16.0
  • six ==1.17.0
  • terminaltables ==3.1.10
  • timm ==1.0.15
  • torch ==2.6.0
  • torchvision ==0.21.0
  • tqdm ==4.64.1
  • tqdm ==4.67.1
  • triton ==3.2.0
  • ts ==0.5.1
setup.py pypi