cca-gnnclust

Cross Camera Data Association using Supervised Clustering GNN

https://github.com/djordjened92/cca-gnnclust

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.7%) to scientific vocabulary

Repository

Cross Camera Data Association using Supervised Clustering GNN

Basic Info
  • Host: GitHub
  • Owner: djordjened92
  • Language: Python
  • Default Branch: main
  • Size: 6.92 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed over 1 year ago

README.md

Cross Camera Data Association using Supervised Clustering GNN

Introduction

This project is an attempt to apply Hi-LANDER (one of the Graph Neural Network methods for Supervised Graph Clustering) to cross-camera instance matching. The full report can be found here.
The specific task in this project is matching persons across different camera views in different environments.

Method

Graph Creation

The input structure for this method is a directed graph $G = (V, E)$, where $V = \{v_i \mid i \in [1, N]\}$ is the set of nodes denoting all pedestrian bounding boxes. Each node is represented by an embedding $h_i$ initialized with the appropriate normalized feature $f_i$, forming the node embedding set $H = \{h_i \mid i \in [1, N]\}$. For each node of camera $c_i$, we find the single closest neighbor from every other camera view $c_j, j \neq i$ (green arrows in the cell $(l_1, a)$ in Table 1 between C1 and C2, and likewise for every other camera pair, shown as black arrows), unlike Hi-LANDER, which applies pure kNN over the whole corpus of nodes. This per-camera neighbor selection reflects the setup in which a pedestrian is expected to appear at most once in each view.
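The per-camera neighbor selection can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's code: the function name `build_cross_camera_edges`, the toy features, the use of the inner product as similarity, and the edge direction (from each node to its selected neighbor) are all assumptions.

```python
from collections import defaultdict


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def build_cross_camera_edges(features, cameras):
    """Return directed edges (i, j): j is the most similar node to i in each other camera.

    features: list of L2-normalized feature vectors, one per bounding box.
    cameras:  list of camera ids, aligned with `features`.
    Edge direction i -> j is an assumption made for this sketch.
    """
    by_camera = defaultdict(list)
    for idx, cam in enumerate(cameras):
        by_camera[cam].append(idx)

    edges = []
    for i, cam_i in enumerate(cameras):
        for cam_j, candidates in by_camera.items():
            if cam_j == cam_i:
                continue  # never link two detections from the same view
            # With normalized features, the largest inner product is the closest neighbor.
            j = max(candidates, key=lambda c: dot(features[i], features[c]))
            edges.append((i, j))
    return edges


# Toy data (made up): two people, each seen by cameras "C1" and "C2".
features = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.6], [0.6, 0.8]]
cameras = ["C1", "C1", "C2", "C2"]
edges = build_cross_camera_edges(features, cameras)
print(edges)  # [(0, 2), (1, 3), (2, 0), (3, 1)]
```

Each node ends up with exactly one outgoing edge per other camera, so with $C$ cameras the graph has at most $N(C - 1)$ edges, whereas plain kNN could select several neighbors from the same view.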

Graph Encoding

Using $h_i$ as the input embedding of the node $v_i$, GCN encodes it as a new node embedding $h_i'$ in the following way:

$$h_i' = \phi\left(h_i, \sum_{v_j \in N_{v_i}} w_{ji}\psi(h_j)\right)$$

where $\phi$ and $\psi$ are MLPs, $w_{ji}$ is a trainable vector. $N_{v_i} = \{v_j \mid (v_j, v_i) \in E\}$ is the neighborhood of node $v_i$, defined with the set of incoming edges.

The GCN encoder can be applied multiple times on the same graph, so the effect of the number of message-passing steps is also explored in this work.
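To make the update rule concrete, here is a small sketch of one message-passing step, applied twice to the same graph. It is only an illustration under stated assumptions: `psi` and `phi` are fixed stand-in functions rather than trained MLPs, $w_{ji}$ is applied elementwise, and all names and numbers are hypothetical.

```python
def encode_nodes(h, edges, w, psi, phi):
    """One message-passing step: h_i' = phi(h_i, sum_j w_ji * psi(h_j)).

    h:     dict node -> embedding (list of floats)
    edges: list of directed edges (j, i); j is an incoming neighbor of i
    w:     dict (j, i) -> weight vector, applied elementwise (an assumption)
    """
    # Group incoming neighbors per node: N_{v_i} = {v_j | (v_j, v_i) in E}.
    incoming = {i: [] for i in h}
    for j, i in edges:
        incoming[i].append(j)

    h_new = {}
    for i, h_i in h.items():
        agg = [0.0] * len(psi(h_i))  # nodes without incoming edges aggregate zeros
        for j in incoming[i]:
            msg = psi(h[j])
            agg = [a + wk * mk for a, wk, mk in zip(agg, w[(j, i)], msg)]
        h_new[i] = phi(h_i, agg)
    return h_new


def psi(x):
    """Stand-in for the message MLP: doubles every component."""
    return [2.0 * v for v in x]


def phi(h_i, agg):
    """Stand-in for the update MLP: adds the aggregate to the node embedding."""
    return [a + b for a, b in zip(h_i, agg)]


h0 = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
edges = [(0, 2), (1, 2)]
w = {(0, 2): [0.5, 0.5], (1, 2): [1.0, 1.0]}

h1 = encode_nodes(h0, edges, w, psi, phi)  # first message-passing step
h2 = encode_nodes(h1, edges, w, psi, phi)  # second step on the same graph
print(h1[2], h2[2])  # [2.0, 3.0] [3.0, 5.0]
```

Only node 2 has incoming edges in this toy graph, so nodes 0 and 1 keep their embeddings while node 2 accumulates information from both neighbors at every step.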

Linkage Prediction and Node Density

After the Graph Encoding step, the resulting node features $H'$ are used to predict the linkage between nodes. The connectivity of the edge $(v_i, v_j)$ is predicted by applying an MLP classifier $\theta$. The input is a vector created by concatenating the node features ($h_i', h_j'$) and the nodes' ground plane positions $(\hat{x_i}, \hat{y_i})$, $(\hat{x_j}, \hat{y_j})$. The original work considers the concatenation of node features only. The output is a sigmoid activation which estimates the probability that two connected nodes have the same label:

$$\hat{r}_{ij} = P(r_i = r_j) = \sigma(\theta([h_i', \hat{x_i}, \hat{y_i}, h_j', \hat{x_j}, \hat{y_j}]^T))$$
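The construction of the classifier input can be illustrated as follows. This is a sketch only: `make_linear_theta` is a hypothetical single linear unit standing in for the MLP $\theta$, and the weights, embeddings and positions are made-up numbers.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def link_probability(h_i, pos_i, h_j, pos_j, theta):
    """Estimate P(r_i = r_j) for the edge (v_i, v_j).

    The input follows the order [h_i', x_i, y_i, h_j', x_j, y_j].
    """
    x = list(h_i) + list(pos_i) + list(h_j) + list(pos_j)
    return sigmoid(theta(x))


def make_linear_theta(weights, bias):
    """Hypothetical stand-in for the MLP classifier: one linear unit."""
    return lambda x: sum(w * v for w, v in zip(weights, x)) + bias


# Made-up numbers: 2-D embeddings plus 2-D ground plane positions.
theta = make_linear_theta([0.5, 0.5, -0.1, -0.1, 0.5, 0.5, -0.1, -0.1], bias=0.0)
p_same = link_probability([0.6, 0.8], (1.0, 2.0), [0.8, 0.6], (1.5, 2.0), theta)

# An all-zero classifier is uninformative and yields exactly 0.5.
p_blank = link_probability([0.6, 0.8], (1.0, 2.0), [0.8, 0.6], (1.5, 2.0),
                           make_linear_theta([0.0] * 8, bias=0.0))
```

Dropping `pos_i` and `pos_j` from the concatenation gives the input used in the original work, which relies on node features alone.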

A node density $d_i$ is a value that describes the weighted proportion of neighbors which have the same label as the node $v_i$. Its estimate is defined as:

$$\hat{d}_i = \frac{1}{k}\sum_{j=1}^{k}\hat{e}_{ij}a_{ij}$$

where $k$ is the number of neighbors of $v_i$, $a_{ij} = \langle h_i, h_j \rangle$ is the similarity of the nodes' embeddings, and $\hat{e}_{ij}$ is the edge coefficient defined as:

$$\hat{e}_{ij} = P(r_i = r_j) - P(r_i \neq r_j).$$
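Since the two probabilities sum to one, the edge coefficient can be computed directly from the linkage probability as $\hat{e}_{ij} = 2\hat{r}_{ij} - 1$, which lies in $[-1, 1]$. The short sketch below evaluates the density estimate for one node; the function names and the toy neighbors are assumptions made for illustration.

```python
def dot(u, v):
    """Inner product, used as the embedding similarity a_ij."""
    return sum(a * b for a, b in zip(u, v))


def edge_coefficient(r_ij):
    """e_ij = P(r_i = r_j) - P(r_i != r_j), i.e. 2 * r_ij - 1."""
    return r_ij - (1.0 - r_ij)


def node_density(h_i, neighbors):
    """Average of e_ij * a_ij over the k neighbors of node i.

    neighbors: list of (h_j, r_ij) pairs, one per neighbor.
    """
    k = len(neighbors)
    return sum(edge_coefficient(r) * dot(h_i, h_j) for h_j, r in neighbors) / k


# Toy node with two neighbors: one similar and likely the same person,
# one orthogonal and unlikely to be the same person.
density = node_density([1.0, 0.0], [([1.0, 0.0], 0.75), ([0.0, 1.0], 0.25)])
print(density)  # 0.25
```

A neighbor that is predicted to carry a different label contributes negatively whenever its similarity is positive, which is what pushes nodes near cluster borders toward low density.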

Graph Decoding

After an estimation of the graph attributes (node density and edge coefficient) using the GNN encoder, it is possible to find connected components of the graph in the next two steps:

Edge filtering: We initialize a new edge set $E' = \emptyset$. The subset of outgoing edges for each node $v_i$ is created as

$$\varepsilon(i) = \{j \mid (v_i, v_j) \in E \wedge \hat{d}_i \leq \hat{d}_j \wedge \hat{r}_{ij} \geq p_{\tau}\}$$

where $\hat{r}_{ij} = P(r_i = r_j)$ and $p_{\tau}$ is the edge connection threshold. Each node with a non-empty $\varepsilon(i)$ contributes one edge to the set $E'$, selected as

$$j = \arg\max_{k \in \varepsilon(i)} \hat{e}_{ik}$$

The edge $(v_i, v_j)$ is added to $E'$. With the condition $\hat{d}_i \leq \hat{d}_j$, the authors of Hi-LANDER introduced an inductive bias that discourages connections to nodes on the border of clusters.
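The filtering rule can be sketched as below. The function name `filter_edges`, the dictionary-based inputs and the toy densities and probabilities are assumptions for illustration; only the two conditions and the argmax come from the text.

```python
def filter_edges(edges, density, r, p_tau):
    """Build E': at most one outgoing edge per node.

    edges:   list of directed edges (i, j)
    density: dict node -> estimated density d_i
    r:       dict (i, j) -> estimated P(r_i = r_j)
    p_tau:   edge connection threshold
    """
    # epsilon(i): neighbors that are at least as dense and confidently linked.
    candidates = {}
    for i, j in edges:
        if density[i] <= density[j] and r[(i, j)] >= p_tau:
            candidates.setdefault(i, []).append(j)

    refined = []
    for i, js in candidates.items():
        # argmax over the edge coefficient e_ik = 2 * r_ik - 1
        best = max(js, key=lambda k: 2.0 * r[(i, k)] - 1.0)
        refined.append((i, best))
    return refined


density = {0: 0.2, 1: 0.5, 2: 0.9, 3: 0.1}
r = {(0, 1): 0.7, (0, 2): 0.9, (1, 2): 0.8, (2, 1): 0.8, (3, 0): 0.3}
refined = filter_edges(list(r), density, r, p_tau=0.5)
print(refined)  # [(0, 2), (1, 2)]
```

In this toy example the edge $(2, 1)$ is dropped because node 2 is denser than node 1, and $(3, 0)$ is dropped because its linkage probability is below the threshold, so nodes 2 and 3 keep no outgoing edge.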

Peak nodes: The set of edges $E'$ defines a new, refined graph $G'$ (cell $(l_1, b)$ in Table 1) on the same set of nodes. The peak nodes are those without outgoing edges; they have the maximum density in their neighborhood. The way $G'$ is created implies a separation of the graph into a set of connected components $Q = \{q_i \mid i \in [1, Z]\}$. Consequently, each connected component has one peak node, distinguished by the highest density in the connected component (cell $(l_1, c)$ in Table 1).
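Because every node keeps at most one outgoing edge in $E'$, the peak of a component can be found by following edges until a node without an outgoing edge is reached. The sketch below assumes that $E'$ contains no cycles (for example, no two linked nodes with exactly equal density); the function name and the toy edges are hypothetical.

```python
def find_peaks(nodes, refined_edges):
    """Map every node to the peak node of its connected component.

    refined_edges: list of (i, j) with at most one outgoing edge per node i.
    Assumes the refined graph is acyclic; the `seen` set only guards against
    an infinite loop and does not resolve cycles in a meaningful way.
    """
    parent = dict(refined_edges)
    peak_of = {}
    for v in nodes:
        cur, seen = v, {v}
        while cur in parent and parent[cur] not in seen:
            cur = parent[cur]
            seen.add(cur)
        peak_of[v] = cur  # cur has no outgoing edge: it is a peak
    return peak_of


peak_of = find_peaks([0, 1, 2, 3], [(0, 2), (1, 2)])
peaks = sorted(set(peak_of.values()))
print(peak_of)  # {0: 2, 1: 2, 2: 2, 3: 3}
print(peaks)    # [2, 3]
```

Nodes 0, 1 and 2 form one component with peak 2, while the isolated node 3 is its own peak and forms a component of size one.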

Hierarchical Design

The whole pipeline explained in the previous sections can be repeated on the final set of peak nodes as a new input (row $l_2$ in Table 1). The multi-level approach requires aggregating the features of each connected component from level $l$, which is replaced with a single node on level $l + 1$. The node embedding of the next level is defined as a concatenation of the peak node features and the mean node features:

$$h^{(l + 1)}_i = [\tilde{h}^{(l)}_{q_{i}}, \bar{h}^{(l)}_{q_{i}}].$$
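A minimal sketch of this aggregation step follows. It assumes that each component is identified by its peak node and that the mean is taken over all members of the component, including the peak; the function name and the toy embeddings are made up.

```python
def aggregate_components(h, peak_of):
    """Build next-level embeddings: [peak features, mean features] per component.

    h:       dict node -> embedding at level l
    peak_of: dict node -> peak node of its connected component
    """
    members = {}
    for v, p in peak_of.items():
        members.setdefault(p, []).append(v)

    next_h = {}
    for p, vs in members.items():
        dim = len(h[p])
        mean = [sum(h[v][d] for v in vs) / len(vs) for d in range(dim)]
        next_h[p] = list(h[p]) + mean  # concatenation of peak and mean features
    return next_h


h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [2.0, 2.0], 3: [4.0, 0.0]}
peak_of = {0: 2, 1: 2, 2: 2, 3: 3}
next_h = aggregate_components(h, peak_of)
print(next_h)  # {2: [2.0, 2.0, 1.0, 1.0], 3: [4.0, 0.0, 4.0, 0.0]}
```

Note that the concatenation doubles the embedding length from one level to the next, so any encoder applied at level $l + 1$ has to accept the wider input.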

Label back-propagation

Once the algorithm finishes and the final set of peak nodes is obtained, their labels can be propagated back to all nodes in the connected components they belong to. For the example in Table 1, the final partition is shown as different node colors in the following figure:
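Propagating labels back amounts to composing the node-to-peak mappings of all levels. The sketch below assumes that a node on level $l + 1$ is identified by the id of the peak it replaced on level $l$, and that the id of the final peak serves as the cluster label; both conventions and the toy mappings are assumptions.

```python
def propagate_labels(levels):
    """Assign every first-level node the id of its final peak node.

    levels: list of dicts (node -> peak), ordered from the first level to the last.
    """
    labels = {}
    for v in levels[0]:
        cur = v
        for peak_of in levels:
            cur = peak_of[cur]  # move up one level of the hierarchy
        labels[v] = cur
    return labels


# Level 1 yields peaks 2, 3 and 4; on level 2, peak 3 is merged into peak 2.
level_1 = {0: 2, 1: 2, 2: 2, 3: 3, 4: 4}
level_2 = {2: 2, 3: 2, 4: 4}
labels = propagate_labels([level_1, level_2])
print(labels)  # {0: 2, 1: 2, 2: 2, 3: 2, 4: 4}
```

The result has two clusters: nodes 0 to 3 share the label of peak 2, and node 4 remains alone.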

Requirements

In the docker directory, run:

    docker build --rm --no-cache -t sgc-cca:v_1 -f Dockerfile .

Citation

If you use this software in your work, please cite it using:
@misc{nedeljković2024crosscameradataassociationgnn,
  title={Cross-Camera Data Association via GNN for Supervised Graph Clustering},
  author={Đorđe Nedeljković},
  year={2024},
  eprint={2410.00643},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2410.00643},
}

Owner

  • Name: Djordje Nedeljkovic
  • Login: djordjened92
  • Kind: user
  • Location: Belgrade

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Nedeljković"
  given-names: "Đorđe"
  orcid: "https://orcid.org/0009-0008-3791-6983"
title: "Cross-Camera Data Association via GNN for Supervised Graph Clustering"
url: "https://github.com/djordjened92/cca-gnnclust"


Dependencies

docker/Dockerfile docker
  • nvcr.io/nvidia/cuda 11.7.0-runtime-ubuntu20.04 build