scene_labeling_phd
This codebase contains the implementation of the Scene To Text (S2T) system, which incorporates spatial relationship information to generate scene labels.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (15.0%) to scientific vocabulary
Repository
This codebase contains the implementation of the Scene To Text (S2T) system, which incorporates spatial relationship information to generate scene labels.
Basic Info
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
scene_labeling
This codebase contains the implementation of the Scene To Text system that incorporates spatial relationship information, developed as PhD dissertation research in CITE PHD.
This code is free to use, but the authors ask that if you make use of any of the code in your research, you cite the work using PAPER CITATION.
Installation
Anaconda Environment
The codebase makes use of an Anaconda environment. This environment can be installed by running the following command from the conda_env directory in an Anaconda prompt:

```
conda env create -f environment.yml
```
MATLAB Engine for Python
The Histogram of Forces (HOF) code is used to compute the spatial relationships between object two-tuples (pairs of objects) in an image.
The HOF code is implemented in MATLAB, so the MATLAB Engine for Python is required to run it.
Installation instructions for the MATLAB Engine for Python can be found in the MathWorks documentation.
You will need to follow the instructions for installing the engine API at the system command prompt, and this will need to be done inside the Anaconda environment.
For example, from an Anaconda prompt, run the following commands:

```
conda activate scene_labeling
cd <matlabroot>\extern\engines\python
python setup.py install
```
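After installation, you can verify from inside the activated environment that the engine package is importable before running the pipeline. A minimal sketch (the helper name is ours, not part of this repository):

```python
import importlib.util


def matlab_engine_available() -> bool:
    """Check whether the MATLAB Engine API for Python is importable
    in the current environment (e.g. the scene_labeling conda env)."""
    return importlib.util.find_spec("matlab") is not None
```

If this returns False inside the scene_labeling environment, re-run the `python setup.py install` step from the MATLAB engine directory.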
YOLOv3 Object Detection Model
The codebase uses the YOLOv3 object detection model. Due to size constraints on the repository, this model could not be uploaded; it can be downloaded from the YOLOv3 site.
The required files are:
- yolov3.cfg
- yolov3.weights

The codebase originally stored these files in the input/models/ directory, as can be seen in the object_detection.py script.
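Since the model files are downloaded manually, a quick pre-flight check that they landed in the expected directory can save a failed run. A minimal sketch, assuming the input/models/ layout described above (the helper name is ours, not part of object_detection.py):

```python
from pathlib import Path

# File names listed in this README; the directory follows the layout
# described for object_detection.py.
REQUIRED_FILES = ("yolov3.cfg", "yolov3.weights")


def missing_model_files(model_dir="input/models"):
    """Return the YOLOv3 model files that are not yet present in model_dir."""
    model_dir = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]
```

An empty return value means both files are in place and the object detection stage can be run.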
Usage
The first script to execute is the import_coco.py script. This script will download the specified number of images
from the COCO 17 data set. These images will be saved in the coco_images/ directory. This repository contains example
output for each of the three stages of the pipeline as run on 25 sample images. These files are:
- input/object_detection.csv
- input/person_object_detection.csv
- input/metadata.csv
- output/level_one_summaries.csv
- output/general_level_two_summaries.csv
- output/person_level_two_summaries.csv

After the images are downloaded, the app.py script can be run and will process each of the input images using the
three stages of the pipeline.
- Object detection
  - Performs object localization using the YOLOv3 [1] model
  - Performs metadata generation using the Inception model; the Inception model will be downloaded at runtime to the appropriate directory if it does not exist
  - The object localization and metadata results are stored in CSV files
- Level One Summaries
  - Generates information corresponding to proximity, overlap, and spatial relationships for each object two-tuple in an image
  - Proximity and overlap information is computed using the Generalized Intersection Over Union (GIOU) [5] algorithm
  - Spatial relationship information is computed using the Histogram of Forces (HOF) [2] [3] [4] algorithm
  - The level one summaries are stored in a CSV file after computation
  - This is a computationally expensive process, so a boolean flag in app.py, FINALIZED_LEVEL_ONE, is set to false initially for level one summary computation. Level one summaries are written to disk every 10 computations to avoid recomputing results. FINALIZED_LEVEL_ONE should be set to true after the level one summaries have been computed.
- Level Two Summaries
  - In the general case, level two summaries indicate whether two objects are or are not interacting; the general level two summaries are stored in a CSV file after computation
  - The person domain level two summaries indicate interactions between a person and an object in the scene; these are also stored to disk after computation
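The proximity and overlap cues above come from GIoU [5]. A minimal, dependency-free sketch of the standard GIoU formulation for axis-aligned boxes given as (x1, y1, x2, y2) — this is the textbook definition, not the repository's actual implementation, and it assumes boxes with positive area:

```python
def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes (x1, y1, x2, y2).

    Returns a value in (-1, 1]: 1 for identical boxes, and values
    below 0 as disjoint boxes move farther apart.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero when the boxes are disjoint).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    # Union area.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    iou = inter / union

    # Smallest enclosing box of the pair.
    cx1, cy1 = min(ax1, bx1), min(ay1, by1)
    cx2, cy2 = max(ax2, bx2), max(ay2, by2)
    c_area = (cx2 - cx1) * (cy2 - cy1)

    # GIoU penalizes empty space inside the enclosing box.
    return iou - (c_area - union) / c_area
```

Overlapping object pairs score near 1, while distant pairs score negative, which is what makes GIoU usable as a proximity signal even when plain IoU would be exactly zero.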
Visualization
The scripts located in the visualizations/ directory can be used to visualize each of the system outputs generated.
Example System Output
Below are some example outputs generated by the S2T system. Each table contains the localization results on the left, with the General domain and Person domain level two summaries on the right.
| Object Localization | Level Two Summaries |
| ------------------- | ------------------- |
| (localization image) | General: Person1 interacting with bench1<br>Person: Person1 sitting on bench1 |
| (localization image) | General: Person1 interacting with umbrella1<br>Person: Person1 carrying umbrella1 |
| (localization image) | General: Person1 interacting with cellphone1<br>Person: person1 talking on cellphone1 |
Attribution
Citation information here
References
[1] Redmon, J. and Farhadi, A., "YOLOv3: An Incremental Improvement," arXiv, 2018.
[2] Matsakis, P. and Wendling, L., “New Way to Represent the Relative Position between Areal Objects,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 21, No. 7, 1999, pp. 634-643.
[3] Matsakis, P., Keller, J., Wendling, L., Marjamaa, J. and Sjahputera, O., "Linguistic Description of Relative Positions of Objects in Images", IEEE Transactions on Systems, Man, and Cybernetics, Vol. 31, No. 4, 2001, pp. 573-588.
[4] Matsakis, P., Keller, J., Sjahputera, O., and Marjamaa, J. “The Use of Force Histograms for Affine-Invariant Relative Position Description”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 26, No. 1, 2004, pp.1-18.
[5] Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., and Savarese, S., "Generalized Intersection Over Union: A Metric and A Loss for Bounding Box Regression", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
Owner
- Name: Jeremy Davis
- Login: jedavis82
- Kind: user
- Repositories: 10
- Profile: https://github.com/jedavis82
PhD in computer science with a focus on computer vision and scene understanding.
Citation (CITATION.cff)
```
cff-version: 1.2.0
message: "If you use this software, please cite it as below. This software is free to use with the disclaimer that the authors are not responsible for any misuse."
authors:
  - family-names: "Davis"
    given-names: "Jeremy"
title: "Scene Labeling: Annotations for Image Datasets"
version: 1.0.0
doi: 10.5281/zenodo.1234
date-released: 2021-12-20
url: "https://github.com/jedavis82/scene_labeling"
```