scene_labeling_phd
This codebase contains the implementation of the Scene To Text (S2T) system, which incorporates spatial relationship information to generate scene labels.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (15.0%) to scientific vocabulary
Repository
This codebase contains the implementation of the Scene To Text (S2T) system, which incorporates spatial relationship information to generate scene labels.
Basic Info
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
scene_labeling
This codebase contains the implementation of the Scene To Text system that incorporates spatial relationship information, developed as PhD dissertation research in CITE PHD.
This code is free to use, but the authors ask that if you make use of any of the code in your research, you cite the work using PAPER CITATION.
Installation
Anaconda Environment
The codebase makes use of an Anaconda environment. This environment can be installed by running the following command from the conda_env directory in an Anaconda prompt:

```
conda env create -f environment.yml
```
MATLAB Engine for Python
The Histogram of Forces (HOF) code is used to compute the spatial relationships between object two-tuples (pairs of objects) in an image.
The HOF code is implemented in MATLAB, so the MATLAB Engine for Python is required to run it.
Installation instructions for the MATLAB Engine for Python can be found in the MathWorks documentation.
You will need to follow the instructions for installing the engine API at the system command prompt, and this will need to be done inside the Anaconda environment.
For example, from an Anaconda prompt, run the following commands:

```
conda activate scene_labeling
cd <matlabroot>\extern\engines\python
python setup.py install
```
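After installation, you can verify from inside the activated environment that the engine package is importable before running the pipeline. A minimal sketch (the helper name is ours, not part of this repository):

```python
import importlib.util


def matlab_engine_available() -> bool:
    """Check whether the MATLAB Engine API for Python is importable
    in the current environment (e.g. the scene_labeling conda env)."""
    return importlib.util.find_spec("matlab") is not None
```

If this returns False inside the scene_labeling environment, re-run the `python setup.py install` step from the MATLAB engine directory.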
YOLOv3 Object Detection Model
The codebase uses the YOLOv3 object detection model. Due to size constraints on the repository, this model could not be uploaded; it can be downloaded from the YOLOv3 site.
The required files are:
- yolov3.cfg
- yolov3.weights

The codebase originally stored these files in the input/models/ directory, as can be seen in the object_detection.py script.
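Since the model files are downloaded manually, a quick pre-flight check that they landed in the expected directory can save a failed run. A minimal sketch, assuming the input/models/ layout described above (the helper name is ours, not part of object_detection.py):

```python
from pathlib import Path

# File names listed in this README; the directory follows the layout
# described for object_detection.py.
REQUIRED_FILES = ("yolov3.cfg", "yolov3.weights")


def missing_model_files(model_dir="input/models"):
    """Return the YOLOv3 model files that are not yet present in model_dir."""
    model_dir = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]
```

An empty return value means both files are in place and the object detection stage can be run.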
Usage
The first script to execute is the import_coco.py script. This script will download the specified number of images
from the COCO 17 data set. These images will be saved in the coco_images/ directory. This repository contains example
output for each of the three stages of the pipeline as run on 25 sample images. These files are:
- input/object_detection.csv
- input/person_object_detection.csv
- input/metadata.csv
- output/level_one_summaries.csv
- output/general_level_two_summaries.csv
- output/person_level_two_summaries.csv

After the images are downloaded, the app.py script can be run and will process each of the input images using the
three stages of the pipeline.
- Object detection
  - Performs object localization using the YOLOv3 [1] model
  - Performs metadata generation using the Inception model; the Inception model will be downloaded at runtime to the appropriate directory if it does not exist
  - The object localization and metadata results are stored in CSV files
- Level One Summaries
  - Generates information corresponding to proximity, overlap, and spatial relationships for each object two-tuple in an image
  - Proximity and overlap information is computed using the Generalized Intersection Over Union (GIOU) [5] algorithm
  - Spatial relationship information is computed using the Histogram of Forces (HOF) [2] [3] [4] algorithm
  - The level one summaries are stored in a CSV file after computation
  - This is a computationally expensive process, so a boolean flag in app.py, FINALIZED_LEVEL_ONE, is set to false initially for level one summary computation. Level one summaries are written to disk every 10 computations to avoid recomputing results. FINALIZED_LEVEL_ONE should be set to true after the level one summaries have been computed.
- Level Two Summaries
  - In the general case, level two summaries indicate whether two objects are or are not interacting; the general level two summaries are stored in a CSV file after computation
  - The person domain level two summaries indicate interactions between a person and an object in the scene; these are also stored to disk after computation
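The proximity and overlap cues above come from GIoU [5]. A minimal, dependency-free sketch of the standard GIoU formulation for axis-aligned boxes given as (x1, y1, x2, y2) — this is the textbook definition, not the repository's actual implementation, and it assumes boxes with positive area:

```python
def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes (x1, y1, x2, y2).

    Returns a value in (-1, 1]: 1 for identical boxes, and values
    below 0 as disjoint boxes move farther apart.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero when the boxes are disjoint).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    # Union area.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    iou = inter / union

    # Smallest enclosing box of the pair.
    cx1, cy1 = min(ax1, bx1), min(ay1, by1)
    cx2, cy2 = max(ax2, bx2), max(ay2, by2)
    c_area = (cx2 - cx1) * (cy2 - cy1)

    # GIoU penalizes empty space inside the enclosing box.
    return iou - (c_area - union) / c_area
```

Overlapping object pairs score near 1, while distant pairs score negative, which is what makes GIoU usable as a proximity signal even when plain IoU would be exactly zero.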
Visualization
The scripts located in the visualizations/ directory can be used to visualize each of the system outputs generated.
Example System Output
Below are some example outputs generated by the S2T system. Each table contains the localization results on the left, with the General domain and Person domain level two summaries on the right.
| Object Localization | Level Two Summaries |
| ------------------- | ------------------- |
| (localization image) | General: Person1 interacting with bench1<br>Person: Person1 sitting on bench1 |
| (localization image) | General: Person1 interacting with umbrella1<br>Person: Person1 carrying umbrella1 |
| (localization image) | General: Person1 interacting with cellphone1<br>Person: person1 talking on cellphone1 |
Attribution
Citation information here
References
[1] Redmon, J. and Farhadi, A., "YOLOv3: An Incremental Improvement," arXiv, 2018.
[2] Matsakis, P. and Wendling, L., “New Way to Represent the Relative Position between Areal Objects,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 21, No. 7, 1999, pp. 634-643.
[3] Matsakis, P., Keller, J., Wendling, L., Marjamaa, J. and Sjahputera, O., "Linguistic Description of Relative Positions of Objects in Images", IEEE Transactions on Systems, Man, and Cybernetics, Vol. 31, No. 4, 2001, pp. 573-588.
[4] Matsakis, P., Keller, J., Sjahputera, O., and Marjamaa, J. “The Use of Force Histograms for Affine-Invariant Relative Position Description”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 26, No. 1, 2004, pp.1-18.
[5] Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid, I., and Savarese, S., "Generalized Intersection Over Union: A Metric and A Loss for Bounding Box Regression", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
Owner
- Name: Jeremy Davis
- Login: jedavis82
- Kind: user
- Repositories: 10
- Profile: https://github.com/jedavis82
PhD in computer science with a focus on computer vision and scene understanding.
Citation (CITATION.cff)
```
cff-version: 1.2.0
message: "If you use this software, please cite it as below. This software is free to use with the disclaimer that the authors are not responsible for any misuse."
authors:
  - family-names: "Davis"
    given-names: "Jeremy"
title: "Scene Labeling: Annotations for Image Datasets"
version: 1.0.0
doi: 10.5281/zenodo.1234
date-released: 2021-12-20
url: "https://github.com/jedavis82/scene_labeling"
```