ammico

AI-based Media and Misinformation Content Analysis Tool: Analyze text and images

Science Score: 75.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: found CITATION.cff file
✓ codemeta.json file: found codemeta.json file
✓ .zenodo.json file: found .zenodo.json file
✓ DOI references: found 3 DOI reference(s) in README
○ Academic publication links
✓ Committers with academic emails: 2 of 7 committers (28.6%) from academic institutions
✓ Institutional organization owner: organization ssciwr has institutional domain (ssc.uni-heidelberg.de)
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (15.3%) to scientific vocabulary

Keywords

classification computer-vision nlp text-extraction translation

Keywords from Contributors

mesh energy-system-model hydrology medical-imaging regionalization energy-system exoplanet standardization interactive optim

Last synced: 6 months ago

Repository

AI-based Media and Misinformation Content Analysis Tool: Analyze text and images

Basic Info

Host: GitHub
Owner: ssciwr
License: mit
Language: Python
Default Branch: main
Homepage: https://ssciwr.github.io/AMMICO/
Size: 93.9 MB

Statistics

Stars: 10
Watchers: 1
Forks: 3
Open Issues: 29
Releases: 7

Topics

classification computer-vision nlp text-extraction translation

Created over 3 years ago · Last pushed 8 months ago

Metadata Files

Readme Contributing License Citation

AMMICO - AI-based Media and Misinformation Content Analysis Tool


This package extracts data from images such as social media posts that contain an image part and a text part. The analysis can generate a very large number of features, depending on the user input. See our paper for a more in-depth description.

This project is currently under development!

AMMICO takes pre-processed image files, such as social media posts with comments, and processes them to collect the following information:

1. Text extraction from the images
   1. Language detection
   2. Translation into English or other languages
   3. Cleaning of the text, spell-check
   4. Sentiment analysis
   5. Named entity recognition
   6. Topic analysis
2. Content extraction from the images
   1. Textual summary of the image content ("image caption") that can be analyzed further using the above tools
   2. Feature extraction from the images: the user inputs a query (text or image) and the images are matched to that query
   3. Question answering
3. Person and face recognition in images
   1. Face mask detection
   2. Probabilistic detection of age, gender and race
   3. Emotion recognition
4. Color analysis
   1. Analysis of the hue and percentage of each color in the image
5. Multimodal analysis
   1. Finding the best matches for image content or image similarity
6. Cropping images to remove comments from posts

Installation

The AMMICO package can be installed using pip:

pip install ammico

This installs the package and its dependencies locally. If you get errors when running some of the modules after installation, please follow the instructions in the FAQ.

Usage

The main demonstration notebook can be found in the notebooks folder and can also be run on Google Colab.

There are further sample notebooks in the notebooks folder for the more experimental features:

1. Topic analysis: use the notebook get-text-from-image.ipynb to analyse the topics of the extracted text. You can run this notebook on Google Colab; place the data files and the Google Cloud Vision API key in your Google Drive to access the data.
2. Cropping: to crop social media posts, use the cropposts.ipynb notebook. You can also run this notebook on Google Colab.

Features

Text extraction

The text is extracted from the images using google-cloud-vision. For this, you need an API key. Set up your Google account following the instructions on the Google Vision AI website or as described here. You then need to export the location of the API key as an environment variable:

export GOOGLE_APPLICATION_CREDENTIALS="location of your .json"

The extracted text is then stored under the text key (column when exporting a csv).
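Because a missing or mistyped key path is an easy mistake to make, it can help to check the environment variable before starting text extraction. The following is a minimal sketch of such a pre-flight check; the helper name check_google_credentials is hypothetical and is not part of AMMICO or google-cloud-vision. It only verifies that GOOGLE_APPLICATION_CREDENTIALS is set, points to an existing file, and that the file parses as JSON.

```python
import json
import os
from pathlib import Path


def check_google_credentials(env_var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> Path:
    """Return the key-file path stored in env_var, or raise if it is unusable.

    Hypothetical pre-flight helper; it is not part of AMMICO or google-cloud-vision.
    """
    location = os.environ.get(env_var)
    if not location:
        raise RuntimeError(f"{env_var} is not set; export the path to your .json key first")
    path = Path(location).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"{env_var} points to {path}, which is not a file")
    with path.open(encoding="utf-8") as fh:
        json.load(fh)  # raises json.JSONDecodeError if the key file is not valid JSON
    return path
```

Calling check_google_credentials() at the top of a notebook surfaces configuration problems before any image is sent to the Vision API.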

Googletrans is used to recognize the language automatically and to translate the text into English. The text language and the translated text are then stored under the text_language and text_english keys (columns when exporting a csv).
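The relationship between the per-image keys and the exported csv columns can be illustrated with a small sketch. It assumes, for illustration only, that results are held as a dictionary mapping an image identifier to that image's keys; the function results_to_csv, the image_id column and the example values are hypothetical and do not reproduce AMMICO's own export code.

```python
from __future__ import annotations

import csv
import io


def results_to_csv(results: dict[str, dict[str, str]], columns: list[str]) -> str:
    """Flatten {image_id: {key: value}} into csv text with one row per image.

    Keys not listed in `columns` are ignored; missing keys become empty cells.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["image_id", *columns], extrasaction="ignore")
    writer.writeheader()
    for image_id, fields in results.items():
        writer.writerow({"image_id": image_id, **fields})
    return buffer.getvalue()


# Hypothetical example values; in AMMICO they would come from Google Cloud Vision and googletrans.
results = {
    "post_001": {"text": "Hallo Welt", "text_language": "de", "text_english": "Hello World"},
}
csv_text = results_to_csv(results, ["text", "text_language", "text_english"])
```

Each key becomes one column, so text, text_language and text_english line up side by side for every image.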

If you want to analyse the text further, you have to set the analyse_text keyword to True. The text is then processed using spacy (tokenization, part-of-speech tagging, lemmatization, ...). The English text is cleaned of numbers and unrecognized words (text_clean), the spelling of the English text is corrected (text_english_correct), and sentiment and subjectivity analyses are carried out (polarity, subjectivity). The latter two steps are carried out using TextBlob. For more information on sentiment analysis with TextBlob, see here.
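The cleaning step that produces text_clean can be pictured as a filter over tokens. The sketch below is a simplified stand-in, not AMMICO's implementation: AMMICO relies on spacy, whereas this example uses a plain regular-expression tokenizer and a small hand-supplied vocabulary (both assumptions) to decide which words count as "recognized".

```python
from __future__ import annotations

import re


def clean_text(text: str, vocabulary: set[str]) -> str:
    """Drop tokens that contain digits or are not in `vocabulary` (case-insensitive).

    Simplified stand-in for spacy-based cleaning; `vocabulary` is supplied by the caller.
    """
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    kept = [
        tok
        for tok in tokens
        if not any(ch.isdigit() for ch in tok) and tok.lower() in vocabulary
    ]
    return " ".join(kept)


# Toy vocabulary; a real pipeline would use a full language model's vocabulary.
vocabulary = {"vote", "change", "in", "now"}
cleaned = clean_text("Vote 4 change in 2024 xqzt now", vocabulary)
```

Numbers ("4", "2024") and the unrecognized token "xqzt" are removed, leaving only known words in their original order.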

The Hugging Face transformers library is used to perform another sentiment analysis, a text summary, and named entity recognition, using the transformers pipeline.

Content extraction

The image content ("caption") is extracted using the LAVIS library. This library enables vision intelligence extraction using several state-of-the-art models such as BLIP and BLIP2, depending on the task and user selection. Further, it allows feature extraction from the images, where users can input textual and image queries, and the images in the database are matched to that query (multimodal search). Another option is question answering, where the user inputs a text question and the library finds the images that match the query.
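Multimodal search works by comparing a feature vector computed for the query with feature vectors computed for every image. A common way to rank the matches is cosine similarity, which the sketch below assumes; the three-dimensional vectors are made-up stand-ins for the much larger embeddings a model such as BLIP would produce, and rank_images is a hypothetical helper, not a LAVIS or AMMICO function.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_images(query: list[float], image_features: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (image_id, similarity) pairs sorted from best to worst match."""
    scored = [(image_id, cosine_similarity(query, feat)) for image_id, feat in image_features.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings for three images and one text query.
image_features = {
    "cat": [0.9, 0.1, 0.0],
    "car": [0.0, 0.2, 0.95],
    "dog": [0.7, 0.3, 0.1],
}
ranked = rank_images([1.0, 0.0, 0.0], image_features)
```

The same ranking applies whether the query vector comes from a text prompt or from an example image, which is what makes the search multimodal.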

Emotion recognition

Emotion recognition is carried out using the deepface and retinaface libraries. These libraries detect the presence of faces and provide a probabilistic assessment of their age, gender, race and emotion based on several state-of-the-art models. The tool also detects whether a person is wearing a face mask; if so, no further detection is carried out, as the mask affects the accuracy of the assessment. Because the detection of gender, race and age is carried out in simplistic categories (e.g., for gender, using only "male" and "female"), and because of the ethical implications of such assessments, users can only access this part of the tool if they agree with an ethical disclosure statement (see FAQ). Once users accept the disclosure, they can also set their own detection confidence thresholds.
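The two rules described above, skipping masked faces and applying a user-chosen confidence threshold, can be sketched as follows. The face dictionary layout (a wears_mask flag plus per-emotion probabilities between 0 and 1) and the function names are hypothetical; they are not the data structures returned by deepface, retinaface or AMMICO.

```python
from __future__ import annotations

from typing import Optional


def dominant_label(scores: dict[str, float], threshold: float) -> Optional[str]:
    """Return the highest-scoring label, or None if its score is below threshold."""
    label, score = max(scores.items(), key=lambda item: item[1])
    return label if score >= threshold else None


def analyse_face(face: dict, emotion_threshold: float = 0.5) -> dict:
    """Apply the mask rule and a user-chosen confidence threshold to one detected face.

    `face` is a hypothetical dict with keys "wears_mask" (bool) and
    "emotion_scores" (label -> probability in [0, 1]).
    """
    if face["wears_mask"]:
        # A mask hides too much of the face, so no further attributes are estimated.
        return {"wears_mask": True, "emotion": None}
    emotion = dominant_label(face["emotion_scores"], emotion_threshold)
    return {"wears_mask": False, "emotion": emotion}


face = {"wears_mask": False, "emotion_scores": {"happy": 0.7, "sad": 0.2, "neutral": 0.1}}
```

Raising emotion_threshold makes the output more conservative: a face whose strongest emotion score is 0.7 is reported as "happy" at a threshold of 0.5 but left unlabeled at 0.8.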

Color/hue detection

Color detection is carried out using colorgram.py, with colour providing the distance metric. The colors can be classified into the main named colors/hues in the English language: red, green, blue, yellow, cyan, orange, purple, pink, brown, grey, white and black.
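Classifying an extracted color into one of the twelve names amounts to finding the nearest reference color under a perceptual distance. The sketch below assumes CSS named-color values as references and uses the simple CIE 1976 distance (Euclidean distance in CIE L*a*b*), implemented by hand; the colour library offers other delta E formulas, and AMMICO's actual reference values and metric may differ. The color_percentages helper, which reports how much of a set of pixels falls into each named color, is likewise hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter

# Assumed reference sRGB values (CSS named colors), not necessarily those AMMICO uses.
NAMED_COLORS = {
    "red": (255, 0, 0), "green": (0, 128, 0), "blue": (0, 0, 255),
    "yellow": (255, 255, 0), "cyan": (0, 255, 255), "orange": (255, 165, 0),
    "purple": (128, 0, 128), "pink": (255, 192, 203), "brown": (165, 42, 42),
    "grey": (128, 128, 128), "white": (255, 255, 255), "black": (0, 0, 0),
}


def srgb_to_lab(rgb: tuple[int, int, int]) -> tuple[float, float, float]:
    """Convert an 8-bit sRGB triple to CIE L*a*b* (D65 white point)."""
    def linearize(c: float) -> float:
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b

    def f(t: float) -> float:
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def closest_named_color(rgb: tuple[int, int, int]) -> str:
    """Name of the reference color with the smallest CIE 1976 distance to rgb."""
    lab = srgb_to_lab(rgb)
    return min(NAMED_COLORS, key=lambda name: math.dist(lab, srgb_to_lab(NAMED_COLORS[name])))


def color_percentages(pixels: list[tuple[int, int, int]]) -> dict[str, float]:
    """Percentage of pixels assigned to each named color."""
    counts = Counter(closest_named_color(p) for p in pixels)
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}
```

Working in L*a*b* rather than raw RGB keeps the distance closer to perceived color difference, which is why a perceptual metric is used for this kind of classification.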

Cropping of posts

Social media posts can automatically be cropped to remove further comments on the page and restrict the textual content to the first comment only.

Owner

Name: SSC
Login: ssciwr
Kind: organization
Email: ssc@iwr.uni-heidelberg.de
Location: Heidelberg University, Germany

Website: https://ssc.uni-heidelberg.de/
Repositories: 78
Profile: https://github.com/ssciwr

Scientific Software Center, IWR, Heidelberg University

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Dumitrescu"
  given-names: "Delia"
  orcid: "https://orcid.org/0000-0002-0065-3875"
- family-names: "Ulusoy"
  given-names: "Inga"
  orcid: "https://orcid.org/0000-0001-7294-4148"
- family-names: "Andriushchenko"
  given-names: "Petr"
  orcid: "https://orcid.org/0000-0002-4518-6588"
- family-names: "Daskalakis"
  given-names: "Gwydion"
  orcid: "https://orcid.org/0000-0002-7557-1364"
- family-names: "Kempf"
  given-names: "Dominic"
  orcid: "https://orcid.org/0000-0002-6140-2332"
- family-names: "Ma"
  given-names: "Xianghe"
title: "AMMICO, an AI Media and Misinformation Content Analysis Tool"
version: 0.2.0
doi: 10.31235/osf.io/v8txj
date-released: 2023-09-04
url: "https://github.com/ssciwr/AMMICO"

GitHub Events

Total

Create event: 26
Release event: 3
Issues event: 14
Watch event: 2
Delete event: 18
Issue comment event: 52
Push event: 53
Pull request review event: 4
Pull request review comment event: 4
Pull request event: 44
Fork event: 2

Last Year

Create event: 26
Release event: 3
Issues event: 14
Watch event: 2
Delete event: 18
Issue comment event: 52
Push event: 53
Pull request review event: 4
Pull request review comment event: 4
Pull request event: 44
Fork event: 2

Committers

Last synced: 7 months ago

All Time

Total Commits: 272
Total Committers: 7
Avg Commits per committer: 38.857
Development Distribution Score (DDS): 0.607

Past Year

Commits: 23
Committers: 3
Avg Commits per committer: 7.667
Development Distribution Score (DDS): 0.348

Top Committers

Name	Email	Commits
Petr Andriushchenko	p**d@g**m	107
Inga Ulusoy	i**y@u**e	91
pre-commit-ci[bot]	6****]	41
Dominic Kempf	d**f@i**e	15
GwydionJon	3****n	10
dependabot[bot]	4****]	6
xiaohemaikoo	3****o	2
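The committer statistics can be checked against the Top Committers table. The sketch below assumes that the Development Distribution Score is 1 minus the share of all commits made by the most active committer; this definition is an assumption on our part, but it reproduces the reported all-time values. The per-committer counts for the past year are not listed, so those values cannot be recomputed here.

```python
from __future__ import annotations


def development_distribution_score(commits_per_committer: list[int]) -> float:
    """1 minus the share of all commits made by the most active committer (assumed definition)."""
    total = sum(commits_per_committer)
    return 1 - max(commits_per_committer) / total


all_time = [107, 91, 41, 15, 10, 6, 2]  # commit counts from the Top Committers table
print(round(development_distribution_score(all_time), 3))  # 0.607
print(round(sum(all_time) / len(all_time), 3))  # 38.857
```

A score near 0 means one person made almost all commits; the all-time value of about 0.6 reflects that no single committer made more than about 40% of the 272 commits.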

Committer Domains (Top 20 + Academic)

iwr.uni-heidelberg.de: 1
uni-heidelberg.de: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 41
Total pull requests: 103
Average time to close issues: 4 months
Average time to close pull requests: 10 days
Total issue authors: 5
Total pull request authors: 8
Average comments per issue: 0.12
Average comments per pull request: 1.56
Merged pull requests: 82
Bot issues: 0
Bot pull requests: 30

Past Year

Issues: 9
Pull requests: 35
Average time to close issues: 10 days
Average time to close pull requests: 8 days
Issue authors: 3
Pull request authors: 4
Average comments per issue: 0.0
Average comments per pull request: 1.09
Merged pull requests: 18
Bot issues: 0
Bot pull requests: 18


Top Authors

Issue Authors

iulusoy (34)
piterand (6)
GwydionJon (5)
dumitrescu5 (4)
TMonty-123 (1)
dependabot[bot] (1)

Pull Request Authors

iulusoy (64)
pre-commit-ci[bot] (28)
dependabot[bot] (26)
piterand (11)
dokempf (8)
GwydionJon (5)
xiaohemaikoo (4)
ChristineSchulz (2)
lkeegan (1)

Top Labels

Issue Labels

enhancement (17) bug (9) question (2) dependencies (2) good first issue (1) hacktoberfest (1)

Pull Request Labels

dependencies (26) bug (1)

Packages

Total packages: 1
Total downloads:
- pypi 88 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 8
Total maintainers: 1

pypi.org: ammico

AI Media and Misinformation Content Analysis Tool

Documentation: https://ammico.readthedocs.io/
License: MIT
Latest release: 0.2.6
published about 1 year ago

Versions: 8
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 88 Last month

Rankings

Dependent packages count: 7.5%

Average: 48.7%

Downloads: 68.9%

Dependent repos count: 69.6%

Maintainers (1)

lkeegan

Last synced: 6 months ago

Dependencies

.github/workflows/ci.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite
codecov/codecov-action v3 composite

.github/workflows/docs.yml actions

JamesIves/github-pages-deploy-action v4 composite
actions/checkout v3 composite
actions/setup-python v4 composite

Dockerfile docker

jupyter/base-notebook latest build

pyproject.toml pypi

bertopic *
cvlib *
deepface <= 0.0.75
google-cloud-vision *
googletrans ==3.1.0a0
grpcio *
ipywidgets *
jupyterlab *
keras *
matplotlib *
numpy <=1.23.4
opencv-contrib-python *
opencv_python *
openpyxl *
pandas *
pooch *
protobuf *
pytest *
pytest-cov *
retina_face *
setuptools *
spacy *
spacytextblob *
tensorflow *
textblob *
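Because pyproject.toml caps some dependencies (numpy <=1.23.4, deepface <= 0.0.75), a quick check of the installed versions can help when modules fail after installation. The sketch below reads versions with the standard-library importlib.metadata; version_tuple is a deliberately simplified parser (it keeps only leading numeric parts), and both helper names are hypothetical. For real projects, the packaging library's version handling is more robust.

```python
from __future__ import annotations

from importlib import metadata
from typing import Optional

# Upper-bound pins copied from the pyproject.toml dependency list above.
UPPER_PINS = {"numpy": "1.23.4", "deepface": "0.0.75"}


def version_tuple(version: str) -> tuple[int, ...]:
    """Parse leading numeric parts, e.g. '1.23.4' -> (1, 23, 4); simplified on purpose."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_upper_pins(pins: dict[str, str] = UPPER_PINS) -> dict[str, Optional[bool]]:
    """Map each package to True/False (pin satisfied or not), or None if not installed."""
    status: dict[str, Optional[bool]] = {}
    for package, max_version in pins.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            status[package] = None
            continue
        status[package] = version_tuple(installed) <= version_tuple(max_version)
    return status
```

A False entry in the result of check_upper_pins() points to a dependency that is newer than the pinned maximum.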

requirements-dev.txt pypi

myst-parser * development
sphinx * development
sphinx_rtd_theme * development
sphinxcontrib-napoleon * development

setup.py pypi

.github/workflows/release.yml actions

actions/checkout v4 composite
actions/download-artifact v3 composite
actions/setup-python v4 composite
actions/upload-artifact v3 composite
pypa/gh-action-pypi-publish release/v1 composite
sigstore/gh-action-sigstore-python v1.2.3 composite
