content-moderation-deep-learning

Deep learning based content moderation from text, audio, video & image input modalities.

https://github.com/fcakyon/content-moderation-deep-learning

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, researchgate.net, scholar.google, springer.com, ieee.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.0%) to scientific vocabulary

Keywords

content-moderation content-ratings genre-classification movie-content-filter movie-trailer multimodal-deep-learning nsfw-recognition nudity-detection profanity-detection violence-detection
Last synced: 7 months ago

Repository

Basic Info
Statistics
  • Stars: 358
  • Watchers: 4
  • Forks: 21
  • Open Issues: 0
  • Releases: 0
Created over 3 years ago · Last pushed 9 months ago
Metadata Files
Readme Funding License Citation

README.md

deep-learning-content-moderation

Various sources for deep learning based content moderation, sensitive content detection, scene genre classification, nudity detection, violence detection, substance detection from text, audio, video & image input modalities.

citation

If you find this source useful, please consider citing it in your work as:

@INPROCEEDINGS{10193621,
  author={Akyon, Fatih Cagatay and Temizel, Alptekin},
  booktitle={2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)},
  title={State-of-the-Art in Nudity Classification: A Comparative Analysis},
  year={2023},
  pages={1-5},
  keywords={Analytical models;Convolution;Conferences;Transfer learning;Benchmark testing;Transformers;Safety;content moderation;nudity detection;safety;transformers},
  doi={10.1109/ICASSPW59220.2023.10193621}
}

@article{akyon2022contentmoderation,
  title={Deep Architectures for Content Moderation and Movie Content Rating},
  author={Akyon, Fatih Cagatay and Temizel, Alptekin},
  journal={arXiv},
  doi={https://doi.org/10.48550/arXiv.2212.04533},
  year={2022}
}

datasets

movie and content moderation datasets

| name | paper | year | url | input modality | task | labels |
|--- |--- |--- |--- |--- |--- |--- |
| LSPD | pdf | 2022 | page | image, video | image/video classification, instance segmentation | porn, normal, sexy, hentai, drawings, female/male genital, female breast, anus |
| MM-Trailer | pdf | 2021 | page | video | video classification | age rating |
| Movienet | scholar | 2021 | page | image, video, text | object detection, video classification | scene level actions and places, character bboxes |
| Movie script severity dataset | pdf | 2021 | github | text | text classification | frightening, mild, moderate, severe |
| LVU | pdf | 2021 | page | video | video classification | relationship, place, like ratio, view count, genre, writer, year per movie scene |
| Violence detection dataset | scholar | 2020 | github | video | video classification | violent, not-violent |
| Movie script dataset | pdf | 2019 | github | text | text classification | violent or not |
| Nudenet | github | 2019 | archive.org | image | image classification | nude or not |
| Adult content dataset | pdf | 2017 | contact | image | image classification | nude or not |
| Substance use dataset | pdf | 2017 | first author | image | image classification | drug related or not |
| NDPI2k dataset | pdf | 2016 | contact | video | video classification | porn or not |
| Violent Scenes Dataset | springer | 2014 | page | video | video classification | blood, fire, gun, gore, fight |
| VSD2014 | pdf | 2014 | download | video | video classification | blood, fire, gun, gore, fight |
| AIIA-PID4 | pdf | 2013 | - | image | image classification | bikini, porn, skin, non-skin |
| NDPI800 dataset | scholar | 2013 | page | video | video classification | porn or not |
| HMDB-51 | scholar | 2011 | page | video | video classification | smoke, drink |

techniques

sensitive content detection

movie content rating

| name | paper | year | model | features | datasets | tasks | context |
|--- |--- |--- |--- |--- |--- |--- |--- |
| Movies2Scenes: Learning Scene Representations Using Movie Similarities | scholar | 2022 | ViT-like video encoder + MLP | ViT-like video encoder embeddings | Private, Movienet, LVU | movie scene representation learning, video classification (sex, violence, drug-use) | movie scene content rating |
| Detection and Classification of Sensitive Audio-Visual Content for Automated Film Censorship and Rating | pdf | 2022 | CNN + GRU + MLP | CNN embeddings from video frames | Violence detection dataset | violent/non-violent classification from videos | movie scene content rating |
| Automatic parental guide ratings for short movies | page | 2021 | separate model for each task: concat + LSTM, object detector, one-class CNN embeddings | video frame pixel values, image embeddings, text | Nudenet, private dataset | profanity, violence, nudity, drug classification | movie content rating |
| From None to Severe: Predicting Severity in Movie Scripts | scholar | 2021 | multi-task pairwise ranking-classification network | GloVe, Bert and TextCNN text embeddings | Movie script severity dataset | rating classification (frightening, mild, moderate, severe) | movie content rating |
| A Case Study of Deep Learning-Based Multi-Modal Methods for Labeling the Presence of Questionable Content in Movie Trailers | scholar | 2021 | multi-modal + multi output concat+MLP | CNN+LSTM video features, Bert and DeepMoji text embeddings, MFCC audio features | MM-Trailer | rating classification (red, yellow, green) | movie trailer content rating |
| Automatic Parental Guide Scene Classification Menggunakan Metode Deep Convolutional Neural Network Dan Lstm | scholar | 2020 | 3 CNN models for 3 modalities, multi-label dataset | CNN video and audio embeddings, LSTM text (subtitle) embeddings | private dataset | gore, nudity, drug, profanity classification from video and subtitle | movie scene content rating |
| Multimodal data fusion for sensitive scene localization | scholar | 2019 | meta-learning with Naive Bayes, SVM | MFCC and prosodic features from audio, HOG and TRoF features from images | Pornography-2k dataset, VSD2014 | violent and pornographic scene localization from video | movie scene content rating |
| A Deep Learning approach for the Motion Picture Content Rating | scholar | 2019 | MLP + rule-based decision | InceptionV3 image embeddings | Violent Scenes Dataset, private dataset | violence (shooting, blood, fire, weapon) classification from video | movie scene content rating |
| Hybrid System for MPAA Ratings of Movie Clips Using Support Vector Machine | springer | 2019 | SVM | DCT features from image | private dataset | movie content rating classification from images | movie content rating |
| Inappropriate scene detection in a video stream | page | 2017 | SVM classifier + Lenet image classifier + rules-based decision | HoG and CNN features for image | private dataset | image classification: no/mild/high violence, safe/unsafe/pornography | movie frame content rating |

content moderation

| name | paper | year | model | features | datasets | tasks | context |
|--- |--- |--- |--- |--- |--- |--- |--- |
| State-of-the-Art in Nudity Classification: A Comparative Analysis | ieee | 2023 | CNN, Transformers | EfficientNet, ViT, ConvNeXT image embeddings | LSPD, Nudenet, NDPI2k | nudity classification from images | general content moderation |
| Reliable Decision from Multiple Subtasks through Threshold Optimization: Content Moderation in the Wild | scholar | 2022 | novel threshold optimization tech. (TruSThresh) | prediction scores | UnSmile (Korean hatespeech dataset) | optimum threshold prediction | social media content moderation |
| On-Device Content Moderation | scholar | 2021 | mobilenet v3 + SSD object detector | mobilenet v3 image embeddings | private dataset | object detection + nudity classification from images | on-device content moderation |
| Gore Classification and Censoring in Images | scholar | 2021 | ensemble of CNN + MLP | mobilenet v2, densenet, vgg16 image embeddings | private dataset | gore classification from images | general content moderation |
| Automated Censoring of Cigarettes in Videos Using Deep Learning Techniques | scholar | 2020 | CNN + MLP | inception v3 image embeddings | private dataset | cigarette classification from video | general content moderation |
| A Multimodal CNN-based Tool to Censure Inappropriate Video Scenes | scholar | 2019 | CNN + SVM | InceptionV3 image embeddings, AudioVGG audio embeddings | private dataset | inappropriate (nudity+gore) classification from video | general video content moderation |
| A baseline for NSFW video detection in e-learning environments | scholar | 2019 | concat + SVM, MLP | InceptionV3 image embeddings, AudioVGG audio embeddings | YouTube8M, NDPI, Cholec80 | nudity classification from video | e-learning content moderation |
| Bringing the kid back into youtube kids: Detecting inappropriate content on video streaming platforms | scholar | 2019 | CNN + LSTM (late fusion) | CNN based encoder for image, video and audio spectrograms | private dataset | video classification: original, fake explicit, fake violent | social media content moderation |
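Several entries above end in a decision over per-sample prediction scores, and one row is specifically about choosing thresholds. As a rough illustration of that step, the sketch below sweeps candidate thresholds over validation scores and keeps the one with the highest F1. It is a generic baseline, not the TruSThresh method from the paper, and the scores, labels and function names are made up for the example.

```python
def f1_score(labels, preds):
    """F1 for binary labels/predictions given as parallel sequences."""
    tp = sum(1 for y, p in zip(labels, preds) if y and p)
    fp = sum(1 for y, p in zip(labels, preds) if not y and p)
    fn = sum(1 for y, p in zip(labels, preds) if y and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels):
    """Try every observed score as a threshold; return (threshold, f1)."""
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(scores)):
        preds = [s >= t for s in scores]  # flag a sample when score >= t
        f1 = f1_score(labels, preds)
        if f1 > best_f1:  # strict '>' keeps the lowest threshold on ties
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical validation scores (e.g. "unsafe" probabilities) and labels.
scores = [0.10, 0.35, 0.40, 0.65, 0.80, 0.90]
labels = [0, 0, 1, 0, 1, 1]

threshold, f1 = best_threshold(scores, labels)
print(threshold, round(f1, 3))  # 0.4 0.857
```

In a moderation setting the objective would usually be chosen per subtask (for example, favouring recall for severe categories) rather than plain F1.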

movie/scene genre classification

| name | paper | year | model | features | datasets | tasks |
|--- |--- |--- |--- |--- |--- |--- |
| Effectively leveraging Multi-modal Features for Movie Genre Classification | scholar | 2022 | embeddings + fusion + MLP | CLIP image embeddings, PANNs audio embeddings, CLIP text embeddings | MovieNet | movie genre classification |
| OS-MSL: One Stage Multimodal Sequential Link Framework for Scene Segmentation and Classification | scholar | 2022 | embeddings + novel transformer | ResNet-18 image embeddings, ResNet-VLAD audio embeddings | TI-News | news scene segmentation/classification (studio, outdoor, interview) |
| Detection of Animated Scenes Among Movie Trailers | scholar | 2022 | CNN + GRU | EfficientNet image embeddings | Private dataset | genre classification from movie trailer scenes |
| A multi-label movie genre classification scheme based on the movie's subtitles | springer | 2022 | KNN | text frequency vectors | Private dataset | genre classification from movie subtitle text |
| A multimodal approach for multi-label movie genre classification | scholar | 2020 | CNN + LSTM | MFCCs/SSD/LBP from audio, LBP/3DCNN from video frames, Inception-v3 from poster, TFIDF from text | Private dataset | genre classification from movie trailers |
| Genre classification of movie trailers using 3d convolutional neural networks | ieee | 2020 | 3D CNN | images | Private dataset | genre classification from movie trailer scenes |
| A unified framework of deep networks for genre classification using movie trailer | scholar | 2020 | CNN + LSTM | Inception V4 image embeddings | EmoGDB | genre classification from movie trailer scenes |
| Towards story-based classification of movie scenes | scholar | 2020 | logistic regression | manually extracted categorical features | Flintstones Scene Dataset | scene classification (Obstacle, Midpoint, Climax of Act 1) |

multimodal architectures

synchronous multimodal architectures

| name | paper | year | model | features | datasets | tasks | modalities |
|--- |--- |--- |--- |--- |--- |--- |--- |
| M&M Mix: A Multimodal Multiview Transformer Ensemble | scholar | 2022 | transformer with 2 cls heads | ViT image embeddings from audio spect., frame image, optical flow | Epic-Kitchens | video/action classification | image + audio + optical flow |
| MultiMAE: Multi-modal Multi-task Masked Autoencoders | scholar | 2022 | transformer with 3 decoder + cls heads | ViT-like image enc. patch embeddings (optional modalities) | ImageNet: Pseudo labeled multi-task training dataset (depth, segm) | image cls., semantic segm., depth est. | image + depth map |
| Data2vec: A general framework for self-supervised learning in speech, vision and language | scholar | 2022 | single encoder | transformer based audio, text, image encoder embeddings | ImageNet, Librispeech | masked pretraining | image + audio + text |
| VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text | scholar | 2022 | 1 encoder per modality | transformer based audio, text, image encoder embeddings | AudioSet, HowTo100M | pretraining + video/audio classification | image + audio + text |
| Expanding Language-Image Pretrained Models for General Video Recognition | scholar | 2022 | 1 encoder per modality | transformer based video, text encoder embeddings | HMDB-51, UCF-101 | contrastive pretraining | video + text |
| Audio-Visual Instance Discrimination with Cross-Modal Agreement | scholar | 2021 | 1 encoder per modality | CNN based audio, video encoder embeddings | HMDB-51, UCF-101 | video/audio classification | video + audio |
| Robust Audio-Visual Instance Discrimination | scholar | 2021 | 1 encoder per modality | CNN based audio, video encoder embeddings | HMDB-51, UCF-101 | video/audio classification | video + audio |
| Learning transferable visual models from natural language supervision | scholar | 2021 | 1 encoder per modality | transformer based image, text encoder embeddings | JFT-300M | contrastive pretraining | image + text |
| Self-supervised multimodal versatile networks | scholar | 2020 | multiple encoders | CNN based image/audio embeddings, word2vec text embeddings | UCF101, Kinetics, AudioSet | contrastive pretraining + classification | image + audio + text |
| Uniter: Universal image-text representation learning | scholar | 2020 | multimodal encoder | combined embeddings | COCO, Visual Genome, Conceptual Captions | qa/image-text retrieval | image + text |
| 12-in-1: Multi-task vision and language representation learning | scholar | 2020 | multimodal encoder | combined embeddings | COCO, Flickr30k | qa/image-text retrieval | image + text |
| Two-stream convolutional networks for action recognition in videos | scholar | 2014 | 1 encoder per modality | CNN based audio, text encoder embeddings | HMDB-51, UCF-101 | video/audio classification | video + optical flow |

asynchronous multimodal architectures

| name | paper | year | model | features | datasets | tasks | modalities |
|--- |--- |--- |--- |--- |--- |--- |--- |
| OmniMAE: Single Model Masked Pretraining on Images and Videos | scholar | 2022 | transformer with 1 cls. head | ViT-like image/video enc. patch embeddings | ImageNet, SSv2 | video/action classification | image + video |
| OMNIVORE: A Single Model for Many Visual Modalities | scholar | 2022 | transformer with 3 cls. heads | ViT-like image/video enc. patch embeddings | ImageNet, Kinetics, SSv2, SUN RGB-D | image cls., action recog., depth est. | image + video + depth map |
| Polyvit: Co-training vision transformers on images, videos and audio | scholar | 2021 | transformer with 9 cls. heads | ViT-like image/video/audio enc. embeddings | ImageNet, CIFAR, Kinetics, Moments in Time, AudioSet, VGGSound | image cls., video cls., audio cls. | image + video + audio |

action recognition

with transformers

| name | paper | year | model | features | datasets | tasks |
|--- |--- |--- |--- |--- |--- |--- |
| Frozen CLIP Models are Efficient Video Learners | scholar | 2022 | transformer with 1 cls head | CLIP image embeddings | ImageNet, Kinetics, SSv2 | action recognition |
| Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training | scholar | 2022 | transformer with 1 cls head | ViT-like video enc. patch embeddings | Kinetics, SSv2 | action recognition |
| Bevt: Bert pretraining of video transformers | scholar | 2022 | encoder-decoder transformer | VideoSwin image/video enc. embeddings | Kinetics, SSv2 | action recognition |
| Video swin transformer | scholar | 2022 | Swin trans. with cls.head | Swin video enc. embeddings | Kinetics, SSv2 | action recognition |
| Is space-time attention all you need for video understanding? | scholar | 2021 | transformer with cls. head | ViT-like video enc. patch embeddings | Kinetics, SSv2 | action recognition |

with 3D CNNs

| name | paper | year | model | features | datasets | tasks |
|--- |--- |--- |--- |--- |--- |--- |
| X3d: Expanding architectures for efficient video recognition | scholar | 2020 | CNN with cls. head | 3D CNN based video enc. embeddings | Kinetics, SSv2 | action recognition |
| Slowfast networks for video recognition | scholar | 2019 | CNN with cls. head | 3D CNN based video enc. embeddings | Kinetics, SSv2 | action recognition |
| A closer look at spatiotemporal convolutions for action recognition (R2+1D) | scholar | 2018 | CNN with cls. head | 3D CNN based video enc. embeddings | Kinetics, HMDB-51, UCF-101 | action recognition |
| Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset (I3D) | scholar | 2017 | CNN with cls. head | 3D CNN based video enc. embeddings | Kinetics, HMDB-51, UCF-101 | action recognition |

contrastive representation learning

| name | paper | date |
|--- |--- |--- |
| Vatt: Transformers for multimodal self-supervised learning from raw video, audio and text | scholar | 2021 |
| Supervised contrastive learning | scholar | 2020 |

review papers

| name | paper | date |
|--- |--- |--- |
| Machine Learning Models for Content Classification in Film Censorship and Rating | pdf | 2022 |
| A survey of artificial intelligence strategies for automatic detection of sexually explicit videos | scholar | 2022 |
| A survey on video content rating: taxonomy, challenges and open issues | pdf | 2021 |
| Multimodal Learning with Transformers: A Survey | scholar | 2022 |
| A Survey Paper on Movie Trailer Genre Detection | scholar | 2020 |

tools

| name | url | description |
|--- |--- |--- |
| safetext | github | multilingual swear word detection and filtering |
| PySceneDetect | github | Python and OpenCV-based scene cut/transition detection program & library |
| LAION safety toolkit | github | NSFW detector trained on LAION dataset |
| pysrt | github | Python parser for SubRip (srt) files |
| ffsubsync | github | Automagically synchronize subtitles with video. |
| MoviePy | github | Video editing with Python |
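To show how subtitle parsing and word filtering fit together for profanity detection, here is a minimal standard-library sketch. It does not use the APIs of pysrt or safetext; the SRT parser is a simplified stand-in, and the sample subtitle text and word list are hypothetical.

```python
import re

# Hypothetical two-cue subtitle file in SubRip (srt) layout.
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:03,500
Well, that was a darn good try.

2
00:00:04,000 --> 00:00:06,000
Let's go home.
"""

# Hypothetical word list; a real filter would use a curated, per-language list.
BLOCKLIST = {"darn", "heck"}


def parse_srt(text):
    """Parse SRT text into a list of cues (simplified: no styling tags)."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip malformed blocks
        start, end = (part.strip() for part in lines[1].split("-->"))
        cues.append({"start": start, "end": end, "text": " ".join(lines[2:])})
    return cues


def flag_cues(cues, blocklist):
    """Return (start, end, matched_words) for cues containing listed words."""
    flagged = []
    for cue in cues:
        words = set(re.findall(r"[a-z']+", cue["text"].lower()))
        hits = sorted(words & blocklist)
        if hits:
            flagged.append((cue["start"], cue["end"], hits))
    return flagged


flagged = flag_cues(parse_srt(SAMPLE_SRT), BLOCKLIST)
print(flagged)  # [('00:00:01,000', '00:00:03,500', ['darn'])]
```

The returned time ranges could then be handed to a video editing step (for example, muting or cutting those spans), which is outside the scope of this sketch.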

Owner

  • Name: fatih akyon
  • Login: fcakyon
  • Kind: user
  • Location: Ankara, Turkiye
  • Company: @viddexa @ultralytics

helping AI's to understand videos at @ultralytics & @viddexa

GitHub Events

Total
  • Watch event: 57
  • Push event: 2
  • Fork event: 4
Last Year
  • Watch event: 57
  • Push event: 2
  • Fork event: 4

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 45
  • Total Committers: 2
  • Avg Commits per committer: 22.5
  • Development Distribution Score (DDS): 0.267
Past Year
  • Commits: 1
  • Committers: 1
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
fcakyon f****n@g****m 33
fatih 3****n 12
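
The figures above are consistent with the Development Distribution Score being computed as 1 minus the top committer's share of all commits. That definition is an assumption here, since the page does not state it. A minimal sketch that reproduces the listed values from the commit counts:

```python
def development_distribution_score(commits_per_committer):
    """Assumed definition: 1 - (top committer's commits / total commits)."""
    total = sum(commits_per_committer)
    if total == 0:
        return 0.0  # no commits: treat the score as 0
    return 1 - max(commits_per_committer) / total


all_time = [33, 12]  # commit counts from the Top Committers list
past_year = [1]      # one commit by one committer

print(round(development_distribution_score(all_time), 3))   # 0.267
print(round(development_distribution_score(past_year), 3))  # 0.0
print(sum(all_time) / len(all_time))                        # 22.5
```

A score near 0 means one committer accounts for almost all commits; higher values mean the work is spread across more people.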

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 3
  • Average time to close issues: N/A
  • Average time to close pull requests: 1 minute
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 1.33
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 2.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • fcakyon (4)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

.github/workflows/action.yml actions