content-moderation-deep-learning
Deep learning based content moderation from text, audio, video & image input modalities.
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 7 DOI reference(s) in README
- ✓ Academic publication links: Links to arxiv.org, researchgate.net, scholar.google, springer.com, ieee.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (7.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: fcakyon
- License: mit
- Default Branch: main
- Homepage: https://arxiv.org/abs/2212.04533
- Size: 22.5 KB
Statistics
- Stars: 358
- Watchers: 4
- Forks: 21
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
deep-learning-content-moderation
Various sources for deep learning based content moderation, sensitive content detection, scene genre classification, nudity detection, violence detection, substance detection from text, audio, video & image input modalities.
citation
If you find this source useful, please consider citing it in your work as:
bib
@INPROCEEDINGS{10193621,
author={Akyon, Fatih Cagatay and Temizel, Alptekin},
booktitle={2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)},
title={State-of-the-Art in Nudity Classification: A Comparative Analysis},
year={2023},
pages={1-5},
keywords={Analytical models;Convolution;Conferences;Transfer learning;Benchmark testing;Transformers;Safety;content moderation;nudity detection;safety;transformers},
doi={10.1109/ICASSPW59220.2023.10193621}}
bib
@article{akyon2022contentmoderation,
title={Deep Architectures for Content Moderation and Movie Content Rating},
author={Akyon, Fatih Cagatay and Temizel, Alptekin},
journal={arXiv},
doi={10.48550/arXiv.2212.04533},
year={2022}
}
table of contents
- datasets
- techniques
- tools
datasets
movie and content moderation datasets
| name | paper | year | url | input modality | task | labels |
|--- |--- |--- |--- |--- |--- |--- |
| LSPD | pdf | 2022 | page | image, video | image/video classification, instance segmentation | porn, normal, sexy, hentai, drawings, female/male genital, female breast, anus |
| MM-Trailer | pdf | 2021 | page | video | video classification | age rating |
| Movienet | scholar | 2021 | page | image, video, text | object detection, video classification | scene level actions and places, character bboxes |
| Movie script severity dataset | pdf | 2021 | github | text | text classification | frightening, mild, moderate, severe |
| LVU | pdf | 2021 | page | video | video classification | relationship, place, like ratio, view count, genre, writer, year per movie scene |
| Violence detection dataset | scholar | 2020 | github | video | video classification | violent, not-violent |
| Movie script dataset | pdf | 2019 | github | text | text classification | violent or not |
| Nudenet | github | 2019 | archive.org | image | image classification | nude or not |
| Adult content dataset | pdf | 2017 | contact | image | image classification | nude or not |
| Substance use dataset | pdf | 2017 | first author | image | image classification | drug related or not |
| NDPI2k dataset | pdf | 2016 | contact | video | video classification | porn or not |
| Violent Scenes Dataset | springer | 2014 | page | video | video classification | blood, fire, gun, gore, fight |
| VSD2014 | pdf | 2014 | download | video | video classification | blood, fire, gun, gore, fight |
| AIIA-PID4 | pdf | 2013 | - | image | image classification | bikini, porn, skin, non-skin |
| NDPI800 dataset | scholar | 2013 | page | video | video classification | porn or not |
| HMDB-51 | scholar | 2011 | page | video | video classification | smoke, drink |
techniques
sensitive content detection
movie content rating
| name | paper | year | model | features | datasets | tasks | context |
|--- |--- |--- |--- |--- |--- |--- |--- |
| Movies2Scenes: Learning Scene Representations Using Movie Similarities | scholar | 2022 | ViT-like video encoder + MLP | ViT-like video encoder embeddings | Private, Movienet, LVU | movie scene representation learning, video classification (sex, violence, drug-use) | movie scene content rating |
| Detection and Classification of Sensitive Audio-Visual Content for Automated Film Censorship and Rating | pdf | 2022 | CNN + GRU + MLP | CNN embeddings from video frames | Violence detection dataset | violent/non-violent classification from videos | movie scene content rating |
| Automatic parental guide ratings for short movies | page | 2021 | separate model for each task: concat + LSTM, object detector, one-class CNN embeddings | video frame pixel values, image embeddings, text | Nudenet, private dataset | profanity, violence, nudity, drug classification | movie content rating |
| From None to Severe: Predicting Severity in Movie Scripts | scholar | 2021 | multi-task pairwise ranking-classification network | GloVe, BERT and TextCNN text embeddings | Movie script severity dataset | rating classification (frightening, mild, moderate, severe) | movie content rating |
| A Case Study of Deep Learning-Based Multi-Modal Methods for Labeling the Presence of Questionable Content in Movie Trailers | scholar | 2021 | multi-modal + multi-output concat + MLP | CNN + LSTM video features, BERT and DeepMoji text embeddings, MFCC audio features | MM-Trailer | rating classification (red, yellow, green) | movie trailer content rating |
| Automatic Parental Guide Scene Classification Menggunakan Metode Deep Convolutional Neural Network Dan Lstm | scholar | 2020 | 3 CNN models for 3 modalities, multi-label dataset | CNN video and audio embeddings, LSTM text (subtitle) embeddings | private dataset | gore, nudity, drug, profanity classification from video and subtitle | movie scene content rating |
| Multimodal data fusion for sensitive scene localization | scholar | 2019 | meta-learning with Naive Bayes, SVM | MFCC and prosodic features from audio, HOG and TRoF features from images | Pornography-2k dataset, VSD2014 | violent and pornographic scene localization from video | movie scene content rating |
| A Deep Learning approach for the Motion Picture Content Rating | scholar | 2019 | MLP + rule-based decision | InceptionV3 image embeddings | Violent Scenes Dataset, private dataset | violence (shooting, blood, fire, weapon) classification from video | movie scene content rating |
| Hybrid System for MPAA Ratings of Movie Clips Using Support Vector Machine | springer | 2019 | SVM | DCT features from image | private dataset | movie content rating classification from images | movie content rating |
| Inappropriate scene detection in a video stream | page | 2017 | SVM classifier + LeNet image classifier + rule-based decision | HOG and CNN features for image | private dataset | image classification: no/mild/high violence, safe/unsafe/pornography | movie frame content rating |
content moderation
| name | paper | year | model | features | datasets | tasks | context |
|--- |--- |--- |--- |--- |--- |--- |--- |
| State-of-the-Art in Nudity Classification: A Comparative Analysis | ieee | 2023 | CNN, Transformers | EfficientNet, ViT, ConvNeXT image embeddings | LSPD, Nudenet, NDPI2k | nudity classification from images | general content moderation |
| Reliable Decision from Multiple Subtasks through Threshold Optimization: Content Moderation in the Wild | scholar | 2022 | novel threshold optimization tech. (TruSThresh) | prediction scores | UnSmile (Korean hatespeech dataset) | optimum threshold prediction | social media content moderation |
| On-Device Content Moderation | scholar | 2021 | mobilenet v3 + SSD object detector | mobilenet v3 image embeddings | private dataset | object detection + nudity classification from images | on-device content moderation |
| Gore Classification and Censoring in Images | scholar | 2021 | ensemble of CNN + MLP | mobilenet v2, densenet, vgg16 image embeddings | private dataset | gore classification from images | general content moderation |
| Automated Censoring of Cigarettes in Videos Using Deep Learning Techniques | scholar | 2020 | CNN + MLP | inception v3 image embeddings | private dataset | cigarette classification from video | general content moderation |
| A Multimodal CNN-based Tool to Censure Inappropriate Video Scenes | scholar | 2019 | CNN + SVM | InceptionV3 image embeddings, AudioVGG audio embeddings | private dataset | inappropriate (nudity+gore) classification from video | general video content moderation |
| A baseline for NSFW video detection in e-learning environments | scholar | 2019 | concat + SVM, MLP | InceptionV3 image embeddings, AudioVGG audio embeddings | YouTube8M, NDPI, Cholec80 | nudity classification from video | e-learning content moderation |
| Bringing the kid back into youtube kids: Detecting inappropriate content on video streaming platforms | scholar | 2019 | CNN + LSTM (late fusion) | CNN based encoder for image, video and audio spectrograms | private dataset | video classification: original, fake explicit, fake violent | social media content moderation |
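Several entries above combine per-modality predictions, either by concatenating embeddings or by late fusion of scores. As a rough illustration only, the following sketch shows late fusion as a weighted average of per-modality probabilities followed by a threshold. The function names, weights and threshold are hypothetical and are not taken from any of the listed papers.

```python
from typing import Dict, Optional


def late_fusion(scores: Dict[str, float],
                weights: Optional[Dict[str, float]] = None) -> float:
    """Fuse per-modality probabilities with a weighted average.

    `scores` maps a modality name (e.g. "image", "audio") to the probability
    that the clip is unsafe, as produced by a separate model per modality.
    """
    if not scores:
        raise ValueError("at least one modality score is required")
    if weights is None:
        # Assumption: equal trust in every modality unless told otherwise.
        weights = {name: 1.0 for name in scores}
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * p for name, p in scores.items()) / total


def is_flagged(scores: Dict[str, float],
               threshold: float = 0.5,
               weights: Optional[Dict[str, float]] = None) -> bool:
    """Flag a clip when the fused score reaches the (hypothetical) threshold."""
    return late_fusion(scores, weights) >= threshold


if __name__ == "__main__":
    clip_scores = {"image": 0.9, "audio": 0.3}
    print(late_fusion(clip_scores))                              # equal weights
    print(late_fusion(clip_scores, {"image": 3.0, "audio": 1.0}))  # trust image more
    print(is_flagged(clip_scores, threshold=0.7))
```

In practice the weights and threshold would be tuned on a validation set; learned fusion (for example an MLP over concatenated embeddings) is the alternative used by the "concat" rows in the table.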
movie/scene genre classification
| name | paper | year | model | features | datasets | tasks |
|--- |--- |--- |--- |--- |--- |--- |
| Effectively leveraging Multi-modal Features for Movie Genre Classification | scholar | 2022 | embeddings + fusion + MLP | CLIP image embeddings, PANNs audio embeddings, CLIP text embeddings | MovieNet | movie genre classification |
| OS-MSL: One Stage Multimodal Sequential Link Framework for Scene Segmentation and Classification | scholar | 2022 | embeddings + novel transformer | ResNet-18 image embeddings, ResNet-VLAD audio embeddings | TI-News | news scene segmentation/classification (studio, outdoor, interview) |
| Detection of Animated Scenes Among Movie Trailers | scholar | 2022 | CNN + GRU | EfficientNet image embeddings | Private dataset | genre classification from movie trailer scenes |
| A multi-label movie genre classification scheme based on the movie's subtitles | springer | 2022 | KNN | text frequency vectors | Private dataset | genre classification from movie subtitle text |
| A multimodal approach for multi-label movie genre classification | scholar | 2020 | CNN + LSTM | MFCCs/SSD/LBP from audio, LBP/3DCNN from video frames, Inception-v3 from poster, TFIDF from text | Private dataset | genre classification from movie trailers |
| Genre classification of movie trailers using 3d convolutional neural networks | ieee | 2020 | 3D CNN | images | Private dataset | genre classification from movie trailer scenes |
| A unified framework of deep networks for genre classification using movie trailer | scholar | 2020 | CNN + LSTM | Inception V4 image embeddings | EmoGDB | genre classification from movie trailer scenes |
| Towards story-based classification of movie scenes | scholar | 2020 | logistic regression | manually extracted categorical features | Flintstones Scene Dataset | scene classification (Obstacle, Midpoint, Climax of Act 1) |
multimodal architectures
synchronous multimodal architectures
| name | paper | year | model | features | datasets | tasks | modalities |
|--- |--- |--- |--- |--- |--- |--- |--- |
| M&M Mix: A Multimodal Multiview Transformer Ensemble | scholar | 2022 | transformer with 2 cls. heads | ViT image embeddings from audio spect., frame image, optical flow | Epic-Kitchens | video/action classification | image + audio + optical flow |
| MultiMAE: Multi-modal Multi-task Masked Autoencoders | scholar | 2022 | transformer with 3 decoder + cls. heads | ViT-like image enc. patch embeddings (optional modalities) | ImageNet: Pseudo labeled multi-task training dataset (depth, segm) | image cls., semantic segm., depth est. | image + depth map |
| Data2vec: A general framework for self-supervised learning in speech, vision and language | scholar | 2022 | single encoder | transformer based audio, text, image encoder embeddings | ImageNet, Librispeech | masked pretraining | image + audio + text |
| VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text | scholar | 2022 | 1 encoder per modality | transformer based audio, text, image encoder embeddings | AudioSet, HowTo100M | pretraining + video/audio classification | image + audio + text |
| Expanding Language-Image Pretrained Models for General Video Recognition | scholar | 2022 | 1 encoder per modality | transformer based video, text encoder embeddings | HMDB-51, UCF-101 | contrastive pretraining | video + text |
| Audio-Visual Instance Discrimination with Cross-Modal Agreement | scholar | 2021 | 1 encoder per modality | CNN based audio, video encoder embeddings | HMDB-51, UCF-101 | video/audio classification | video + audio |
| Robust Audio-Visual Instance Discrimination | scholar | 2021 | 1 encoder per modality | CNN based audio, video encoder embeddings | HMDB-51, UCF-101 | video/audio classification | video + audio |
| Learning transferable visual models from natural language supervision | scholar | 2021 | 1 encoder per modality | transformer based image, text encoder embeddings | JFT-300M | contrastive pretraining | image + text |
| Self-supervised multimodal versatile networks | scholar | 2020 | multiple encoders | CNN based image/audio embeddings, word2vec text embeddings | UCF101, Kinetics, AudioSet | contrastive pretraining + classification | image + audio + text |
| Uniter: Universal image-text representation learning | scholar | 2020 | multimodal encoder | combined embeddings | COCO, Visual Genome, Conceptual Captions | qa/image-text retrieval | image + text |
| 12-in-1: Multi-task vision and language representation learning | scholar | 2020 | multimodal encoder | combined embeddings | COCO, Flickr30k | qa/image-text retrieval | image + text |
| Two-stream convolutional networks for action recognition in videos | scholar | 2014 | 1 encoder per modality | CNN based audio, text encoder embeddings | HMDB-51, UCF-101 | video/audio classification | video + optical flow |
asynchronous multimodal architectures
| name | paper | year | model | features | datasets | tasks | modalities |
|--- |--- |--- |--- |--- |--- |--- |--- |
| OmniMAE: Single Model Masked Pretraining on Images and Videos | scholar | 2022 | transformer with 1 cls. head | ViT-like image/video enc. patch embeddings | ImageNet, SSv2 | video/action classification | image + video |
| OMNIVORE: A Single Model for Many Visual Modalities | scholar | 2022 | transformer with 3 cls. heads | ViT-like image/video enc. patch embeddings | ImageNet, Kinetics, SSv2, SUN RGB-D | image cls., action recog., depth est. | image + video + depth map |
| Polyvit: Co-training vision transformers on images, videos and audio | scholar | 2021 | transformer with 9 cls. heads | ViT-like image/video/audio enc. embeddings | ImageNet, CIFAR, Kinetics, Moments in Time, AudioSet, VGGSound | image cls., video cls., audio cls. | image + video + audio |
action recognition
with transformers
| name | paper | year | model | features | datasets | tasks |
|--- |--- |--- |--- |--- |--- |--- |
| Frozen CLIP Models are Efficient Video Learners | scholar | 2022 | transformer with 1 cls. head | CLIP image embeddings | ImageNet, Kinetics, SSv2 | action recognition |
| Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training | scholar | 2022 | transformer with 1 cls. head | ViT-like video enc. patch embeddings | Kinetics, SSv2 | action recognition |
| Bevt: Bert pretraining of video transformers | scholar | 2022 | encoder-decoder transformer | VideoSwin image/video enc. embeddings | Kinetics, SSv2 | action recognition |
| Video swin transformer | scholar | 2022 | Swin trans. with cls. head | Swin video enc. embeddings | Kinetics, SSv2 | action recognition |
| Is space-time attention all you need for video understanding? | scholar | 2021 | transformer with cls. head | ViT-like video enc. patch embeddings | Kinetics, SSv2 | action recognition |
with 3D CNNs
| name | paper | year | model | features | datasets | tasks |
|--- |--- |--- |--- |--- |--- |--- |
| X3d: Expanding architectures for efficient video recognition | scholar | 2020 | CNN with cls. head | 3D CNN based video enc. embeddings | Kinetics, SSv2 | action recognition |
| Slowfast networks for video recognition | scholar | 2019 | CNN with cls. head | 3D CNN based video enc. embeddings | Kinetics, SSv2 | action recognition |
| A closer look at spatiotemporal convolutions for action recognition (R2+1D) | scholar | 2018 | CNN with cls. head | 3D CNN based video enc. embeddings | Kinetics, HMDB-51, UCF-101 | action recognition |
| Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset (I3D) | scholar | 2017 | CNN with cls. head | 3D CNN based video enc. embeddings | Kinetics, HMDB-51, UCF-101 | action recognition |
contrastive representation learning
| name | paper | date |
|--- |--- |--- |
| Vatt: Transformers for multimodal self-supervised learning from raw video, audio and text | scholar | 2021 |
| Supervised contrastive learning | scholar | 2020 |
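For readers new to the topic, here is a minimal sketch of the generic InfoNCE loss that contrastive methods commonly build on. It is not the exact objective of either paper above (they use their own variants, for example with multiple positives per anchor). It assumes `positives[i]` is the matching embedding for `anchors[i]` and treats every other row in the batch as a negative; the function names and the temperature value are illustrative.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(anchors: Sequence[Sequence[float]],
             positives: Sequence[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss over a batch of (anchor, positive) embedding pairs."""
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    losses = []
    for i, anchor in enumerate(a):
        # Similarity of this anchor to every candidate in the batch.
        logits = [dot(anchor, cand) / temperature for cand in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching candidate as the target class.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


if __name__ == "__main__":
    video = [[1.0, 0.0], [0.0, 1.0]]
    audio_aligned = [[1.0, 0.0], [0.0, 1.0]]
    audio_swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(info_nce(video, audio_aligned))  # small: pairs agree
    print(info_nce(video, audio_swapped))  # large: pairs disagree
```

The loss is low when each anchor is most similar to its own positive and high otherwise, which is what pulls matching video, audio and text embeddings together during pretraining.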
review papers
| name | paper | date |
|--- |--- |--- |
| Machine Learning Models for Content Classification in Film Censorship and Rating | pdf | 2022 |
| A survey of artificial intelligence strategies for automatic detection of sexually explicit videos | scholar | 2022 |
| A survey on video content rating: taxonomy, challenges and open issues | pdf | 2021 |
| Multimodal Learning with Transformers: A Survey | scholar | 2022 |
| A Survey Paper on Movie Trailer Genre Detection | scholar | 2020 |
tools
| name | url | description |
|--- |--- |--- |
| safetext | github | multilingual swear word detection and filtering |
| PySceneDetect | github | Python and OpenCV-based scene cut/transition detection program & library |
| LAION safety toolkit | github | NSFW detector trained on LAION dataset |
| pysrt | github | Python parser for SubRip (srt) files |
| ffsubsync | github | Automagically synchronize subtitles with video. |
| MoviePy | github | Video editing with Python |
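To make the subtitle-based workflow concrete, the sketch below parses SubRip text and flags cues containing blocklisted words using only the Python standard library. It deliberately does not use the APIs of pysrt or safetext; in a real pipeline those tools would replace the hand-written parser and the toy blocklist. All names here (`Cue`, `parse_srt`, `flag_cues`, `SAMPLE`) are hypothetical.

```python
import re
from typing import List, NamedTuple, Set


class Cue(NamedTuple):
    index: int
    start: str  # "HH:MM:SS,mmm"
    end: str
    text: str


TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})"
)


def parse_srt(content: str) -> List[Cue]:
    """Parse SubRip text into cues; malformed blocks are skipped."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue
        match = TIMING.match(lines[1].strip())
        if not lines[0].strip().isdigit() or match is None:
            continue
        text = " ".join(line.strip() for line in lines[2:])
        cues.append(Cue(int(lines[0]), match.group(1), match.group(2), text))
    return cues


def flag_cues(cues: List[Cue], blocklist: Set[str]) -> List[Cue]:
    """Return cues whose lowercased words intersect the blocklist."""
    flagged = []
    for cue in cues:
        words = set(re.findall(r"[a-z']+", cue.text.lower()))
        if words & blocklist:
            flagged.append(cue)
    return flagged


SAMPLE = """1
00:00:01,000 --> 00:00:02,500
Where is the damn key?

2
00:00:03,000 --> 00:00:04,000
I left it
on the table.
"""

if __name__ == "__main__":
    for cue in flag_cues(parse_srt(SAMPLE), {"damn"}):
        print(cue.index, cue.start, cue.end, cue.text)
```

The timestamps of flagged cues can then be used to locate the matching video segments, for example after aligning subtitles with a tool such as ffsubsync.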
Owner
- Name: fatih akyon
- Login: fcakyon
- Kind: user
- Location: Ankara, Turkiye
- Company: @viddexa @ultralytics
- Twitter: fcakyon
- Repositories: 139
- Profile: https://github.com/fcakyon
helping AI's to understand videos at @ultralytics & @viddexa
GitHub Events
Total
- Watch event: 57
- Push event: 2
- Fork event: 4
Last Year
- Watch event: 57
- Push event: 2
- Fork event: 4
Committers
Last synced: over 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| fcakyon | f****n@g****m | 33 |
| fatih | 3****n | 12 |
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 0
- Total pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: 1 minute
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 1.33
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: less than a minute
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 2.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- fcakyon (4)