Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (5.5%) to scientific vocabulary
Repository
Vietnamese Conceptual Caption
Basic Info
Statistics
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Vietnamese Conceptual Caption
- VCC: Vietnamese Conceptual Caption
- VCC++: VCC with long descriptions, also known as photo stories
Citation
See the CITATION.cff file
Setup
pip install -e .
Setup Selenium
python vcc/setup_webdriver.py
Crawl data
In other words, build the data from the actual websites
VNANET
[VCC]
python vcc/vnanet_download_html.py
python vcc/vnanet_build_data.py make-article-list
python vcc/vnanet_build_data.py build-data
8431 images with captions from 633 articles, dated from 29/12/2022 back to 26/01/2005
VNEXPRESS
Infographics
[VCC]
python vcc/vnexpress_inforgraphics_build_data.py make-article-list
python vcc/vnexpress_inforgraphics_build_data.py build-data
python vcc/vnexpress_inforgraphics_build_data.py clean-data
331 images with captions from 499 articles, dated from 21/1/2023 back to 14/12/2021 (some articles contain only videos, so there are fewer images than articles)
Anh (photo section)
[VCC++]
python vcc/vnexpress_anh_build_data.py make-photo-story-list
python vcc/vnexpress_anh_build_data.py build-data
551 photo stories from 555 articles, dated from 31/1/2023 back to 18/11/2022. Videos and animated pictures such as GIF or APNG are not crawled. 4 articles don't follow the usual template.
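The crawler's own way of skipping animated pictures is not shown here. As a minimal sketch, assuming the raw bytes of a downloaded image are available, GIF and APNG files could be recognised from their headers: a GIF starts with `GIF87a` or `GIF89a`, and an APNG is a PNG that carries an `acTL` chunk before its first `IDAT` chunk. The function names `is_gif`, `is_apng` and `should_skip` are hypothetical and do not come from the `vcc` package.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_gif(data: bytes) -> bool:
    """Return True if the bytes start with a GIF header."""
    return data[:6] in (b"GIF87a", b"GIF89a")


def is_apng(data: bytes) -> bool:
    """Return True if the bytes are a PNG with an acTL chunk before IDAT."""
    if not data.startswith(PNG_SIGNATURE):
        return False
    pos = len(PNG_SIGNATURE)
    # Each PNG chunk is: 4-byte length, 4-byte type, payload, 4-byte CRC.
    while pos + 8 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        if chunk_type == b"acTL":
            return True
        if chunk_type == b"IDAT":
            # Image data reached without an animation control chunk.
            return False
        pos += 8 + length + 4
    return False


def should_skip(data: bytes) -> bool:
    """Hypothetical filter: skip every GIF (animated or not) and every APNG."""
    return is_gif(data) or is_apng(data)
```

Checking the header rather than the file extension matters for APNG, since such files commonly use the plain `.png` extension.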
Dantri
[VCC] [VCC++]
python vcc/dantri_build_data.py make-article-list
python vcc/dantri_build_data.py build-data
64 (photo_story) + 53 (dmagazine) photo stories and 1431 images with captions from 600 articles, dated from 25/01/2023 back to 17/01/2020
⚠ NOTE:
- There are GIFs and APNGs.
- There are images WITHOUT captions.
- Sometimes an article uses 2 images with 1 caption/description; in that case only the first image of the pair is taken.
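The pairing rule in the note above can be illustrated with a small sketch. It assumes the article body has already been flattened into an ordered list of image and caption elements; the `Node` class and `pair_images_with_captions` function are hypothetical stand-ins, not code from `vcc/dantri_build_data.py`. One known limitation of this simplification: it cannot tell an uncaptioned image followed by a captioned one apart from a genuine two-image pair, so the real crawler presumably relies on the page markup for that.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for one element of an article body."""
    kind: str   # "image" or "caption"
    value: str  # image URL or caption text


def pair_images_with_captions(nodes: list[Node]) -> list[tuple[str, str]]:
    """Pair each caption with the first image of the run that precedes it."""
    pairs: list[tuple[str, str]] = []
    first_image: Optional[str] = None
    for node in nodes:
        if node.kind == "image":
            # Remember only the first image of a run sharing one caption.
            if first_image is None:
                first_image = node.value
        elif node.kind == "caption":
            if first_image is not None:
                pairs.append((first_image, node.value))
            first_image = None
    # A trailing image with no caption after it is dropped.
    return pairs
```

For example, two images followed by a single caption yield one pair that uses the first image, and an image at the end of the article with no caption yields nothing.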
Owner
- Name: dinhanhx
- Login: dinhanhx
- Kind: user
- Location: Hanoi, Vietnam
- Repositories: 10
- Profile: https://github.com/dinhanhx
A Python dev :/
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: Vietnamese Conceptual Caption
message: >-
  If you use this dataset, please cite it using the metadata
  from this file.
type: dataset
authors:
  - given-names: dinhanhx
    email: dinhanhx@gmail.com
repository-code: 'https://github.com/dinhanhx/vcc'
keywords:
  - python
  - image-captioning
  - computer-vision
  - natural-language-processing
  - vietnamese
license: MIT
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Dependencies
- beautifulsoup4 ^4.11.1
- black ^22.12.0
- click ^8.1.3
- dataclass-wizard ^0.22.2
- flake8 ^6.0.0
- lxml ^4.9.2
- python ^3.9
- selenium ^4.8.0
- urllib3 ^1.26.14
- webdriver-manager ^3.8.5