vcc

Vietnamese Conceptual Caption

https://github.com/dinhanhx/vcc

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (5.5%) to scientific vocabulary

Keywords

computer-vision dataset image-captioning nlp python vietnamese

Repository

Vietnamese Conceptual Caption

Basic Info
  • Host: GitHub
  • Owner: dinhanhx
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 26.4 KB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
computer-vision dataset image-captioning nlp python vietnamese
Created almost 3 years ago · Last pushed almost 3 years ago
Metadata Files
Readme License Citation

README.md

Vietnamese Conceptual Caption

  • VCC: Vietnamese Conceptual Caption
  • VCC++: VCC with long descriptions, i.e. photo stories

Citation

See the CITATION.cff file.

Setup

pip install -e .

Setup Selenium

python vcc/setup_webdriver.py

Crawl data

In other words, you build the data from the actual websites.

VNANET

[VCC]

Source

python vcc/vnanet_download_html.py
python vcc/vnanet_build_data.py make-article-list
python vcc/vnanet_build_data.py build-data

8431 images with captions from 633 articles from 29/12/2022 to 26/01/2005
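The build scripts are run in stages: a `make-article-list` step followed by a `build-data` step. The sketch below illustrates that two-stage shape only; it is not the repository's code. It uses the standard-library `argparse` to stay self-contained (the dependency list includes `click`, which the real scripts may use instead), and the option names `--html-dir`, `--list-file`, `--out` and the JSON layout are all hypothetical.

```python
import argparse
import json
from pathlib import Path


def make_article_list(html_dir: Path, out_file: Path) -> list:
    """Stage 1: record which saved HTML pages should be processed."""
    articles = sorted(p.name for p in html_dir.glob("*.html"))
    out_file.write_text(json.dumps(articles, indent=2), encoding="utf-8")
    return articles


def build_data(list_file: Path, out_file: Path) -> list:
    """Stage 2: turn the article list into records (HTML parsing is stubbed out)."""
    articles = json.loads(list_file.read_text(encoding="utf-8"))
    # A real crawler would extract image URLs and captions here.
    records = [{"article": name, "images": []} for name in articles]
    out_file.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return records


def main(argv=None):
    parser = argparse.ArgumentParser(description="Two-stage data build (sketch)")
    sub = parser.add_subparsers(dest="command", required=True)

    p_list = sub.add_parser("make-article-list")
    p_list.add_argument("--html-dir", type=Path, default=Path("html"))
    p_list.add_argument("--out", type=Path, default=Path("article_list.json"))

    p_build = sub.add_parser("build-data")
    p_build.add_argument("--list-file", type=Path, default=Path("article_list.json"))
    p_build.add_argument("--out", type=Path, default=Path("data.json"))

    args = parser.parse_args(argv)
    if args.command == "make-article-list":
        return make_article_list(args.html_dir, args.out)
    return build_data(args.list_file, args.out)
```

Calling `main(["make-article-list"])` and then `main(["build-data"])` mirrors the order of the commands above. Splitting the work this way lets the article list be inspected or edited before the slower build step runs.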

VNEXPRESS

Infographics

[VCC]

Source

python vcc/vnexpress_inforgraphics_build_data.py make-article-list
python vcc/vnexpress_inforgraphics_build_data.py build-data
python vcc/vnexpress_inforgraphics_build_data.py clean-data

331 images with captions from 499 articles from 21/1/2023 to 14/12/2021 (some articles contain only videos, so the number of images is less than the number of articles)

Anh (Photo)

[VCC++]

Source

python vcc/vnexpress_anh_build_data.py make-photo-story-list
python vcc/vnexpress_anh_build_data.py build-data

551 photo stories from 555 articles from 31/1/2023 to 18/11/2022. Videos and animated pictures such as GIF or APNG are not crawled, and 4 articles don't follow the usual template.
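One simple way to skip videos and animated pictures is to filter media URLs by file extension. The following is a minimal sketch under that assumption, not the crawler's actual logic: the function names and the `SKIPPED_SUFFIXES` set are hypothetical. Note that APNG files are often served with a plain `.png` extension, so an extension check alone would not catch those.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical set of extensions treated as video or animation.
SKIPPED_SUFFIXES = {".gif", ".apng", ".mp4", ".webm"}


def is_still_image(url: str) -> bool:
    """Return True unless the URL path ends in a skipped extension."""
    # urlparse drops the query string, so "photo.jpg?w=680" still yields ".jpg".
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return suffix not in SKIPPED_SUFFIXES


def keep_still_images(urls):
    """Filter a list of media URLs down to still images, preserving order."""
    return [u for u in urls if is_still_image(u)]
```

For example, `keep_still_images(["https://example.com/a.jpg", "https://example.com/b.gif"])` keeps only the first URL.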

Dantri

[VCC] [VCC++]

Source

python vcc/dantri_build_data.py make-article-list
python vcc/dantri_build_data.py build-data

64 (photo_story) + 53 (dmagazine) photo stories and 1431 images with captions from 600 articles from 25/01/2023 to 17/01/2020

⚠ NOTE:

  • There are GIFs and APNGs.
  • There are images WITHOUT captions.
  • Sometimes an article uses 2 images with 1 caption/description; in that case only the first image of the pair is taken.
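The pairing rule in the note (keep only the first image when two images share one caption, and drop images that have no caption) can be sketched as follows. This is an illustration under assumptions, not the repository's implementation: the `Block` and `Pair` types and the flat image/caption sequence are hypothetical. A flat sequence also cannot tell an uncaptioned image followed by a captioned one apart from two images sharing a caption; a real crawler would presumably rely on the HTML structure for that.

```python
from dataclasses import dataclass


@dataclass
class Block:
    kind: str   # "image" or "caption" (hypothetical flat representation)
    value: str  # image URL or caption text


@dataclass
class Pair:
    image_url: str
    caption: str


def pair_images_with_captions(blocks):
    """Pair each caption with the first image seen since the previous caption."""
    pairs = []
    pending = None  # first image waiting for a caption
    for block in blocks:
        if block.kind == "image":
            # Later images before the same caption are ignored.
            if pending is None:
                pending = block.value
        elif block.kind == "caption":
            text = block.value.strip()
            if pending is not None and text:
                pairs.append(Pair(pending, text))
            pending = None
    # An image still pending here never got a caption, so it is dropped.
    return pairs
```

Given the blocks image `a`, image `b`, caption, image `c`, the function returns a single pair for `a`; `b` is the second image of a shared caption and `c` has no caption.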

Owner

  • Name: dinhanhx
  • Login: dinhanhx
  • Kind: user
  • Location: Hanoi, Vietnam

A Python dev :/

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Vietnamese Conceptual Caption
message: >-
  If you use this dataset, please cite it using the metadata
  from this file.
type: dataset
authors:
  - given-names: dinhanhx
    email: dinhanhx@gmail.com
repository-code: 'https://github.com/dinhanhx/vcc'
keywords:
  - python
  - image-captioning
  - computer-vision
  - natural-language-processing
  - vietnamese
license: MIT

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 30
  • Total Committers: 1
  • Avg Commits per committer: 30.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • Dinh Anh (d****x@g****m): 30 commits

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

pyproject.toml pypi
  • beautifulsoup4 ^4.11.1
  • black ^22.12.0
  • click ^8.1.3
  • dataclass-wizard ^0.22.2
  • flake8 ^6.0.0
  • lxml ^4.9.2
  • python ^3.9
  • selenium ^4.8.0
  • urllib3 ^1.26.14
  • webdriver-manager ^3.8.5