data_crawling

Just crawling data

https://github.com/domanhquang/data_crawling

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: springer.com
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.5%) to scientific vocabulary
Last synced: 6 months ago

Repository

Just crawling data

Basic Info
  • Host: GitHub
  • Owner: DoManhQuang
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 451 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Created over 2 years ago · Last pushed 10 months ago
Metadata Files
Readme License Citation

README.md

Just crawling data

Directory structure

└───source
    ├───database
    │   └───mongodb
    ├───ecom_api
    ├───message_queue
    └───restful_api

Function description

  • source : Holds the project's code
  • source/database : Creates connections to databases such as MySQL, MongoDB, ...
  • source/ecom_api : Analyzes the target websites' APIs to implement the data-crawling functions
  • source/message_queue : Creates connections to message queues such as Redis, Kafka, ...
  • source/restful_api : Provides APIs for querying data, monitoring, backups, ...
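
To illustrate how the four modules could fit together, here is a minimal sketch that uses only the Python standard library. It is not the repository's actual code: `Product`, `parse_products`, `InMemoryStore`, `run_pipeline` and the sample JSON payload are all hypothetical names and data. An in-process `queue.Queue` stands in for Redis or Kafka, and a dictionary stands in for a MongoDB collection.

```python
import json
import queue
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class Product:
    """Hypothetical record produced by the crawler."""
    product_id: str
    name: str
    price: float


def parse_products(payload: str) -> List[Product]:
    """ecom_api role: turn a JSON API response into Product records."""
    data = json.loads(payload)
    return [
        Product(str(item["id"]), item["name"], float(item["price"]))
        for item in data["items"]
    ]


class InMemoryStore:
    """database role: dictionary stand-in for a real collection."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def upsert(self, doc: dict) -> None:
        # Key on product_id so a re-crawled product overwrites the old copy.
        self._docs[doc["product_id"]] = doc

    def find(self, max_price: Optional[float] = None) -> List[dict]:
        """restful_api role: the kind of query an endpoint might expose."""
        docs = list(self._docs.values())
        if max_price is not None:
            docs = [d for d in docs if d["price"] <= max_price]
        return docs


def run_pipeline(payload: str, store: InMemoryStore) -> int:
    """message_queue role: buffer parsed items, then drain them into the store."""
    jobs: "queue.Queue[Product]" = queue.Queue()
    for product in parse_products(payload):
        jobs.put(product)

    processed = 0
    while not jobs.empty():
        store.upsert(asdict(jobs.get()))
        processed += 1
    return processed


# Made-up response; real e-commerce APIs will have a different shape.
SAMPLE_RESPONSE = json.dumps({
    "items": [
        {"id": 1, "name": "Keyboard", "price": 25.0},
        {"id": 2, "name": "Monitor", "price": 180.0},
    ]
})

if __name__ == "__main__":
    store = InMemoryStore()
    print(run_pipeline(SAMPLE_RESPONSE, store), "items stored")
    print(store.find(max_price=100))
```

Putting a queue between parsing and storage means a slow database write does not block the crawler. In a real deployment the queue and the store would be replaced by the external services the README names.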

Environment setup

pip install -r requirements.txt
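
After installation, a quick check like the one below can confirm that the packages listed in requirements.txt (Flask, Flask-RESTful, numpy, pandas, pymongo, tqdm) are importable. This is a sketch, not part of the repository. `REQUIRED_MODULES` and `check_dependencies` are hypothetical names, and the mapping from package name to import name is stated here as an assumption.

```python
from importlib.util import find_spec

# Assumed import names for the packages listed in requirements.txt.
REQUIRED_MODULES = {
    "Flask": "flask",
    "Flask-RESTful": "flask_restful",
    "numpy": "numpy",
    "pandas": "pandas",
    "pymongo": "pymongo",
    "tqdm": "tqdm",
}


def check_dependencies(modules=REQUIRED_MODULES):
    """Return {package: True/False} without actually importing anything."""
    return {
        package: find_spec(module) is not None
        for package, module in modules.items()
    }


if __name__ == "__main__":
    for package, installed in check_dependencies().items():
        print(f"{package}: {'ok' if installed else 'MISSING'}")
```

`find_spec` only locates a module and does not execute it, so the check stays fast and free of side effects even for heavy packages such as pandas.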

Conference

DOI: End-to-End System For Data Crawling, Monitoring, And Analyzation Of E-Commerce Websites

Paper: End-to-End System For Data Crawling, Monitoring, And Analyzation Of E-Commerce Websites

Authors: Manh Quang Do, Thi Lan Nguyen, Dinh Duy Vu, Xuan Duc Tran, Thi Quynh Nguyen, Ba Nghien Nguyen, Van Tinh Nguyen and Ngoc Anh Nguyen

Cite

@InProceedings{10.1007/978-981-96-4282-3_18,
  author="Nguyen, Thanh Long and Do, Manh Quang and Nguyen, Ba Nghien",
  editor="Buntine, Wray and Fjeld, Morten and Tran, Truyen and Tran, Minh-Triet and Huynh Thi Thanh, Binh and Miyoshi, Takumi",
  title="MEPC: Multi-level Product Category Recognition Image Dataset",
  booktitle="Information and Communication Technology",
  year="2025",
  publisher="Springer Nature Singapore",
  address="Singapore",
  pages="216--225",
  abstract="Multi-level product category prediction is a problem for businesses providing online retail sector systems. Accurate Multi-level prediction supports the system in avoiding the need for sellers to fill in product category information, saving time and reducing the cost of listing products online. This is an open research problem, which always attracts researchers. Deep learning techniques have shown promising results for category recognition problems. A neat and clean dataset is an elementary requirement for building accurate and robust deep-learning models for category prediction. This article introduces a new image dataset of the multi-level product, called MEPC. MEPC dataset has +164.000 images in the processed format available in the dataset. We evaluate the MEPC dataset with popular deep learning models, benchmark results in a top-1 accuracy score of 92.055{\%} with 10 classes and a top-5 accuracy score of 57.36{\%} with 1000 classes. The proposed dataset is good for training, validation, and testing for hierarchical image classification to improve predict multi-level categories in the online retail sector systems. Data and code will be released at https://huggingface.co/datasets/sherlockvn/MEPC.",
  isbn="978-981-96-4282-3"
}

Owner

  • Name: Đỗ Mạnh Quang
  • Login: DoManhQuang
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Manh Quang Do"
  given-names: "et al"
  orcid: "https://orcid.org/0009-0005-9995-799X"
title: "End-to-End System For Data Crawling, Monitoring, And Analyzation Of E-Commerce Websites"
version: 1.0.0
date-released: 2024-11-07
url: "https://github.com/DoManhQuang/data_crawling"

GitHub Events

Total
  • Watch event: 1
  • Push event: 5
Last Year
  • Watch event: 1
  • Push event: 5

Dependencies

requirements.txt pypi
  • Flask *
  • Flask-RESTful *
  • numpy *
  • pandas *
  • pymongo *
  • tqdm *
sample/_requirements.txt pypi