https://github.com/cortext/cortext-pytheas-youtube

Web tool to download data (JSON) from YouTube (free API key needed)

Keywords

youtube

Repository

Basic Info
  • Host: GitHub
  • Owner: cortext
  • License: mit
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 599 KB
Statistics
  • Stars: 1
  • Watchers: 17
  • Forks: 1
  • Open Issues: 4
  • Releases: 0
Archived
Topics
youtube
Created about 9 years ago · Last pushed over 1 year ago

README.md

ARCHIVED: this repository has moved to https://gitlab.com/cortext/pytheas-youtube

Pytheas from CorTexT

Pytheas is a web tool used to download YouTube data through the latest version of the API (v3).

pytheas.cortext.net

YouTube documentation : developers.google.com/youtube/v3/docs

  1. Installation
  2. Workflow
  3. User guides and documentation

Objectives

  • Explore YouTube from a "data point of view"
  • Export requested data
  • Make data analyzable by CorTexT platform

Features

  • Query videos, playlists and channels, or use the search methods
  • Get comments, captions, metrics and related videos from those queries
  • Explore, manage and download the results as JSON files
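
The queries listed above correspond to YouTube Data API v3 endpoints. As a minimal sketch (this is not Pytheas's own code), the snippet below builds the URL of a `videos.list` request using only the standard library. The function name `build_videos_url` and the placeholders `VIDEO_ID` and `YOUR_API_KEY` are hypothetical and introduced here for illustration only.

```python
from urllib.parse import urlencode

# Root of the YouTube Data API v3
API_ROOT = "https://www.googleapis.com/youtube/v3"


def build_videos_url(video_ids, api_key, parts=("snippet", "statistics")):
    """Return the URL of a videos.list request for the given video IDs."""
    query = urlencode({
        "part": ",".join(parts),    # resource parts to include in the response
        "id": ",".join(video_ids),  # comma-separated list of video IDs
        "key": api_key,             # API key sent as a query parameter
    })
    return f"{API_ROOT}/videos?{query}"


if __name__ == "__main__":
    # Placeholders only: replace them with a real video ID and API key
    print(build_videos_url(["VIDEO_ID"], "YOUR_API_KEY"))
```

Fetching that URL with any HTTP client returns a JSON document, which is the kind of data the tool stores and exports.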

Requirements

  • python3 as main language
  • mongodb for database
  • docker/docker-compose : container-app

File organisation

```bash
Pytheas/
├── doc/
├── data/
├── logs/
│   └── activity_log.json
├── scripts/
├── config/
│   └── config.json
├── restapp/
├── webapp/
├── worker/
└── docker-compose.yml
```

Installation

It is highly recommended to install and use it via Docker, but it is still possible to install it as separate Python processes (see Python environment).

Configuration file

Here you can manage your paths, database name, machine names and ports, set a fixed api_key, and change the status of debug and OAuth.

  • DATA_DIR and LOG_DIR each need their own directory (set by default, but can be moved)
  • PORT, MONGO, REST and WORKER each need an assigned port number
  • api_key: set this if you want to override every query with a single API key
  • api_key_test: API key used only for trying the form on the homepage
  • oauth_status: True or False (set it to False to run the tool on your own)
  • debug_level: True or False

Normally a conf/conf-default.json file exists and can be copied to conf/conf.json. It should look like this:

conf/conf.json

```json
{
  "DATA_DIR": "data/",
  "LOG_DIR": "logs/",

  "PORT": 5050,
  "REDIRECT_URI": "http://localhost:5050/auth",
  "GRANT_HOST_URL": "https://my.own.oauth.server.com",

  "MONGO_HOST": "mongo",
  "MONGO_DBNAME": "youtube",
  "MONGO_PORT": 27017,

  "REST_HOST": "restapp",
  "REST_PORT": 5053,

  "WORKER_HOST": "worker",
  "WORKER_PORT": 5003,

  "api_key": "",
  "api_key_test": "",
  "oauth_status": "True",
  "debug_level": "False"
}
```
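
Because the sample file stores its flags as the strings "True" and "False", a loader has to convert them before using them as booleans. The following is a minimal sketch of how such a file could be loaded and checked, not the loader Pytheas actually uses. The helper names `parse_flag`, `validate_conf` and `load_conf` are hypothetical, and the rule that every key ending in `PORT` must hold a valid TCP port is an assumption made for illustration.

```python
import json
from pathlib import Path


def parse_flag(value):
    """Convert "True"/"False" strings (or real booleans) to bool."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in ("true", "1", "yes"):
        return True
    if text in ("false", "0", "no", ""):
        return False
    raise ValueError(f"not a boolean flag: {value!r}")


def validate_conf(conf):
    """Return a checked copy of the configuration dictionary."""
    checked = dict(conf)
    for key, value in checked.items():
        # Assumption: every key ending in PORT holds a TCP port number
        if key.upper().endswith("PORT"):
            is_int = isinstance(value, int) and not isinstance(value, bool)
            if not is_int or not 0 < value < 65536:
                raise ValueError(f"{key} must be a port number, got {value!r}")
    # debug_level is stored as a string in the sample file
    checked["debug_level"] = parse_flag(checked.get("debug_level", "False"))
    return checked


def load_conf(path="conf/conf.json"):
    """Read the JSON file at `path` and validate its content."""
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    return validate_conf(raw)
```

Keeping the validation in a function that takes a plain dictionary makes it easy to test without touching the file system.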

Dockers

On the production server, the docker-compose file defines the following services:

  • webapp: front-end interface. Sends orders, which can be threaded, to restapp
  • restapp: interface between webapp and worker. It will also be used to open the tool to external access
  • worker: registers in the database all queries coming from restapp
  • mongodb: main database
  • mongodbclient: client used to access the database over HTTP

docker-compose.yaml

```yaml
version: '3'
services:
  ## Back
  # mongo server
  mongo:
    container_name: py_mongoserver
    hostname: mongo
    image: "mongo:3.4"
    restart: always
    command: mongod
    ports:
      - "27017:27017"
    volumes:
      - './data/mongo:/data/db'
    network_mode: bridge
  # restapp
  restapp:
    container_name: py_restapp
    hostname: restapp
    build: ./restapp/
    network_mode: bridge
    restart: always
    depends_on:
      - mongo
    links:
      - mongo
      - worker
    ports:
      - "5002:5002"
    volumes:
      - './restapp:/opt/pytheas_rest'
      - './conf:/opt/pytheas_rest/conf'
      - './logs:/opt/pytheas_rest/logs'
  # worker
  worker:
    container_name: py_worker
    hostname: worker
    build: ./worker/
    network_mode: bridge
    restart: always
    ports:
      - "5003:5003"
    depends_on:
      - mongo
    links:
      - mongo
    volumes:
      - './worker:/opt/pytheas_worker'
      - './conf:/opt/pytheas_worker/conf'
      - './logs:/opt/pytheas_worker/logs'
  ## Front
  # mongo client
  mongoclient:
    container_name: py_mongoclient
    image: "mongoclient/mongoclient"
    restart: always
    ports:
      - "3000:3000"
    depends_on:
      - mongo
    links:
      - mongo
  # webapp
  webapp:
    container_name: py_webapp
    hostname: webapp
    build: ./webapp/
    network_mode: bridge
    restart: always
    ports:
      - "5000:5000"
    depends_on:
      - restapp
    links:
      - restapp
      - mongo
    volumes:
      - './webapp/:/opt/pytheas_webapp'
      - './data:/opt/pytheas_webapp/data'
      - './logs:/opt/pytheas_webapp/logs'
      - './conf:/opt/pytheas_webapp/conf'
```
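
Once the containers are running, a quick way to see whether each service is reachable is to try a TCP connection to the host port it publishes. The sketch below is only an illustration under stated assumptions: the port numbers are copied from the compose file above, the stack is assumed to run on `localhost`, and the names `SERVICES`, `is_port_open` and `check_services` are hypothetical.

```python
import socket

# Host ports published in the compose file above
SERVICES = {
    "mongo": 27017,
    "restapp": 5002,
    "worker": 5003,
    "mongoclient": 3000,
    "webapp": 5000,
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable or timed out: treat the port as closed
        return False


def check_services(host="localhost", services=SERVICES, probe=is_port_open):
    """Map each service name to whether its published port answers."""
    return {name: probe(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, is_up in check_services().items():
        print(f"{name}: {'up' if is_up else 'down'}")
```

An open port only shows that something is listening; it does not prove that the application behind it is healthy.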

Python environment

For this variant, be sure to create and activate a Python 3 virtual environment first:

```bash
cd pytheas-youtube
virtualenv env3 -p python3
source ./env3/bin/activate
```

Then, in a separate terminal for each machine (webapp, restapp and worker):

```bash
pip install -r requirements.txt
python main.py
```

First deployment

From the cloned repository (/pytheas-youtube):

```bash
docker-compose build
docker-compose up -d
```

You can also run it in the foreground first to watch the logs manually:

```bash
docker-compose up
```

Workflow

Update

In the repository, just run git pull (the machines normally auto-reload):

```bash
git pull
```

Mandatory update

If the configuration file, networks or Docker settings have been modified, you have to rebuild:

```bash
git pull
docker-compose stop
docker-compose build
docker-compose up -d
```
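
The choice between a simple update and a mandatory one can be sketched as a check on the list of files changed by the pull. This is a hedged illustration rather than part of Pytheas: the function name `needs_rebuild` is hypothetical, and so is the assumption that changes under `conf/`, to a compose file or to a `Dockerfile` are the ones that call for a rebuild.

```python
from pathlib import PurePosixPath

# Assumption: changes to these locations require rebuilding the containers
REBUILD_DIRS = {"conf"}
REBUILD_FILES = {"docker-compose.yml", "docker-compose.yaml", "Dockerfile"}


def needs_rebuild(changed_paths):
    """Return True if any changed path touches configuration or Docker files."""
    for raw in changed_paths:
        path = PurePosixPath(raw)
        if path.name in REBUILD_FILES:
            return True
        if path.parts and path.parts[0] in REBUILD_DIRS:
            return True
    return False
```

The list of changed paths could come, for example, from `git diff --name-only ORIG_HEAD HEAD` run right after the pull.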

User guides and documentation

Other helpful resources:

  • user guide
  • Medium tutorial from @BerthaBrenes
  • developer documentation

To do

  • [ ] threading management
  • [ ] continue refactoring : verify each class from each machine (see youtube.py on webapp and worker)
  • [ ] integrate external scripts from /script
  • [ ] integrate doc in markdown from /doc
  • [ ] better error handling and management (distinguish HTTP errors from each machine's errors)
  • [ ] integrate api openSpec and swagger file (see rest.py path)
  • [ ] continue to integrate methods : channel as list (for description field)
  • [ ] new page to associate metrics, stats and other methods to analyze query (or set of queries?)
  • [ ] script to manage conf file and docker port/location/name...
  • [ ] combined to swagger file -> REST methods
  • [ ] work UX

Owner

  • Name: CorTexT Platform
  • Login: cortext
  • Kind: organization
  • Location: Marne La Vallée, France

Digital Platform for social studies of science, technology and digital societies
