https://github.com/darixsamani/pdfdrive

I'm building this project to enhance my Python skills after a long time without coding.

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (3.2%) to scientific vocabulary

Keywords

beautifulsoup4 json mongodb python3 redis scrapy webscraper webscrapping

Repository

Basic Info
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme

README.md

pdfdrive

I built this project to enhance my Python skills after a long time without coding.

What does it do

It's a web scraper that collects information from the pdfdrive.com site, then saves it to a file and to a MongoDB database.
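The save step described above could be sketched roughly as follows. This is a minimal illustration and not the repository's actual pipeline: the class names `SaveItemPipeline` and `InMemoryCollection`, the JSON Lines output format, and the injected `collection` object are all assumptions. The method names `open_spider`, `close_spider`, and `process_item` follow Scrapy's item pipeline interface, and a real pymongo collection, which also offers `insert_one`, could be passed in place of the stand-in.

```python
import json
from typing import Any, Dict, List


class InMemoryCollection:
    """Hypothetical stand-in for a pymongo collection; only insert_one is mimicked."""

    def __init__(self) -> None:
        self.documents: List[Dict[str, Any]] = []

    def insert_one(self, document: Dict[str, Any]) -> None:
        # Store a copy so later changes to the item do not affect the stored document.
        self.documents.append(dict(document))


class SaveItemPipeline:
    """Hypothetical pipeline: append each item to a JSON Lines file and to a collection."""

    def __init__(self, path: str, collection: Any) -> None:
        self.path = path
        self.collection = collection
        self.file = None

    def open_spider(self, spider: Any) -> None:
        # Open the output file once, when the crawl starts.
        self.file = open(self.path, "a", encoding="utf-8")

    def close_spider(self, spider: Any) -> None:
        # Close the output file when the crawl ends.
        if self.file is not None:
            self.file.close()
            self.file = None

    def process_item(self, item: Dict[str, Any], spider: Any) -> Dict[str, Any]:
        # Write one JSON object per line, then store the same item in the collection.
        self.file.write(json.dumps(item, ensure_ascii=False) + "\n")
        self.collection.insert_one(item)
        return item
```

Injecting the collection keeps the sketch runnable without a database. In a real crawl the pipeline would be registered in the Scrapy settings and given a collection obtained from a MongoDB client.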

How to install

  1. Install requirements

```
pip3 install poetry
```

  2. Launch the spider. First edit `.env` so that it points to your MongoDB URI and your Redis instance, then run:

```
poetry install && cd pdfdrive && poetry run scrapy crawl pdfdrive
```

Run with docker

docker pull darixsamani/pdfdrive
docker run -it -e MONGO_URI="mongodb://localhost" -e MONGO_DATABASE="pdfdrive" -e REDIS_HOST="localhost" -e REDIS_PORT=6379 -e REDIS_PASSWORD="" darixsamani/pdfdrive
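On the Python side, the variables passed with `-e` could be read along these lines. This is only a sketch using the standard library: the function name `load_settings` is hypothetical, and the defaults simply mirror the values in the docker command above rather than the project's real settings module.

```python
import os
from typing import Dict, Mapping, Optional, Union


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, Union[str, int]]:
    """Collect MongoDB and Redis settings from environment variables.

    The variable names come from the docker command; the defaults are assumptions.
    """
    source = os.environ if env is None else env
    return {
        "MONGO_URI": source.get("MONGO_URI", "mongodb://localhost"),
        "MONGO_DATABASE": source.get("MONGO_DATABASE", "pdfdrive"),
        "REDIS_HOST": source.get("REDIS_HOST", "localhost"),
        # Environment values are strings, so the port is converted to an int.
        "REDIS_PORT": int(source.get("REDIS_PORT", "6379")),
        "REDIS_PASSWORD": source.get("REDIS_PASSWORD", ""),
    }
```

Passing a plain dict as `env` makes the function easy to try without touching the real environment, for example `load_settings({"REDIS_PORT": "6380"})`.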

MongoDB Screen

Mongo image

Owner

  • Name: Darix SAMANI SIEWE
  • Login: darixsamani
  • Kind: user
  • Location: Douala, Cameroon
  • Company: @DataTouchAnalytics, @hoozonsarl

Software Engineer

Dependencies

.github/workflows/docker-image.yml actions
  • actions/checkout v3 composite
Dockerfile docker
  • python 3.8 build
docker-compose.yml docker
  • mongo latest
  • redis latest
requirements.txt pypi
  • Automat ==22.10.0
  • Protego ==0.2.1
  • PyDispatcher ==2.0.7
  • Scrapy ==2.10.0
  • Twisted ==22.10.0
  • async-timeout ==4.0.3
  • attrs ==23.1.0
  • beautifulsoup4 ==4.12.2
  • bs4 ==0.0.1
  • certifi ==2023.7.22
  • charset-normalizer ==3.2.0
  • constantly ==15.1.0
  • cryptography ==41.0.3
  • cssselect ==1.2.0
  • dnspython ==2.4.1
  • filelock ==3.12.2
  • hyperlink ==21.0.0
  • idna ==3.4
  • incremental ==22.10.0
  • itemadapter ==0.8.0
  • itemloaders ==1.1.0
  • jmespath ==1.0.1
  • lxml ==4.9.3
  • packaging ==23.1
  • parsel ==1.8.1
  • pyOpenSSL ==23.2.0
  • pyasn1 ==0.5.0
  • pyasn1-modules ==0.3.0
  • pycparser ==2.21
  • pymongo ==4.4.1
  • python-dotenv ==1.0.0
  • queuelib ==1.6.2
  • redis ==5.0.0
  • requests ==2.31.0
  • requests-file ==1.5.1
  • service-identity ==23.1.0
  • six ==1.16.0
  • soupsieve ==2.4.1
  • tldextract ==3.4.4
  • typing_extensions ==4.7.1
  • urllib3 ==2.0.4
  • w3lib ==2.1.2
  • zope.interface ==6.0