https://github.com/darixsamani/pdfdrive
I'm building this project to enhance my Python skills after a long time without coding
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (3.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: darixsamani
- Language: Python
- Default Branch: main
- Homepage: https://hub.docker.com/r/darixsamani/pdfdrive
- Size: 1.07 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
pdfdrive
I built this project to sharpen my Python skills after a long time without coding.
What does it do
It's a web scraper that collects information from the pdfdrive.com site and saves it to a file and to a MongoDB database.
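The "save to a file and to MongoDB" step can be sketched as two Scrapy-style item pipelines. This is a minimal illustration and not the project's actual code: the class names, the `books.jsonl` path, the `books` collection, and the item fields are assumptions. Scrapy pipelines are plain classes, so only `pymongo` is needed, and it is imported lazily when the spider opens.

```python
import json


class JsonLinesPipeline:
    """Append each scraped item to a JSON Lines file (hypothetical path)."""

    def __init__(self, path="books.jsonl"):
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Called by Scrapy once when the spider starts.
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        # Called by Scrapy once when the spider finishes.
        self.file.close()

    def process_item(self, item, spider):
        # One JSON document per line; return the item so later pipelines see it.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item


class MongoPipeline:
    """Insert each scraped item into a MongoDB collection (assumed name: books)."""

    def __init__(self, mongo_uri, mongo_db, collection="books"):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db
        self.collection = collection
        self.client = None

    @classmethod
    def from_crawler(cls, crawler):
        # Assumes MONGO_URI and MONGO_DATABASE are exposed as Scrapy settings.
        return cls(
            crawler.settings.get("MONGO_URI"),
            crawler.settings.get("MONGO_DATABASE", "pdfdrive"),
        )

    def open_spider(self, spider):
        import pymongo  # lazy import: only needed when MongoDB is actually used

        self.client = pymongo.MongoClient(self.mongo_uri)

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # Insert a copy so the "_id" added by MongoDB does not leak into the item.
        self.client[self.mongo_db][self.collection].insert_one(dict(item))
        return item
```

In a Scrapy project, classes like these would be enabled through the `ITEM_PIPELINES` setting; the module path to use there depends on how the project is laid out.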
How to install
1. Install requirements
```
pip3 install poetry
```
2. Launch the spider
Before launching, edit `.env` so it points to your MongoDB URI and your Redis instance.
```
poetry install && cd pdfdrive && poetry run scrapy crawl pdfdrive
```
Run with Docker
```
docker pull darixsamani/pdfdrive
docker run -it -e MONGO_URI="mongodb://localhost" -e MONGO_DATABASE="pdfdrive" -e REDIS_HOST="localhost" -e REDIS_PORT=6379 -e REDIS_PASSWORD="" darixsamani/pdfdrive
```
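The environment variables passed with `-e` above (and the same values in `.env`) could be read on the Python side roughly as follows. This is a sketch under assumptions: the `Settings` class and `load_settings` function are hypothetical names, the defaults simply mirror the values in the Docker command, and `python-dotenv` is used only if it is installed.

```python
import os
from dataclasses import dataclass

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
except ImportError:  # fall back to plain environment variables
    load_dotenv = None


@dataclass(frozen=True)
class Settings:
    """Connection settings; field names follow the variables in the Docker command."""

    mongo_uri: str
    mongo_database: str
    redis_host: str
    redis_port: int
    redis_password: str


def load_settings(env=None):
    """Build Settings from a mapping, defaulting to the process environment."""
    if env is None:
        if load_dotenv is not None:
            load_dotenv()  # merges a local .env file into os.environ, if present
        env = os.environ
    return Settings(
        mongo_uri=env.get("MONGO_URI", "mongodb://localhost"),
        mongo_database=env.get("MONGO_DATABASE", "pdfdrive"),
        redis_host=env.get("REDIS_HOST", "localhost"),
        redis_port=int(env.get("REDIS_PORT", "6379")),  # env values are strings
        redis_password=env.get("REDIS_PASSWORD", ""),
    )
```

Calling `load_settings()` with no argument reads the real environment; passing a plain dictionary is convenient for trying the function without touching `.env`.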
MongoDB Screen

Owner
- Name: Darix SAMANI SIEWE
- Login: darixsamani
- Kind: user
- Location: Douala, Cameroon
- Company: @DataTouchAnalytics, @hoozonsarl
- Website: https://linktr.ee/darixsamani
- Twitter: darixsamani1
- Repositories: 39
- Profile: https://github.com/darixsamani
Software Engineer
Dependencies
- actions/checkout v3 composite
- python 3.8 build
- mongo latest
- redis latest
- Automat ==22.10.0
- Protego ==0.2.1
- PyDispatcher ==2.0.7
- Scrapy ==2.10.0
- Twisted ==22.10.0
- async-timeout ==4.0.3
- attrs ==23.1.0
- beautifulsoup4 ==4.12.2
- bs4 ==0.0.1
- certifi ==2023.7.22
- charset-normalizer ==3.2.0
- constantly ==15.1.0
- cryptography ==41.0.3
- cssselect ==1.2.0
- dnspython ==2.4.1
- filelock ==3.12.2
- hyperlink ==21.0.0
- idna ==3.4
- incremental ==22.10.0
- itemadapter ==0.8.0
- itemloaders ==1.1.0
- jmespath ==1.0.1
- lxml ==4.9.3
- packaging ==23.1
- parsel ==1.8.1
- pyOpenSSL ==23.2.0
- pyasn1 ==0.5.0
- pyasn1-modules ==0.3.0
- pycparser ==2.21
- pymongo ==4.4.1
- python-dotenv ==1.0.0
- queuelib ==1.6.2
- redis ==5.0.0
- requests ==2.31.0
- requests-file ==1.5.1
- service-identity ==23.1.0
- six ==1.16.0
- soupsieve ==2.4.1
- tldextract ==3.4.4
- typing_extensions ==4.7.1
- urllib3 ==2.0.4
- w3lib ==2.1.2
- zope.interface ==6.0