https://github.com/0xibra/python-downloader-light
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: 0xIbra
- Language: Python
- Default Branch: master
- Size: 33.2 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Downloader light
Lightweight Python library that lets you download and process files concurrently.
This package was developed to allow serverless deployment.
Dependencies
Installed automatically with pip:
- requests
- pysftp
Installation
```bash
pip install downloader-light
```
Usage examples
Download and upload files to AWS S3
For this to work, the AWS CLI must be configured.
```python
from blackfeed.downloader import Downloader
from blackfeed.adapter.s3 import S3Adapter

queue = [
    {
        'url': 'https://www.example.com/path/to/image.jpg',  # Required
        'destination': 'some/key/image.jpg'  # S3 key - Required
    },
    {
        'url': 'https://www.example.com/path/to/image2.jpg',
        'destination': 'some/key/image2.jpg'
    }
]

downloader = Downloader(
    S3Adapter(bucket='bucketname'),
    multi=True,  # If True, uploads files to S3 with multithreading
    stateless=False,  # If False, generates and stores MD5 hashes of files in a state file
    state_id='fluxstates',  # Name of the file where hashes will be stored (states.txt) - not required
    bulksize=200  # Number of concurrent downloads
)
downloader.process(queue)
stats = downloader.get_stats()  # Returns a dict with information about the process
```
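Each queue entry is a plain dict with a `url` and a `destination` key, so the queue can be generated from a list of URLs. The following is a minimal sketch, assuming a hypothetical helper named `build_queue` and a hypothetical `prefix` argument (neither is part of the library) that derives each S3 key from the file name in the URL:

```python
import posixpath
from urllib.parse import urlsplit


def build_queue(urls, prefix):
    """Build a list of {'url', 'destination'} dicts (hypothetical helper, not part of the library)."""
    queue = []
    for url in urls:
        # Use the last path segment of the URL as the file name.
        filename = posixpath.basename(urlsplit(url).path)
        queue.append({
            'url': url,
            'destination': posixpath.join(prefix, filename),
        })
    return queue


queue = build_queue(
    [
        'https://www.example.com/path/to/image.jpg',
        'https://www.example.com/path/to/image2.jpg',
    ],
    prefix='some/key',
)
print(queue[0]['destination'])
```

The resulting list has the same shape as the hand-written queue above and could be passed to `downloader.process(queue)`.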
Download files with states
Loading states can be useful if you don't want to download the same file twice.
```python
from blackfeed.downloader import Downloader
from blackfeed.adapter.s3 import S3Adapter

queue = [ ... ]

downloader = Downloader(
    S3Adapter(bucket='bucketname'),
    multi=True,
    stateless=False,
    state_id='filename'
)

# You can add a callback function if needed.
# This function will be called after each bulk is processed.
def callback(responses):
    # response: {
    #     'destination': destination of the file; can be a local path or an S3 key,
    #     'url': URL from where the file was downloaded,
    #     'httpcode': HTTP code returned by the server,
    #     'status': True|False,
    #     'content-type': MIME type of the downloaded resource. Example: image/jpeg
    # }
    # responses: response[]
    pass  # Your logic

downloader.set_callback(callback)

downloader.loadstates('filename')  # This will load states from "filename.txt"
downloader.process(queue)
stats = downloader.get_stats()  # Statistics
```
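As an illustration of what a callback might do, the sketch below collects the URLs of failed downloads. It relies only on the response keys listed in the comment above (`url`, `status`, `httpcode`, and so on); the name `collect_failures`, the `failed_urls` list and the sample responses are made up for the example.

```python
failed_urls = []


def collect_failures(responses):
    """Callback sketch: remember the URL of every response whose status is False."""
    for response in responses:
        if not response['status']:
            failed_urls.append(response['url'])


# Simulated bulk, shaped like the response dicts described above (made-up values).
collect_failures([
    {'destination': 'some/key/image.jpg', 'url': 'https://www.example.com/path/to/image.jpg',
     'httpcode': 200, 'status': True, 'content-type': 'image/jpeg'},
    {'destination': 'some/key/image2.jpg', 'url': 'https://www.example.com/path/to/image2.jpg',
     'httpcode': 404, 'status': False, 'content-type': 'text/html'},
])
print(failed_urls)
```

In real use the function would be registered with `downloader.set_callback(collect_failures)` before calling `process()`, instead of being called by hand.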
ElasticDownloader
Lets you download or retrieve files from FTP, SFTP and HTTP/S servers easily.
Examples
Downloading file from FTP
```python
from blackfeed.elasticdownloader import ElasticDownloader

uri = 'ftp://user:password@ftp.server.com/path/to/file.csv'

retriever = ElasticDownloader()
res = retriever.download(uri, localpath='/tmp/myfile.csv')  # localpath is optional

# .download() returns False if there was an error, or the local path of the downloaded file on success.
print(res)
```
```bash
/tmp/myfile.csv
```
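The URI in the example packs the protocol, credentials, host and remote path into one string. The sketch below uses the standard library's `urllib.parse.urlsplit` to show those parts; it only illustrates the URI format and is not a description of how ElasticDownloader parses URIs internally.

```python
from urllib.parse import urlsplit

uri = 'ftp://user:password@ftp.server.com/path/to/file.csv'
parts = urlsplit(uri)

print(parts.scheme)    # protocol part of the URI (ftp, sftp, http or https)
print(parts.username)  # user name placed before the colon
print(parts.hostname)  # server host name
print(parts.path)      # remote path of the file
```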
Retrieving binary content of file from FTP
```python
from blackfeed.elasticdownloader import ElasticDownloader

uri = 'ftp://user:password@ftp.server.com/path/to/file.csv'

retriever = ElasticDownloader()
res = retriever.retrieve(uri)  # Return type: io.BytesIO | False

if res is not False:  # retrieve() returns False on error
    with open('/tmp/myfile.csv', 'wb') as f:
        f.write(res.getvalue())
```
ElasticDownloader can handle FTP, SFTP and HTTP URIs automatically. Use the download method to save a file locally, and the retrieve method to get the binary content of a file.
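Because `retrieve` is documented as returning an `io.BytesIO`, the content can also be processed in memory without writing it to disk. The following is a minimal sketch, assuming the file is UTF-8 encoded CSV; `read_csv_rows` is a hypothetical helper, and a hand-made `io.BytesIO` stands in for the object returned by `retriever.retrieve(uri)`.

```python
import csv
import io


def read_csv_rows(buffer, encoding='utf-8'):
    """Decode an io.BytesIO holding CSV data and return its rows as lists of strings."""
    text = io.StringIO(buffer.getvalue().decode(encoding))
    return list(csv.reader(text))


# Stand-in for the object returned by retriever.retrieve(uri); the content is made up.
res = io.BytesIO(b'id,name\n1,first\n2,second\n')

# retrieve() is documented to return False on error, so check before parsing.
rows = read_csv_rows(res) if res is not False else []
print(rows)
```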
Owner
- Name: Ibra
- Login: 0xIbra
- Kind: user
- Location: Toulouse, France
- Company: Digital Dealer Factory
- Website: https://www.ibragim.fr
- Twitter: ibra_akv
- Repositories: 4
- Profile: https://github.com/0xIbra
Just another guy who's passionate and curious about tech, a guy who likes to learn by creating something of use.
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0