https://github.com/0xibra/python-downloader-light

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (11.5%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: 0xIbra
Language: Python
Default Branch: master
Size: 33.2 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created about 5 years ago · Last pushed about 5 years ago

Metadata Files

Readme

Downloader light

Lightweight Python library that lets you download and process files concurrently.
This package was developed to allow serverless deployment.
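
To make "concurrently" concrete, here is a minimal sketch of processing a queue in fixed-size bulks with a thread pool. It is not the library's implementation: `chunked`, `process_in_bulks`, and the injected `fetch` callable are hypothetical names used only for illustration, and `fetch` stands in for whatever performs the actual download.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_bulks(queue, fetch, bulksize=200):
    """Run `fetch` on every queue item, one bulk at a time (hypothetical helper).

    Items inside a bulk run concurrently on threads; bulks run one after another.
    """
    results = []
    for bulk in chunked(queue, bulksize):
        with ThreadPoolExecutor(max_workers=len(bulk)) as pool:
            # pool.map returns results in the same order as its inputs
            results.extend(pool.map(fetch, bulk))
    return results
```

Processing bulk by bulk keeps the number of simultaneous connections bounded, which is one plausible reason for a setting such as `bulksize`.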

Dependencies

Installed automatically with pip:
- requests
- pysftp

Installation

pip install downloader-light

Usage examples

Download and upload files to AWS S3

For this to work, the AWS CLI must be configured.

```python
from blackfeed.downloader import Downloader
from blackfeed.adapter.s3 import S3Adapter

queue = [
    {
        'url': 'https://www.example.com/path/to/image.jpg',  # Required
        'destination': 'some/key/image.jpg'  # S3 key - Required
    },
    {
        'url': 'https://www.example.com/path/to/image2.jpg',
        'destination': 'some/key/image2.jpg'
    }
]

downloader = Downloader(
    S3Adapter(bucket='bucketname'),
    multi=True,  # If True, uploads files to S3 with multithreading
    stateless=False,  # If False, generates and stores md5 hashes of files in a file
    state_id='fluxstates',  # Name of the file where hashes will be stored (states.txt); not required
    bulksize=200  # Number of concurrent downloads
)
downloader.process(queue)
stats = downloader.get_stats()  # Returns a dict with information about the process
```
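
The `stateless=False` option is described above as storing md5 hashes of files in a file. The sketch below shows one way such a state file could work, using only the standard library. It illustrates the general technique and makes no claim about the library's actual file format or code; `StateStore`, `seen`, and `remember` are hypothetical names.

```python
import hashlib
import os
import tempfile


class StateStore:
    """Keeps one md5 hex digest per line in a text file (hypothetical helper)."""

    def __init__(self, path):
        self.path = path
        self.hashes = set()
        if os.path.exists(path):
            with open(path) as f:
                self.hashes = {line.strip() for line in f if line.strip()}

    @staticmethod
    def digest(content: bytes) -> str:
        return hashlib.md5(content).hexdigest()

    def seen(self, content: bytes) -> bool:
        """Return True if this exact content was remembered before."""
        return self.digest(content) in self.hashes

    def remember(self, content: bytes) -> None:
        """Record the content's digest in memory and append it to the file."""
        h = self.digest(content)
        if h not in self.hashes:
            self.hashes.add(h)
            with open(self.path, 'a') as f:
                f.write(h + '\n')


# Usage: a store reloaded from disk recognises content saved earlier.
state_file = os.path.join(tempfile.mkdtemp(), 'states.txt')
store = StateStore(state_file)
store.remember(b'image bytes')
reloaded = StateStore(state_file)
```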

Download files with states

Loading states is useful if you don't want to download the same file twice.

```python
from blackfeed.downloader import Downloader
from blackfeed.adapter.s3 import S3Adapter

queue = [ ... ]

downloader = Downloader(
    S3Adapter(bucket='bucketname'),
    multi=True,
    stateless=False,
    state_id='filename'
)

# You can add a callback function if needed.
# This function will be called after each bulk is processed.
def callback(responses):
    # responses: a list of response dicts, each shaped like:
    # {
    #     'destination': destination of the file (local path or S3 key),
    #     'url': URL from which the file was downloaded,
    #     'httpcode': HTTP code returned by the server,
    #     'status': True|False,
    #     'content-type': MIME type of the downloaded resource, e.g. image/jpeg
    # }
    pass  # Your logic

downloader.set_callback(callback)

downloader.load_states('filename')  # This will load states from "filename.txt"
downloader.process(queue)
stats = downloader.get_stats()  # Statistics
```
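
As an illustration of what a callback might do with the response dicts described above, this sketch tallies successes and failures. It relies only on the `status` and `url` keys listed in the example's comment. The `summary` dict and the `sample` list are made up for the demonstration, and the function is called by hand here rather than by the library.

```python
summary = {'ok': 0, 'failed': []}


def callback(responses):
    """Count successful downloads and collect the URLs of failed ones."""
    for response in responses:
        if response['status']:
            summary['ok'] += 1
        else:
            summary['failed'].append(response['url'])


# Hand-made sample shaped like the documented response dicts (illustrative values).
sample = [
    {'url': 'https://www.example.com/a.jpg', 'destination': 'some/key/a.jpg', 'status': True},
    {'url': 'https://www.example.com/b.jpg', 'destination': 'some/key/b.jpg', 'status': False},
]
callback(sample)
```

Registering a function like this through `set_callback` would, per the description above, run it after each bulk, so `summary` would accumulate across the whole queue.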

ElasticDownloader

Lets you download or retrieve files from FTP, SFTP, and HTTP/S servers easily.

Examples

Downloading file from FTP

```python
from blackfeed.elasticdownloader import ElasticDownloader

uri = 'ftp://user:password@ftp.server.com/path/to/file.csv'

retriever = ElasticDownloader()
res = retriever.download(uri, localpath='/tmp/myfile.csv')  # localpath is optional

# .download() returns False if there was an error,
# or the local path of the downloaded file on success.
print(res)  # e.g. /tmp/myfile.csv
```

Retrieving binary content of file from FTP

```python
from blackfeed.elasticdownloader import ElasticDownloader

uri = 'ftp://user:password@ftp.server.com/path/to/file.csv'

retriever = ElasticDownloader()
res = retriever.retrieve(uri)  # Return type: io.BytesIO | False

# Only write the file if the retrieval succeeded.
if res is not False:
    with open('/tmp/myfile.csv', 'wb') as f:
        f.write(res.getvalue())
```

ElasticDownloader handles FTP, SFTP, and HTTP URIs automatically. Use the download method to save a file locally, and the retrieve method to get a file's binary content.
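
Handling several protocols from a single URI usually starts with parsing the scheme and credentials. The following sketch shows that step with the standard library's `urllib.parse`. It is not ElasticDownloader's code; `describe_uri` and `SUPPORTED_SCHEMES` are hypothetical names chosen for this example.

```python
from urllib.parse import urlsplit

# Schemes named in the text above, plus https for "HTTP/S".
SUPPORTED_SCHEMES = {'ftp', 'sftp', 'http', 'https'}


def describe_uri(uri):
    """Split a URI into the parts a protocol-specific client would need."""
    parts = urlsplit(uri)
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f'Unsupported scheme: {parts.scheme!r}')
    return {
        'scheme': parts.scheme,
        'host': parts.hostname,
        'port': parts.port,  # None when the URI does not specify a port
        'user': parts.username,
        'password': parts.password,
        'path': parts.path,
    }


info = describe_uri('ftp://user:password@ftp.server.com/path/to/file.csv')
```

A dispatcher could then pick an FTP, SFTP, or HTTP client based on `info['scheme']`.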

Owner

Name: Ibra
Login: 0xIbra
Kind: user
Location: Toulouse, France
Company: Digital Dealer Factory

Website: https://www.ibragim.fr
Twitter: ibra_akv
Repositories: 4
Profile: https://github.com/0xIbra

Just another guy who's passionate and curious about tech, a guy who likes to learn by creating something of use.

Issues and Pull Requests

Last synced: over 1 year ago

All Time

Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
