https://github.com/0xibra/python-downloader-light

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.5%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: 0xIbra
  • Language: Python
  • Default Branch: master
  • Size: 33.2 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 5 years ago · Last pushed about 5 years ago
Metadata Files
Readme

README.md

Downloader light

Lightweight Python library that lets you download and process files concurrently.
This package was developed to allow serverless deployment.

Dependencies

Installed automatically with pip:
  • requests
  • pysftp

Installation

```bash
pip install downloader-light
```

Usage examples

Download and upload files to AWS S3

For this to work, the AWS CLI must be configured.

```python
from blackfeed.downloader import Downloader
from blackfeed.adapter.s3 import S3Adapter

queue = [
    {
        'url': 'https://www.example.com/path/to/image.jpg',  # Required
        'destination': 'some/key/image.jpg'  # S3 key - Required
    },
    {
        'url': 'https://www.example.com/path/to/image2.jpg',
        'destination': 'some/key/image2.jpg'
    }
]

downloader = Downloader(
    S3Adapter(bucket='bucketname'),
    multi=True,  # If True, uploads files to S3 using multiple threads
    stateless=False,  # If False, generates MD5 hashes of the files and stores them in a file
    state_id='fluxstates',  # Name of the file where hashes are stored; not required
    bulksize=200  # Number of concurrent downloads
)
downloader.process(queue)
stats = downloader.get_stats()  # Returns a dict with information about the process
```
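The `bulksize` option (number of concurrent downloads) suggests that the queue is worked through in bulks, each bulk processed concurrently. The following is a minimal sketch of that pattern using only the standard library. It is not the library's implementation: `process_in_bulks`, `fetch` and `on_bulk` are hypothetical names, and `fetch` stands in for whatever download/upload step the adapter performs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

Item = Dict[str, str]          # e.g. {'url': ..., 'destination': ...}
Response = Dict[str, object]   # e.g. {'url': ..., 'status': True, ...}


def process_in_bulks(
    queue: List[Item],
    fetch: Callable[[Item], Response],
    bulksize: int = 200,
    on_bulk: Optional[Callable[[List[Response]], None]] = None,
) -> Dict[str, int]:
    """Hypothetical sketch: run `fetch` on every queue item, `bulksize` items at a time."""
    if bulksize < 1:
        raise ValueError('bulksize must be at least 1')
    stats = {'total': 0, 'succeeded': 0, 'failed': 0}
    with ThreadPoolExecutor(max_workers=bulksize) as pool:
        for start in range(0, len(queue), bulksize):
            bulk = queue[start:start + bulksize]
            # list(pool.map(...)) blocks until every item of this bulk is done
            # and keeps responses in queue order. `fetch` should catch its own
            # network errors and report them via 'status'.
            responses = list(pool.map(fetch, bulk))
            for response in responses:
                stats['total'] += 1
                if response.get('status'):
                    stats['succeeded'] += 1
                else:
                    stats['failed'] += 1
            if on_bulk is not None:
                on_bulk(responses)  # invoked once per processed bulk
    return stats


def fake_fetch(item: Item) -> Response:
    # Stand-in for a real download/upload; performs no network access.
    return {'url': item['url'], 'destination': item['destination'], 'status': True, 'httpcode': 200}


demo_queue = [
    {'url': f'https://www.example.com/path/to/image{i}.jpg', 'destination': f'some/key/image{i}.jpg'}
    for i in range(5)
]
print(process_in_bulks(demo_queue, fake_fetch, bulksize=2))
# {'total': 5, 'succeeded': 5, 'failed': 0}
```

Waiting for a whole bulk before starting the next keeps the number of in-flight requests bounded by `bulksize`, at the cost of idle workers while the slowest item of a bulk finishes.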

Download files with states

Loading states can be useful if you don't want to download the same file twice.

```python
from blackfeed.downloader import Downloader
from blackfeed.adapter.s3 import S3Adapter

queue = [ ... ]

downloader = Downloader(
    S3Adapter(bucket='bucketname'),
    multi=True,
    stateless=False,
    state_id='filename'
)

# You can add a callback function if needed.
# This function will be called after each bulk is processed.
def callback(responses):
    # response: {
    #     'destination': destination of the file (a local path or an S3 key),
    #     'url': URL from which the file was downloaded,
    #     'httpcode': HTTP status code returned by the server,
    #     'status': True | False,
    #     'content-type': MIME type of the downloaded resource, e.g. image/jpeg
    # }
    # responses: list of response dicts
    pass  # Your logic

downloader.set_callback(callback)

downloader.loadstates('filename')  # Loads states from "filename.txt"
downloader.process(queue)
stats = downloader.get_stats()  # Statistics
```
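With `stateless=False`, MD5 hashes are generated and stored in a state file, and loading state `'filename'` reads `filename.txt`. The sketch below shows one way such a state store could work. The one-hex-digest-per-line format and the `StateStore` class with its methods are assumptions for illustration, not the library's file format or API; the README also does not say whether URLs or file contents are hashed, so the sketch hashes whatever bytes the caller passes.

```python
import hashlib
import os
from typing import Set


class StateStore:
    """Hypothetical sketch of an MD5-based state file (one hex digest per line)."""

    def __init__(self, state_id: str, directory: str = '.') -> None:
        # Follows the README's convention: state 'filename' lives in 'filename.txt'.
        self.path = os.path.join(directory, f'{state_id}.txt')
        self.hashes: Set[str] = set()

    @staticmethod
    def digest(data: bytes) -> str:
        return hashlib.md5(data).hexdigest()

    def load(self) -> None:
        # A missing state file simply means nothing has been processed yet.
        if os.path.exists(self.path):
            with open(self.path, encoding='utf-8') as f:
                self.hashes = {line.strip() for line in f if line.strip()}

    def seen(self, data: bytes) -> bool:
        """True if these bytes were already recorded, i.e. the item can be skipped."""
        return self.digest(data) in self.hashes

    def add(self, data: bytes) -> None:
        self.hashes.add(self.digest(data))

    def save(self) -> None:
        with open(self.path, 'w', encoding='utf-8') as f:
            f.writelines(h + '\n' for h in sorted(self.hashes))
```

A downloader built this way would call `load()` once, skip any item for which `seen()` is true, `add()` each successfully processed item, and `save()` at the end of the run.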

ElasticDownloader

Lets you download or retrieve files from FTP, SFTP and HTTP/S servers easily.

Examples

Downloading file from FTP

```python
from blackfeed.elasticdownloader import ElasticDownloader

uri = 'ftp://user:password@ftp.server.com/path/to/file.csv'

retriever = ElasticDownloader()
res = retriever.download(uri, localpath='/tmp/myfile.csv')  # localpath is optional

# .download() returns False if there was an error,
# or the local path of the downloaded file on success.
print(res)
```

Output:

```bash
/tmp/myfile.csv
```
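The FTP URI in this example packs the credentials, host and remote path into a single string. The sketch below shows how such a URI can be split with the standard library's `urllib.parse.urlsplit`; `parse_remote_uri` and `RemoteTarget` are hypothetical helpers, not part of blackfeed, and the library may parse URIs differently.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote, urlsplit


class RemoteTarget(NamedTuple):
    scheme: str
    host: Optional[str]
    port: Optional[int]
    user: Optional[str]
    password: Optional[str]
    path: str


def parse_remote_uri(uri: str) -> RemoteTarget:
    """Hypothetical helper: split an FTP/SFTP/HTTP URI into its connection parts."""
    parts = urlsplit(uri)
    return RemoteTarget(
        scheme=parts.scheme.lower(),
        host=parts.hostname,
        port=parts.port,  # None when the URI has no explicit port
        # Percent-decode credentials so characters such as '@' can appear in them.
        user=unquote(parts.username) if parts.username else None,
        password=unquote(parts.password) if parts.password else None,
        path=parts.path,
    )


print(parse_remote_uri('ftp://user:password@ftp.server.com/path/to/file.csv'))
```

Because credentials sit in the URI, a password containing `@` or `:` has to be percent-encoded (for example `p%40ss` for `p@ss`).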

Retrieving binary content of file from FTP

```python
from blackfeed.elasticdownloader import ElasticDownloader

uri = 'ftp://user:password@ftp.server.com/path/to/file.csv'

retriever = ElasticDownloader()
res = retriever.retrieve(uri)  # Return type: io.BytesIO | False

# retrieve() returns False on error, so check before reading the buffer.
if res is not False:
    with open('/tmp/myfile.csv', 'wb') as f:
        f.write(res.getvalue())
```

ElasticDownloader handles FTP, SFTP and HTTP URIs automatically. Use the `download` method to save a file locally, and the `retrieve` method to get the binary content of a file.
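ElasticDownloader picks the protocol from the URI and `retrieve` returns either an `io.BytesIO` or `False`. A minimal sketch of that dispatch pattern follows, assuming one handler per URI scheme. `retrieve` here is a standalone hypothetical function, not the library's method; the HTTP handler uses `urllib.request` rather than `requests`, the FTP handler uses `ftplib`, and no SFTP handler is shown.

```python
import ftplib
import io
import urllib.request
from typing import Callable, Dict, Optional, Union
from urllib.parse import unquote, urlsplit

Handler = Callable[[str], io.BytesIO]


def _fetch_http(uri: str) -> io.BytesIO:
    with urllib.request.urlopen(uri, timeout=30) as resp:
        return io.BytesIO(resp.read())


def _fetch_ftp(uri: str) -> io.BytesIO:
    parts = urlsplit(uri)
    buf = io.BytesIO()
    with ftplib.FTP() as ftp:
        ftp.connect(parts.hostname, parts.port or 21, timeout=30)
        ftp.login(unquote(parts.username or 'anonymous'), unquote(parts.password or ''))
        # RETR streams the file; each received chunk is appended to the buffer.
        ftp.retrbinary('RETR ' + unquote(parts.path), buf.write)
    buf.seek(0)
    return buf


DEFAULT_HANDLERS: Dict[str, Handler] = {
    'http': _fetch_http,
    'https': _fetch_http,
    'ftp': _fetch_ftp,
}


def retrieve(uri: str, handlers: Optional[Dict[str, Handler]] = None) -> Union[io.BytesIO, bool]:
    """Hypothetical sketch: return the file content as BytesIO, or False on any error."""
    table = DEFAULT_HANDLERS if handlers is None else handlers
    handler = table.get(urlsplit(uri).scheme.lower())
    if handler is None:
        return False  # unsupported scheme
    try:
        return handler(uri)
    except Exception:
        # Mirrors the "False on error" contract instead of raising.
        return False
```

Passing a custom `handlers` table makes it straightforward to add an SFTP handler (for example one built on pysftp) or to stub out the network in tests.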

Owner

  • Name: Ibra
  • Login: 0xIbra
  • Kind: user
  • Location: Toulouse, France
  • Company: Digital Dealer Factory

Just another guy who's passionate and curious about tech, a guy who likes to learn by creating something of use.

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0