ENASearch

ENASearch: A Python library for interacting with ENA's API - Published in JOSS (2017)

https://github.com/bebatut/enasearch

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in JOSS metadata
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

api api-client ena

Scientific Fields

Biology Life Sciences - 63% confidence
Last synced: 6 months ago

Repository

A Python library for interacting with ENA's API

Basic Info
Statistics
  • Stars: 23
  • Watchers: 2
  • Forks: 6
  • Open Issues: 12
  • Releases: 5
Created almost 9 years ago · Last pushed over 4 years ago
Metadata Files
Readme Contributing License

README.rst

ENASearch
=========

.. image:: https://travis-ci.org/bebatut/enasearch.svg?branch=master
    :target: https://travis-ci.org/bebatut/enasearch
.. image:: https://badge.fury.io/py/enasearch.svg
    :target: https://badge.fury.io/py/enasearch
.. image:: https://anaconda.org/bioconda/enasearch/badges/installer/conda.svg
    :target: https://anaconda.org/bioconda/enasearch
.. image:: https://codecov.io/gh/bebatut/enasearch/branch/master/graph/badge.svg
    :target: https://codecov.io/gh/bebatut/enasearch
.. image:: https://landscape.io/github/bebatut/enasearch/improve_code_health/landscape.svg?style=flat
    :target: https://landscape.io/github/bebatut/enasearch/improve_code_health
    :alt: Code Health

ENASearch is a Python library for interacting with ENA's API.

Context
-------

The European Nucleotide Archive (ENA) is a database with a comprehensive record of nucleotide sequencing information (raw sequencing data, sequence assembly information and functional annotation). The data contained in ENA can be accessed manually or programmatically via REST URLs. However, building HTTP-based REST requests by hand is not always straightforward, so a user-friendly, high-level interface is needed to make it easier to interact with ENA programmatically.

We developed ENASearch, a Python library to search and retrieve data from the ENA database. It also supports rich queries through the fields, filters and functions offered by ENA. ENASearch can be used as a Python package, through a command-line interface or inside Galaxy.
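As an illustration of the hand-built requests mentioned above, the following minimal sketch assembles a retrieval URL with the standard library only. The base URL, the ``&``-separated option layout and the ``build_view_url`` helper are assumptions made for illustration (modelled on ENA's older REST layout, which may have changed). It is not ENASearch's own code and it performs no network access.

.. code-block:: python

    from urllib.parse import quote

    # Assumption: illustrative base URL, not taken from this README.
    ASSUMED_BASE_URL = "https://www.ebi.ac.uk/ena/data/view"


    def build_view_url(ids, display, subseq_range=None, base_url=ASSUMED_BASE_URL):
        """Assemble a retrieval URL by hand (hypothetical helper)."""
        # Percent-encode each value; commas are kept so several ids can be listed.
        url = f"{base_url}/{quote(ids, safe=',')}&display={quote(display)}"
        if subseq_range is not None:
            url += f"&range={quote(subseq_range)}"
        return url


    url = build_view_url("A00145", "fasta", subseq_range="3-63")
    print(url)

Each option has to be encoded and appended by hand, which is the bookkeeping a higher-level library takes over.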


Usage
-----

ENASearch can be used from the command line:

.. code-block:: bash

    $ enasearch --help
    Usage: enasearch [OPTIONS] COMMAND [ARGS]...

      The Python library for interacting with ENA's API

    Options:
      --version   Show the version and exit.
      -h, --help  Show this message and exit.

    Commands:
      get_analysis_fields       Get the fields extractable for an analysis.
      get_display_options       Get the list of possible formats to display...
      get_download_options      Get the options for download of data from...
      get_filter_fields         Get the filter fields of a result to build a...
      get_filter_types          Return the filters usable for the different...
      get_results               Get the possible results (type of data).
      get_returnable_fields     Get the fields extractable for a result.
      get_run_fields            Get the fields extractable for a run.
      get_sortable_fields       Get the fields of a result that can sorted.
      get_taxonomy_results      Get list of taxonomy results.
      retrieve_analysis_report  Retrieve analysis report from ENA.
      retrieve_data             Retrieve ENA data (other than taxon).
      retrieve_run_report       Retrieve run report from ENA.
      retrieve_taxons           Retrieve data from the ENA Taxon Portal.
      search_data               Search data given a query.

    $ enasearch search_data --help
    Usage: enasearch search_data [OPTIONS]

      Search data given a query.

      This function

      - Extracts the number of possible results for the query - Extracts the all
      the results of the query (by potentially running several times the search
      function)

      The output can be redirected to a file and directly display to the
      standard output given the display chosen.

    Options:
      --free_text_search      Use free text search, otherwise the data warehouse
                              is used
      --query TEXT            Query string, made up of filtering conditions,
                              joined by logical ANDs, ORs and NOTs and bound by
                              double quotes; the filter fields for a query are
                              accessible with get_filter_fields and the type of
                              filters with get_filter_types  [required]
      --result TEXT           Id of a result (accessible with get_results)
                              [required]
      --display TEXT          Display option to specify the display format
                              (accessible with get_display_options)  [required]
      --download TEXT         Download option to specify that records are to be
                              saved in a file (used with file option, list
                              accessible with get_download_options)
      --file PATH             File to save the content of the search (used with
                              download option)
      --fields TEXT           Fields to return (accessible with
                              get_returnable_fields, used only for report as
                              display value) [multiple or comma-separated]
      --sortfields TEXT       Fields to sort the results (accessible with
                              get_sortable_fields, used only for report as display
                              value) [multiple or comma-separated]
      --offset INTEGER RANGE  First record to get (used only for display different
                              of fasta and fastq
      --length INTEGER RANGE  Number of records to retrieve (used only for display
                              different of fasta and fastq
      -h, --help              Show this message and exit.
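The ``--query``, ``--result``, ``--display`` and ``--fields`` options listed above can also be assembled programmatically. The sketch below joins filtering conditions with ``AND`` and produces a shell-safe command line. The filter expressions, the ``read_run`` result id and the field names are illustrative assumptions; valid values should be checked with ``get_filter_fields``, ``get_results`` and ``get_returnable_fields``. ``build_search_command`` is a hypothetical helper, not part of ENASearch.

.. code-block:: python

    import shlex


    def build_search_command(conditions, result, display, fields=None):
        """Return the argument list for an ``enasearch search_data`` call."""
        # Filtering conditions are joined by a logical AND.
        query = " AND ".join(conditions)
        args = [
            "enasearch", "search_data",
            "--query", query,
            "--result", result,
            "--display", display,
        ]
        if fields:
            # The help text allows comma-separated field names.
            args += ["--fields", ",".join(fields)]
        return args


    # Assumed, illustrative values (not taken from the README).
    args = build_search_command(
        conditions=["tax_eq(10090)", 'instrument_platform="ILLUMINA"'],
        result="read_run",
        display="report",
        fields=["run_accession", "fastq_ftp"],
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    command = shlex.join(args)
    print(command)

Keeping the arguments as a list also means they can be passed directly to ``subprocess.run`` without any shell quoting.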

It can also be used as a Python library:

.. code-block:: python

    >>> import enasearch
    >>> enasearch.retrieve_data(
    ...     ids="A00145",
    ...     display="fasta",
    ...     download=None,
    ...     file=None,
    ...     offset=0,
    ...     length=100000,
    ...     subseq_range="3-63",
    ...     expanded=None,
    ...     header=None)
    [SeqRecord(seq=Seq('GAAGGAAGGTCTTCAGAGAACCTAGAGAGCAGGTTCACAGAGTCACCCACCTCA...GCC', SingleLetterAlphabet()), id='ENA|A00145|A00145.1', name='ENA|A00145|A00145.1', description='ENA|A00145|A00145.1 B.taurus BoIFN-alpha A mRNA : Location:3..63', dbxrefs=[])]

The information extracted from ENA can be returned in several formats, including HTML, text, XML, FASTA and FASTQ. XML output is converted into a Python dictionary using xmltodict, and FASTA and FASTQ output is converted into SeqRecord objects using Biopython.
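To make the FASTA conversion more concrete, here is a minimal standard-library sketch of turning FASTA text into record objects. It is a simplified stand-in for what a FASTA parser such as Biopython's does, not ENASearch's implementation; ``FastaRecord``, ``parse_fasta`` and the toy sequences are hypothetical.

.. code-block:: python

    from dataclasses import dataclass
    from typing import Iterator, List


    @dataclass
    class FastaRecord:
        """Simplified stand-in for a Biopython SeqRecord (hypothetical)."""
        id: str
        description: str
        seq: str


    def _make_record(header: str, chunks: List[str]) -> FastaRecord:
        # The record id is the first whitespace-delimited token of the header.
        parts = header.split(None, 1)
        record_id = parts[0] if parts else ""
        return FastaRecord(id=record_id, description=header, seq="".join(chunks))


    def parse_fasta(text: str) -> Iterator[FastaRecord]:
        """Yield one record per '>' header found in FASTA-formatted text."""
        header = None
        chunks: List[str] = []
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield _make_record(header, chunks)
                header, chunks = line[1:], []
            elif header is not None:
                # Sequence lines are concatenated until the next header.
                chunks.append(line)
        if header is not None:
            yield _make_record(header, chunks)


    # Toy data, not real ENA records.
    TOY_FASTA = """>toy|SEQ1 first example
    ACGTAC
    GTTA
    >toy|SEQ2 second example
    GGCC
    """

    records = list(parse_fasta(TOY_FASTA))
    for record in records:
        print(record.id, len(record.seq))

With Biopython installed, the same text could instead be parsed with ``Bio.SeqIO.parse(handle, "fasta")``, which yields full ``SeqRecord`` objects.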


Installation
------------

ENASearch can be installed with pip:

.. code-block:: bash

    $ pip install enasearch

or with conda:

.. code-block:: bash

    $ conda install -c bioconda enasearch

Tests
-----

ENASearch comes with tests:

.. code-block:: bash

    $ make test

These tests are run automatically on Travis CI for each Pull Request.

Documentation
-------------

Documentation about ENASearch is available online at http://bebatut.fr/enasearch

To update it:

1. Make the changes in ``src/docs``
2. Generate the documentation with

   .. code-block:: bash

       $ make doc

3. Check it by opening the ``docs/index.html`` file in a web browser
4. Propose the changes via a Pull Request

Generate the data descriptions
------------------------------

To run, ENASearch needs descriptions from ENA of how the archive can be queried.
Currently, this information is extracted manually into CSV files in the ``data`` directory. Python objects are generated from these CSV files with:

.. code-block:: bash

    $ python src/serialize_ena_data_descriptors.py
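The general idea of turning CSV descriptors into reusable Python objects can be sketched as follows. This is not the content of ``src/serialize_ena_data_descriptors.py``: the column names, the toy rows, the ``load_descriptors`` helper and the choice of ``pickle`` as serialization format are all assumptions made for illustration.

.. code-block:: python

    import csv
    import io
    import pickle

    # Hypothetical CSV content; the real files in ``data`` may use other columns.
    CSV_TEXT = """result,description
    read_run,Raw reads
    analysis,Analyses
    """


    def load_descriptors(handle):
        """Map each result id to its description (hypothetical layout)."""
        reader = csv.DictReader(handle, skipinitialspace=True)
        return {row["result"].strip(): row["description"].strip() for row in reader}


    # Strip the indentation of the toy text before parsing it as CSV.
    clean_text = "\n".join(line.strip() for line in CSV_TEXT.splitlines())
    descriptors = load_descriptors(io.StringIO(clean_text))

    # Serialize the Python object, then load it back as a consumer would.
    blob = pickle.dumps(descriptors)
    restored = pickle.loads(blob)
    print(restored)

Serializing once and loading the result at import time avoids re-parsing the CSV files on every run.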

Owner

  • Name: Bérénice Batut
  • Login: bebatut
  • Kind: user
  • Location: Clermont-Ferrand, France
  • Company: University of Freiburg

@galaxyproject training, @usegalaxy-eu, @open-life-science, @StreetScienceCommunity, @gallantries

JOSS Publication

ENASearch: A Python library for interacting with ENA's API
Published
October 25, 2017
Volume 2, Issue 18, Page 418
Authors
Bérénice Batut ORCID
Bioinformatics group, Freiburg, Germany
Björn Grüning ORCID
Bioinformatics group, Freiburg, Germany
Editor
Roman Valls Guimera ORCID
Tags
European Nucleotide Archive API REST Galaxy


Committers

Last synced: 7 months ago

All Time
  • Total Commits: 123
  • Total Committers: 1
  • Avg Commits per committer: 123.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Bérénice Batut b****t@g****m 123

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 24
  • Total pull requests: 26
  • Average time to close issues: 10 days
  • Average time to close pull requests: about 7 hours
  • Total issue authors: 8
  • Total pull request authors: 4
  • Average comments per issue: 0.92
  • Average comments per pull request: 0.73
  • Merged pull requests: 23
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • tseemann (8)
  • bebatut (5)
  • vals (5)
  • bernt-matthias (2)
  • nigiord (1)
  • nh13 (1)
  • nick-youngblut (1)
  • jonm4024 (1)
Pull Request Authors
  • bebatut (23)
  • alexhbnr (1)
  • nh13 (1)
  • abhi18av (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 47 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 9
  • Total maintainers: 1
pypi.org: enasearch

A Python library for interacting with ENA's API

  • Versions: 9
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 47 Last month
Rankings
Dependent packages count: 10.0%
Stargazers count: 13.3%
Forks count: 14.2%
Average: 16.5%
Dependent repos count: 21.7%
Downloads: 23.1%
Maintainers (1)
Last synced: 6 months ago

Dependencies

setup.py pypi
  • Click *
  • Sphinx *
  • biopython *
  • codecov *
  • dicttoxml *
  • flake8 *
  • pytest-cov *
  • requests *
  • sphinx_rtd_theme *
  • xmltodict *