ribocode

release version

https://github.com/xryanglab/ribocode

Science Score: 33.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: ncbi.nlm.nih.gov
  • Committers with academic emails
    1 of 4 committers (25.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.8%) to scientific vocabulary

Keywords

bioinformatics orfs peptides ribosome-profiling
Last synced: 9 months ago

Repository

release version

Basic Info
  • Host: GitHub
  • Owner: xryanglab
  • License: mit
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 301 KB
Statistics
  • Stars: 54
  • Watchers: 3
  • Forks: 16
  • Open Issues: 37
  • Releases: 0
Created over 9 years ago · Last pushed almost 4 years ago
Metadata Files
Readme Changelog License

README.rst

====================================================
Detect translated ORFs using ribosome-profiling data
====================================================

|BuildStatus| |PyPI| |PythonVersions| |BioConda| |Publish1| |Publish2| |downloads|

*RiboCode* is a simple but high-quality computational algorithm for
identifying genome-wide translated ORFs from ribosome-profiling data.

Dependencies:
-------------

- pysam

- pyfasta

- h5py

- Biopython

- Numpy

- Scipy

- statsmodels

- matplotlib

- HTSeq

- minepy

Installation
------------

*RiboCode* can be installed like any other Python package. Here are some common ways:

* Install via pypi:

.. code-block:: bash

  pip install ribocode

* Install via conda:

.. code-block:: bash

  conda install -c bioconda ribocode

* Install from source:

.. code-block:: bash

  git clone https://github.com/xryanglab/RiboCode
  cd RiboCode
  python setup.py install

* Install from local:

.. code-block:: bash

  pip install RiboCode-*.tar.gz

If you do not have administrator permission, install *RiboCode* locally in your own directory by adding the
option ``--user`` to the above command. Then add ``~/.local/bin/`` to the ``PATH`` variable
and ``~/.local/lib/`` to the ``PYTHONPATH`` variable. For example, if you are using the bash shell, add the following lines to your ``~/.bashrc`` file:

.. code-block:: bash

  export PATH=$PATH:$HOME/.local/bin/
  # Adjust "python2.7" to the Python version used to install RiboCode
  export PYTHONPATH=$HOME/.local/lib/python2.7

Then source your ``~/.bashrc`` file using this command:

.. code-block:: bash

  source ~/.bashrc

Users can also update or uninstall the package with one of the following commands:

.. code-block:: bash

  pip install --upgrade RiboCode # upgrade
  pip uninstall RiboCode # uninstall
  conda update -c bioconda ribocode # upgrade
  conda remove ribocode # uninstall
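
After installation, one quick way to confirm that the dependencies listed above can be imported is a short Python check like the sketch below. The import names (for example ``Bio`` for Biopython) are assumptions based on each package's usual module name, and ``missing_modules`` is a hypothetical helper, not part of *RiboCode*.

.. code-block:: python

  import importlib.util

  # Import names assumed for the dependencies listed in this README.
  RIBOCODE_DEPENDENCIES = [
      "pysam", "pyfasta", "h5py", "Bio", "numpy",
      "scipy", "statsmodels", "matplotlib", "HTSeq", "minepy",
  ]


  def missing_modules(names):
      """Return the module names that cannot be found in this environment."""
      return [name for name in names if importlib.util.find_spec(name) is None]


  if __name__ == "__main__":
      missing = missing_modules(RIBOCODE_DEPENDENCIES)
      print("Missing modules:", ", ".join(missing) if missing else "none")

An empty result only shows that the modules can be located; it does not check their versions.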

Tutorial to analyze ribosome-profiling data and run *RiboCode*
--------------------------------------------------------------

Here, we use the `HEK293 dataset`_ as an example to illustrate the use of *RiboCode* and demonstrate a typical workflow.
Please make sure the paths and file names are correct.

1. **Required files** 

   The genome FASTA file and the GTF annotation file can be downloaded from:


   http://www.gencodegenes.org

   or from:

   http://asia.ensembl.org/info/data/ftp/index.html

   http://useast.ensembl.org/info/data/ftp/index.html


   For example, the files required in this tutorial can be downloaded from the following URLs:

   GTF: ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz

   FASTA: ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/GRCh37.p13.genome.fa.gz

   |Important| The GTF file required by *RiboCode* should include three levels of
   annotation: genes, transcripts and exons. Some GTF files lack the gene and transcript
   annotations; users can add them with the "GTFupdate" command in *RiboCode*.
   Please refer to `GTF_update.rst`_ for more information.
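
   Before running "GTFupdate", it can help to check which of the three feature types a GTF file already contains. The sketch below is a minimal illustration, assuming a standard tab-delimited GTF whose third column holds the feature type; ``missing_gtf_features`` is a hypothetical helper, not a *RiboCode* command.

   .. code-block:: python

      import gzip

      REQUIRED_FEATURES = {"gene", "transcript", "exon"}


      def missing_gtf_features(gtf_path, required=REQUIRED_FEATURES):
          """Return the required feature types (GTF column 3) absent from the file."""
          opener = gzip.open if str(gtf_path).endswith(".gz") else open
          remaining = set(required)
          with opener(gtf_path, "rt") as handle:
              for line in handle:
                  if line.startswith("#"):
                      continue  # skip header/comment lines
                  fields = line.split("\t")
                  if len(fields) >= 3:
                      remaining.discard(fields[2])
                  if not remaining:
                      break  # all required feature types were seen
          return remaining

   An empty set means all three levels are present; a result such as ``{"gene", "transcript"}`` suggests the file needs updating first.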

   The raw Ribo-seq FASTQ file can be downloaded using the fastq-dump tool from `SRA_Toolkit`_:

   .. code-block:: bash

      fastq-dump -A SRR1630831

2. **Trimming adapter sequence for ribo-seq data**

   Use the cutadapt program: https://cutadapt.readthedocs.io/en/stable/installation.html

   Example:

   .. code-block:: bash

      cutadapt -m 20 --match-read-wildcards -a (Adapter sequence) -o (trimmed FASTQ file) (input FASTQ file)


   In this dataset the adapter sequences have already been trimmed off, so this step can be skipped.

3. **Removing ribosomal RNA (rRNA) derived reads**

   Remove rRNA contamination by aligning the trimmed reads to rRNA sequences using `Bowtie`_,
   then keep the unaligned reads for the next step.

   The rRNA sequences are provided in the `rRNA.fa`_ file.

   Example:

   .. code-block:: bash

      bowtie-build (rRNA.fa) rRNA
      bowtie -p 8 --norc --un (unaligned reads FASTQ file) -q (input FASTQ file) rRNA (alignment output file)

4. **Aligning the clean reads to reference genome**

   Use the STAR program: https://github.com/alexdobin/STAR

   Example:

   (1). Build index

   .. code-block:: bash

      STAR --runThreadN 8 --runMode genomeGenerate --genomeDir (STAR index directory) \
      --genomeFastaFiles (genome FASTA file) --sjdbGTFfile (GTF file)

   .. _STAR:

   (2). Alignment:

   .. code-block:: bash

      STAR --outFilterType BySJout --runThreadN 8 --outFilterMismatchNmax 2 --genomeDir (STAR index directory) \
      --readFilesIn (rRNA-depleted FASTQ file) --outFileNamePrefix (output prefix) --outSAMtype BAM \
      SortedByCoordinate --quantMode TranscriptomeSAM GeneCounts --outFilterMultimapNmax 1 \
      --outFilterMatchNmin 16 --alignEndsType EndToEnd

5. **Running RiboCode to identify translated ORFs**

   (1). Preparing the transcripts annotation files:

   .. code-block:: bash

      prepare_transcripts -g (GTF file) -f (genome FASTA file) -o (RiboCode_annot)

   |Important| The RiboCode_annot folder is required by the following steps, so give its correct location if you move it or change the working directory.

   (2). Selecting the length range of the RPF reads and identifying the P-site locations:

   .. code-block:: bash

      metaplots -a (RiboCode_annot) -r (transcriptome-aligned BAM file)


   This step generates two files. The first is a PDF file that plots the aggregate profiles of the distance from the 5'-end
   of reads to the annotated start codons (or stop codons), which is used to examine the P-site periodicity of RPF reads on CDS regions.
   The second is the P-site config file, which defines the read lengths with
   strong 3-nt periodicity and the associated P-site location for each length. If you have multiple BAM files from which to predict ORFs
   together in the next step, use the "-i" argument to specify a text file that contains the names of these BAM files
   (one file per line).
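
   The text file for the "-i" argument can be written by hand or generated, for example with the short sketch below. The file names are hypothetical placeholders and ``write_bam_list`` is an illustrative helper, not part of *RiboCode*.

   .. code-block:: python

      from pathlib import Path


      def write_bam_list(bam_paths, list_path):
          """Write one BAM file name per line, the layout described for "-i"."""
          lines = [str(path) for path in bam_paths]
          Path(list_path).write_text("\n".join(lines) + "\n")
          return list_path


      if __name__ == "__main__":
          # Hypothetical file names; replace them with your own BAM files.
          write_bam_list(["sample1.bam", "sample2.bam"], "bam_files.txt")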

   .. _RiboCode:

   (3). Detecting translated ORFs using the ribosome-profiling data:

   .. code-block:: bash

      RiboCode -a (RiboCode_annot) -c (config.txt) -l no -g -o (output file name)


   Use the config file generated in the previous step to specify the BAM file and the P-site parameters;
   see the example file `config.txt`_ in the data folder. Predicted ORFs can also be
   written in "gtf" or "bed" format by adding the "-g" or "-b" argument to this command.

   **Explanation of final result files**

   *RiboCode* generates two text files.
   The "(output file name).txt" file contains the information of all predicted ORFs in each transcript.
   The "(output file name)_collapsed.txt" file combines ORFs that share the same stop codon in different transcript
   isoforms: the one harboring the most upstream in-frame ATG is kept.
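
   The collapsing rule can be pictured with the sketch below. It is not *RiboCode*'s implementation: it assumes each ORF is a dictionary with the hypothetical keys ``chrom`` and ``strand`` next to ``ORF_gstart`` and ``ORF_gstop``, assumes ``ORF_gstart`` is the genomic position of the start codon, and uses the genomic start as a stand-in for "most upstream in-frame ATG" while ignoring splicing differences between isoforms.

   .. code-block:: python

      def collapse_orfs(orfs):
          """Keep, for each shared stop codon, the ORF with the most upstream start."""
          best = {}
          for orf in orfs:
              key = (orf["chrom"], orf["strand"], orf["ORF_gstop"])
              current = best.get(key)
              if current is None:
                  best[key] = orf
                  continue
              if orf["strand"] == "+":
                  more_upstream = orf["ORF_gstart"] < current["ORF_gstart"]
              else:
                  # On the minus strand, upstream means a larger genomic coordinate.
                  more_upstream = orf["ORF_gstart"] > current["ORF_gstart"]
              if more_upstream:
                  best[key] = orf
          return list(best.values())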

   Some column names of the result file::

    - ORF_ID: The identifier of the predicted ORF.
    - ORF_type: The type of the predicted ORF, annotated according to its location relative to the associated CDS. The following ORF categories are reported:

     "annotated" (overlapping with annotated CDS and sharing the same stop codon as annotated CDS)

     "uORF" (upstream of annotated CDS, not overlapping with annotated CDS)

     "dORF" (downstream of annotated CDS, not overlapping with annotated CDS)

     "Overlap_uORF" (upstream of annotated CDS and overlapping with annotated CDS)

     "Overlap_dORF" (downstream of annotated CDS and overlapping with annotated CDS)

     "Internal" (internal ORF of annotated CDS, but in a different reading frame)

     "novel" (from non-coding genes or non-coding transcripts of the coding genes).

    - alt_ORF_type: only shown in the "_collapsed.txt" file; reports alternative annotations of each ORF based on its relative location in transcripts other than the longest one
    - ORF_tstart, ORF_tstop: the start and end positions of the ORF relative to its transcript (1-based coordinates)
    - ORF_gstart, ORF_gstop: the start and end positions of the ORF in the genome (1-based coordinates)
    - pval_frame0_vs_frame1: p-value for the P-site density of frame0 being greater than that of frame1
    - pval_frame0_vs_frame2: p-value for the P-site density of frame0 being greater than that of frame2
    - pval_combined: integrated p-value obtained by combining pval_frame0_vs_frame1 and pval_frame0_vs_frame2
    - adjusted_pval: p-value adjusted for multiple testing.
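
   As an illustration of how these columns might be used, the sketch below counts significant ORFs per category. It assumes the result file is tab-delimited with a header row containing the column names above; the delimiter, the 0.05 threshold, and the ``count_significant_orfs`` helper are assumptions of this example.

   .. code-block:: python

      import csv
      from collections import Counter


      def count_significant_orfs(result_path, alpha=0.05):
          """Count ORFs per ORF_type whose adjusted_pval is below alpha."""
          counts = Counter()
          with open(result_path, newline="") as handle:
              for row in csv.DictReader(handle, delimiter="\t"):
                  try:
                      pval = float(row["adjusted_pval"])
                  except ValueError:
                      continue  # skip rows without a numeric p-value
                  if pval < alpha:
                      counts[row["ORF_type"]] += 1
          return counts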

   **All three steps above can also be run with a single command, "RiboCode_onestep":**

   .. code-block:: bash

      RiboCode_onestep -g (GTF file) -f (genome FASTA file) -r (transcriptome-aligned BAM file) \
                       -l no -o (output file name)

   (4). (optional) Plotting the P-sites densities of predicted ORFs

   Using the "plot_orf_density" command, for example:

   .. code-block:: bash

      plot_orf_density -a (RiboCode_annot) -c (config.txt) -t (transcript_id) \
      -s (ORF_gstart) -e (ORF_gstop)

   The generated PDF plots can be edited with Adobe Illustrator.

   (5). (optional) Counting the number of RPF reads aligned to ORFs

   The number of reads aligned to each ORF can be counted with the "ORFcount" command, which calls the HTSeq-count program.
   Only reads of a given length are counted. For ORFs longer than a specified value (set by "-e"),
   the RPF reads located in the first few and last few codons can be excluded by adjusting the parameters "-f" and "-l".
   For example, reads of 26-34 nt aligned to predicted ORFs can be counted with the command below:

   .. code-block:: bash

      ORFcount -g (RiboCode_ORFs_result.gtf) -r (ribo-seq genomic mapping file) -f 15 -l 5 -e 100 -m 26 -M 34 -o (output count file)

   Reads aligned to the first 15 codons and last 5 codons of ORFs longer than 100 nt will be excluded.
   The "RiboCode_ORFs_result.gtf" file can be generated by the `RiboCode`_ command. The "ribo-seq genomic mapping file" is the
   genome-wide mapping file produced by `STAR`_ mapping.
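
   The effect of "-f", "-l" and "-e" can be worked through with a small arithmetic sketch. It only illustrates the trimming described above on a single unspliced ORF with 1-based inclusive coordinates; ``counted_region`` is a hypothetical helper, and the way "ORFcount" itself handles strands and exon structure is not modeled here.

   .. code-block:: python

      def counted_region(orf_start, orf_stop, first_codons=15, last_codons=5,
                         min_length=100):
          """Return the (start, stop) region in which reads would be counted."""
          length = orf_stop - orf_start + 1
          if length <= min_length:
              return orf_start, orf_stop  # short ORFs are not trimmed
          # Each excluded codon removes 3 nt from the corresponding end.
          return orf_start + 3 * first_codons, orf_stop - 3 * last_codons

   With the settings of the command above, an ORF spanning positions 1 to 300 would be counted over positions 46 to 285, while an ORF of 90 nt would be counted in full.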


Recipes (FAQ):
--------------
1. **I have a BAM/SAM file aligned to genome, how do I convert it to transcriptome-based mapping file ?**

   You can use the STAR aligner to generate the transcriptome-based alignment file by specifying the "--quantMode TranscriptomeSAM" parameter,
   or use the "sam-xlate" command from `UNC Bioinformatics Utilities`_.

2. **How to use multiple BAM/SAM files to identify ORFs?**

   Select the read lengths that show strong 3-nt periodicity and the corresponding P-site locations for each
   BAM/SAM file, then list each file and its information in the `config.txt`_ file. *RiboCode* will combine the P-site
   densities at each nucleotide across these BAM/SAM files to predict ORFs.
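
   The idea of combining P-site densities can be sketched as follows. This is a simplified illustration, not *RiboCode*'s code: reads are represented as hypothetical ``(five_prime_position, read_length)`` tuples on transcript coordinates, and the P-site is assumed to lie at the 5'-end position plus the offset chosen for that read length.

   .. code-block:: python

      from collections import Counter


      def psite_density(reads, offsets):
          """Count P-sites per position for one sample.

          reads:   iterable of (five_prime_position, read_length) tuples
          offsets: dict mapping a selected read length to its P-site offset
          """
          density = Counter()
          for five_prime, length in reads:
              if length in offsets:  # ignore read lengths that were not selected
                  density[five_prime + offsets[length]] += 1
          return density


      def combine_densities(samples):
          """Sum per-position P-site counts over (reads, offsets) pairs."""
          total = Counter()
          for reads, offsets in samples:
              total.update(psite_density(reads, offsets))
          return total

   Each sample keeps its own read lengths and offsets, so files with different periodicity profiles can still contribute to the same per-nucleotide totals.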

3. **Generating figures with matplotlib when DISPLAY variable is undefined or invalid**

   When running the "metaplots" or "plot_orf_density" command, some users have received errors similar to the following:

      ``raise RuntimeError('Invalid DISPLAY variable')``

      ``_tkinter.TclError: no display name and no $DISPLAY environment variable``

   The main problem is that the default matplotlib backend is unavailable. One solution is to change the backend in the matplotlibrc file.
   A simpler solution is to set the MPLBACKEND environment variable, either for your current shell or for a single script:

   .. code-block:: bash

      export MPLBACKEND="Agg"

   The following non-interactive backends are capable of writing to a file:

      Agg  PS  PDF  SVG  Cairo  GDK
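
   When the commands are launched from a Python script instead of a shell, the same variable can be set for the child process only. The sketch below assumes this usage; ``headless_env`` is a hypothetical helper.

   .. code-block:: python

      import os


      def headless_env(backend="Agg", base_env=None):
          """Return a copy of the environment with MPLBACKEND set to `backend`."""
          env = dict(os.environ if base_env is None else base_env)
          env["MPLBACKEND"] = backend
          return env

   The returned dictionary can be passed as the ``env`` argument of ``subprocess.run`` when calling "metaplots" or "plot_orf_density", which leaves the environment of the calling script unchanged.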

   See also:

   http://matplotlib.org/faq/usage_faq.html#what-is-a-backend

   http://matplotlib.org/users/customizing.html#the-matplotlibrc-file

   http://stackoverflow.com/questions/2801882/generating-a-png-with-matplotlib-when-display-is-undefined


For any questions, please contact:
----------------------------------
Xuerui Yang (yangxuerui[at]tsinghua.edu.cn); Zhengtao Xiao (zhengtao.xiao[at]xjtu.edu.cn)

.. _SRA_Toolkit: https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software

.. _HEK293 dataset: https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR1630831

.. _config.txt: https://github.com/xryanglab/RiboCode/blob/master/data/config.txt

.. _rRNA.fa: https://github.com/xryanglab/RiboCode/blob/master/data/rRNA.fa

.. _GTF_update.rst: https://github.com/xryanglab/RiboCode/blob/master/data/GTF_update.rst

.. _UNC Bioinformatics Utilities: https://github.com/mozack/ubu

.. _Bowtie: http://bowtie-bio.sourceforge.net/index.shtml

.. |PyPI| image:: https://img.shields.io/pypi/v/RiboCode.svg?style=flat-square
   :target: https://pypi.python.org/pypi/RiboCode

.. |PythonVersions| image:: https://img.shields.io/pypi/pyversions/RiboCode.svg?style=flat-square
   :target: https://pypi.python.org/pypi/RiboCode

.. |BioConda| image:: https://img.shields.io/badge/install-bioconda-blue.svg?style=flat-square
   :target: http://bioconda.github.io/recipes/ribocode/README.html
   
.. |Anaconda| image:: https://anaconda.org/bioconda/ribocode/badges/version.svg
   :target: https://anaconda.org/bioconda/ribocode

.. |downloads| image:: https://anaconda.org/bioconda/ribocode/badges/downloads.svg
   :target: https://anaconda.org/bioconda/ribocode

.. |Publish1| image:: https://img.shields.io/badge/publish-NAR-blue.svg?style=flat-square
   :target: https://doi.org/10.1093/nar/gky179

.. |Publish2| image:: https://img.shields.io/badge/publish-JOVE-brightgreen.svg?style=flat-square
   :target: https://dx.doi.org/10.3791/63366   

.. |BuildStatus| image:: https://circleci.com/gh/xryanglab/RiboCode.svg?style=svg
    :target: https://circleci.com/gh/xryanglab/RiboCode

.. |Important| image:: https://img.shields.io/badge/-Note-orange.svg
    :width: 50
    :target: https://github.com/xryanglab/RiboCode/blob/master/data/GTF_update.rst

Owner

  • Name: xryanglab
  • Login: xryanglab
  • Kind: organization

GitHub Events

Total
  • Issues event: 3
  • Watch event: 7
  • Issue comment event: 7
  • Fork event: 1
Last Year
  • Issues event: 3
  • Watch event: 7
  • Issue comment event: 7
  • Fork event: 1

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 147
  • Total Committers: 4
  • Avg Commits per committer: 36.75
  • Development Distribution Score (DDS): 0.163
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
xzt41 x****1@1****m 123
Zhengtao xiao 6****o 17
Sherking s****e@g****m 6
YangLabProject y****i@t****n 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 49
  • Total pull requests: 18
  • Average time to close issues: 2 months
  • Average time to close pull requests: about 1 hour
  • Total issue authors: 36
  • Total pull request authors: 2
  • Average comments per issue: 1.31
  • Average comments per pull request: 0.11
  • Merged pull requests: 18
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 4
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 4
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • z626093820 (6)
  • Roleren (3)
  • zhengtaoxiao (2)
  • DooYal (2)
  • yx-xu (2)
  • ewallace (2)
  • PSSUN (2)
  • trum994 (2)
  • koosle (1)
  • marcasriv (1)
  • fulaibaowang (1)
  • hugch2020 (1)
  • mosi223 (1)
  • dougbarrows (1)
  • L1angyan (1)
Pull Request Authors
  • zhengtaoxiao (16)
  • sherkinglee (2)
Top Labels
Issue Labels
help wanted (4)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 115 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 20
  • Total maintainers: 1
pypi.org: ribocode

A package for identifying the translated ORFs using ribosome-profiling data

  • Versions: 20
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 115 Last month
Rankings
Forks count: 9.6%
Stargazers count: 9.9%
Dependent packages count: 10.1%
Average: 13.3%
Downloads: 15.4%
Dependent repos count: 21.6%
Maintainers (1)
Last synced: 10 months ago

Dependencies

setup.py pypi
  • biopython *
  • future *
  • h5py >3.0.0
  • htseq *
  • matplotlib *
  • minepy *
  • numpy *
  • pyfasta *
  • pysam >0.8.4
  • scipy *
  • statsmodels *