https://github.com/bionf/ncortho

Targeted ortholog search for miRNAs


Repository

Basic Info

Host: GitHub
Owner: BIONF
License: gpl-3.0
Language: Python
Default Branch: master
Homepage:
Size: 7.69 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Fork of felixlangschied/ncortho

Created over 4 years ago · Last pushed almost 3 years ago

https://github.com/BIONF/ncortho/blob/master/

# ncOrtho
[![PyPI version](https://badge.fury.io/py/ncOrtho.png)](https://pypi.org/project/ncOrtho/)
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)

ncOrtho is a tool for the targeted search of orthologous microRNAs (miRNAs) throughout the tree of life.
Conceptually, it works similarly to the program [fDOG](https://github.com/BIONF/fDOG) in that a probabilistic model of
a reference miRNA is created. To train the model, orthologs of the reference sequence are first identified in
a set of taxa that are more closely related to the reference species. In contrast to fDOG, ncOrtho does not train hidden
Markov models but covariance models (CMs) (Eddy & Durbin, 1994),
which also model conservation of the miRNA's secondary structure.

![workflow](https://github.com/BIONF/ncortho/blob/master/ncOrtho/docs/figure1_ncortho_worklfow.png)

## How to cite

Langschied F, Leisegang MS, Brandes RP, Ebersberger I. 
[ncOrtho: efficient and reliable identification of miRNA orthologs](https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkad467/7187706). 
Nucleic Acids Res. 2023 Jun 1:gkad467. 
doi: 10.1093/nar/gkad467. 
Epub ahead of print. PMID: 37260093.

## Getting Started
ncOrtho depends on multiple third-party applications, some of which are Linux-specific.
All dependencies can be installed with [Anaconda](https://www.anaconda.com/).
We recommend creating a new Anaconda environment for this. For example:
```
conda create --name ncOrtho python=3.8
conda activate ncOrtho
```

### Prerequisites
* **Operating System:** Linux (tested on: Ubuntu 20.04)
* **Python:** version 3 or higher (tested with v3.8)

Tool | Tested version | Anaconda installation
------------ | ------------- | -------------
BLASTn | v2.7.1 | `conda install -c kantorlab blastn`
Infernal | v1.1.4 | `conda install -c bioconda infernal`
t_coffee | v13.45 | `conda install -c bioconda t-coffee`
MUSCLE | v5.1 | `conda install -c bioconda muscle`

### Installing

After installing all of the dependencies listed above, ncOrtho can be installed with `pip`:
```
pip install ncOrtho
```
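
Before running ncOrtho, it can help to confirm that the external tools are visible on your `PATH`. The following is a minimal sketch, not part of ncOrtho itself. The executable names (`blastn`, `cmbuild`, `cmsearch`, `t_coffee`, `muscle`) are assumptions based on the tools in the table above, and the helper `find_missing_tools` is a hypothetical name introduced for illustration.

```python
import shutil

# Assumed executable names for the tools in the prerequisites table:
# blastn (BLAST+), cmbuild and cmsearch (Infernal), t_coffee, muscle.
REQUIRED_TOOLS = ["blastn", "cmbuild", "cmsearch", "t_coffee", "muscle"]


def find_missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All external tools were found on PATH.")
```

Run the script inside the activated conda environment so that it sees the same `PATH` as ncOrtho will.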

## Usage
### CM construction
As a targeted search for orthologs, ncOrtho's biggest strength is its flexibility to change the taxonomic 
scope of an analysis according to the research question at hand.

For this reason, a few questions need to be answered before we can start constructing covariance models:
1. What is the reference species?
2. How phylogenetically diverse will my set of target species be?
3. From which species should the core set of miRNA orthologs, which will be used for training the CMs, be extracted?
4. Which miRNAs are going to be used for the ortholog search?

To identify suitable core species for a given reference species, you can estimate the degree of conserved synteny
from a set of pairwise ortholog predictions with:
```
ncCheck -p  -o 
```
You can find additional information about ncCheck in the
[WIKI](https://github.com/BIONF/ncortho/wiki/Choosing-core-species).


As soon as you know what your core species are going to be, you will need to collect the following data:

* Genomic sequence in FASTA format (e.g. "genomic.fna" from RefSeq)
* Genome annotation in GFF3 format (e.g. "genomic.gff" from RefSeq)
* Pairwise orthologs of all proteins between the reference and each core species (more information
[here](https://github.com/BIONF/ncortho/wiki/Input-Data#pairwise-orthologs))

Modify the [example parameters](ncOrtho/coreset/example_parameters.yaml) file to contain all
relevant paths to your input files. The "name" property of your reference and core species only needs to be a
unique identifier. It is, however, recommended to use whitespace-free species names to increase readability.
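
The naming advice above can be checked before starting a run. The sketch below is an illustration only: it takes a plain list of names rather than reading the parameters file, because the file's layout is not described here, and `check_species_names` is a hypothetical helper, not part of ncOrtho.

```python
def check_species_names(names):
    """Return a list of problems found in the given species names."""
    problems = []
    seen = set()
    for name in names:
        # Names must be unique identifiers.
        if name in seen:
            problems.append(f"duplicate name: {name!r}")
        seen.add(name)
        # Whitespace is allowed but discouraged for readability.
        if any(ch.isspace() for ch in name):
            problems.append(f"name contains whitespace: {name!r}")
    return problems
```

An empty list means every name is unique and whitespace-free. For example, `check_species_names(["Homo_sapiens", "Mus_musculus"])` reports no problems.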

In addition to the parameters file, you will need a tab-separated file containing the position and sequence of each
miRNA for which a model should be constructed (more information
[here](https://github.com/BIONF/ncortho/wiki/Input-Data#reference-mirnas)).

You can then start CM construction with:
```
ncCreate -p  -n  -o 
```
If you encounter errors, make sure that:
* The identifiers in the pairwise ortholog files match the ones in the GFF files (use the `-idtype=` flag to use other
ID types)
* The contig/chromosome column in the tab-separated miRNA input file matches the contig/chromosome ID
in the reference GFF file

Use `ncCreate -h` to see all available options for CM construction.
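
The second check in the list above (contig/chromosome IDs) can be automated. The following is a rough sketch, not part of ncOrtho. It relies on the GFF3 convention that the first column holds the sequence ID. The position of the contig column in the miRNA file (`contig_column`, 0-based) is an assumption that you should adjust to your file, and both function names are hypothetical.

```python
import csv


def gff_seqids(gff_lines):
    """Collect sequence IDs (column 1) from GFF3 lines."""
    seqids = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section, no more features follow
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        seqids.add(line.split("\t", 1)[0])
    return seqids


def unmatched_contigs(mirna_lines, gff_lines, contig_column=1):
    """Return contig names from the miRNA table that are absent from the GFF3.

    contig_column is an assumed 0-based column index, not a documented value.
    """
    known = gff_seqids(gff_lines)
    missing = set()
    reader = csv.reader(mirna_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if len(row) > contig_column and row[contig_column] not in known:
            missing.add(row[contig_column])
    return sorted(missing)
```

Both functions accept any iterable of lines, so open file handles for the miRNA table and the reference GFF file can be passed directly. An empty result means every contig named in the miRNA file also appears in the annotation.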

### Orthology search

You can start the orthology search with:
```
ncSearch -m  -n  -q  -r  -o 
```

Use `ncSearch -h` to see all available options for the orthology search or have a look at the 
[WIKI](https://github.com/BIONF/ncortho/wiki/Running-the-orthology-search).

### Phylogenetic Analysis

To facilitate the downstream analyses of miRNA orthologs, we also supply the `ncAnalyze` function:
```
ncAnalyze -r  -o  -m 
```
This will create a phylogenetic profile ready for visualisation in [PhyloProfile](https://github.com/BIONF/PhyloProfile)
and calculate a supermatrix species tree based on the miRNA orthologs.

More information can be found with `ncAnalyze -h` or in the [WIKI](https://github.com/BIONF/ncortho/wiki/Analysis).

## Support

Please refer to our wiki page of [known issues](https://github.com/BIONF/ncortho/wiki/Known-Issues) first,
then consider opening an issue on GitHub or contacting me directly via [mail](mailto:langschied@bio.uni-frankfurt.de).

## Contributors

* [Felix Langschied](https://github.com/felixlangschied)
* [Andreas Blaumeiser](https://github.com/acblaumeiser)
* Mirko Brüggemann
* Daniel Amsel

Dept. for Applied Bioinformatics, Institute for Cell Biology and Neurosciences, Goethe University, Frankfurt am Main

* [Ingo Ebersberger](https://www.bio.uni-frankfurt.de/43045195/Abt__Ebersberger___Biowissenschaften)
 
## License

This project is licensed under the GNU General Public License v3.0. See the [LICENSE.md](LICENSE.md) file for details.

## Acknowledgments

* [Lorenz et al. 2011](https://almob.biomedcentral.com/articles/10.1186/1748-7188-6-26): ViennaRNA Package 2.0
* [Nawrocki et al. 2013](https://academic.oup.com/bioinformatics/article/29/22/2933/316439): Infernal 1.1: 
100-fold faster RNA homology searches
* [Notredame et al. 2000](http://www.tcoffee.org/Publications/Pdf/tcoffee.pdf): T-Coffee: A Novel Method for Fast and
Accurate Multiple Sequence Alignment
* [Shirley et al. 2015](https://peerj.com/preprints/970v1/): Efficient "pythonic" access to FASTA files using pyfaidx

## Contact

For support or bug reports, please contact: [langschied@bio.uni-frankfurt.de](mailto:langschied@bio.uni-frankfurt.de)

Owner

Name: BIONF
Login: BIONF
Kind: organization

Repositories: 12
Profile: https://github.com/BIONF
