recimap

Recimap (a reciprocal mapping tool) was developed as a bioinformatics command-line tool/pipeline to find rearrangement breakpoints between two closely related genomes.

https://github.com/casper-schutte/recimap

Keywords

bioinformatics bioinformatics-pipeline genetics genomics
Last synced: 4 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: casper-schutte
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 77.1 KB
Statistics
  • Stars: 5
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 3 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Citation

README.md

ReciMap

ReciMap (a reciprocal mapping tool) was developed as a bioinformatics command-line tool/pipeline to find rearrangement breakpoints between two closely related genomes. It uses Burrows-Wheeler read mapping (with Bowtie2) to map synthetic reads from one genome onto the other, and vice versa (reciprocally). The synthetic reads are generated from the genome sequences themselves, and partially mapping reads with high MAPQ scores are used to identify the borders of rearrangement events (breakpoints). The scope of the pipeline was extended to include the identification of synteny blocks between the two genomes.
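The core idea can be sketched in Python. This is a minimal illustration under stated assumptions, not ReciMap's own code. The read length (`READ_LEN`), step size (`STEP`), MAPQ cut-off (`MIN_MAPQ`) and the helper names `tile_reads` and `clip_breakpoint` are all hypothetical. The sketch also assumes that a "partially mapping" read shows up as a soft-clipped alignment, which Bowtie2 produces in its `--local` alignment mode. The first function tiles a chromosome into overlapping synthetic reads. The second takes the position, MAPQ and CIGAR string of a mapped read (SAM columns 4, 5 and 6) and, if the read is soft-clipped and confidently placed, reports where the aligned part stops as a candidate breakpoint.

```python
import re
from typing import Iterator, Optional, Tuple

# Hypothetical parameters; ReciMap's actual values may differ.
READ_LEN = 100   # length of each synthetic read
STEP = 50        # offset between consecutive read starts (reads overlap)
MIN_MAPQ = 30    # only trust confidently placed reads

def tile_reads(name: str, seq: str, read_len: int = READ_LEN,
               step: int = STEP) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) pairs tiling `seq` with overlapping windows.

    Any tail shorter than one step beyond the last full window is not covered.
    """
    for start in range(0, max(len(seq) - read_len, 0) + 1, step):
        # Encode the origin (chromosome, 1-based start) in the read ID
        # so it is still known after the read has been mapped.
        yield f"{name}_{start + 1}", seq[start:start + read_len]

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def clip_breakpoint(pos: int, mapq: int, cigar: str,
                    min_mapq: int = MIN_MAPQ) -> Optional[int]:
    """Return a candidate breakpoint (1-based reference coordinate) for a
    confidently placed, soft-clipped read, or None.

    `pos` is the 1-based leftmost aligned position (SAM column 4).
    """
    if mapq < min_mapq:
        return None
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if not ops:  # e.g. "*" for an unmapped read
        return None
    # Reference bases consumed by the aligned part of the read.
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    if ops[0][1] == "S":
        return pos  # clipped on the left: the border lies just before pos
    if ops[-1][1] == "S":
        return pos + ref_len - 1  # clipped on the right: border after last aligned base
    return None  # fully aligned read: no evidence of a border
```

For example, a read aligned at position 1000 with MAPQ 42 and CIGAR `60M40S` would give a candidate breakpoint at 1059. Collecting such positions from both mapping directions is what makes the approach reciprocal.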

ReciMap was created by Casper Schutte as part of an M.Sc. project at the University of Stellenbosch, South Africa.

Usage:

This pipeline has only two dependencies: samtools and Bowtie2. It is recommended that you use Conda or a similar environment manager. The dependencies can be installed by running `conda install samtools` and `conda install bowtie2`.

This pipeline is best suited for use between very closely related genomes and was not designed to handle complex (overlapping) rearrangement events.

Important Note: In its current state, the pipeline only works when ALL of the chromosome names in BOTH FASTA files are in the following format:

```
>Chr1
(sequence for Chr1)
>Chr2
(sequence for Chr2)
>Chr3
(sequence for Chr3)
etc.
```

Install the required libraries and software tools (samtools and Bowtie2). The two genomes need to be in FASTA format, and the files need to be in the same directory as all the scripts. Running the pipeline is as simple as running the following command in the terminal:

./recimap.sh
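Because the pipeline only works when every header in both FASTA files follows the `Chr1`, `Chr2`, ... pattern, it may be worth checking the files before starting a run. The sketch below is not part of ReciMap; the helper names `bad_headers` and `check_fasta` are hypothetical. It assumes that the whole header line after `>` must be exactly `Chr` followed by a number, which may be stricter than what the pipeline actually requires.

```python
import re
from typing import List

# Assumed header pattern, based on the note above: "Chr" followed by digits.
HEADER_RE = re.compile(r"Chr\d+")

def bad_headers(lines: List[str]) -> List[str]:
    """Return the FASTA header names that do not match the ChrN pattern."""
    bad = []
    for line in lines:
        if line.startswith(">"):
            name = line[1:].strip()  # whole header line, minus ">" and whitespace
            if not HEADER_RE.fullmatch(name):
                bad.append(name)
    return bad

def check_fasta(path: str) -> List[str]:
    """Read a FASTA file and return its non-conforming header names."""
    with open(path) as handle:
        return bad_headers(handle.read().splitlines())
```

For example, `check_fasta("reference_genome.fa")` returns an empty list when every header is acceptable and otherwise lists the offending names. These can then be renamed before running the pipeline.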

You will be prompted to select the first genome:

```
ReciMap

Select two genomes in FASTA (.fa) format and ReciMap will identify the borders of rearrangement events between them.
ReciMap will also attempt to match up the synteny blocks between the genomes.
This script takes no arguments; you will be prompted to select the two genomes you wish to compare by selecting numbers from a list in the terminal.
If you made a mistake during selection, please press 'Ctrl + c' to exit the program.
The borders of rearrangement events will be written to files called genA.txt and genB.txt.
The final output showing the synteny blocks between the genomes will be written to text files in the form blocks(nameofFASTAfile).txt.
For more information, please see the repository for this project at:

https://github.com/casper-schutte/ReciMap

Please select the first genome (Genome A)
1) reference_genome.fa
2) rearranged_genome.fa
#?
```

The order in which the genomes are selected makes no difference. The borders are reported in the form of synteny blocks: a rearrangement border lies where one block ends and another begins. This output will be written to a text file called blocks(nameofFASTAfile).txt, and one of these files will be created for each of the two genomes (see an example of the format under the "Format Example" heading below):

Format Example:

The format is as follows:

n - (chromosome name, start position of block, end position of block)

where "n" is the numerical label of the block, representing the original order of the blocks in the other genome. For example, the contents of the file "blocksrearrangedgenome.txt":

```
Original order of blocks in genome reference_genome.fa
1 - ('Chr1', 1, 15400)
2 - ('Chr2', 1, 210)
3 - ('Chr2', 211, 14140)
4 - ('Chr3', 1, 700)
5 - ('Chr3', 701, 1401)
6 - ('Chr3', 1401, 5880)
7 - ('Chr3', 5882, 6790)
8 - ('Chr3', 6791, 8540)
9 - ('Chr4', 1, 487)
10 - ('Chr4', 493, 4480)
11 - ('Chr5', 1, 2240)
12 - ('Chr5', 2241, 3361)
13 - ('Chr5', 3361, 6292)

Order of blocks in genome rearranged_genome.fa
n - ('Chr1', 1, 700)
n - ('Chr1', 701, 1400)
n - ('Chr1', 1401, 14700)
n - ('Chr2', 1, 210)
3 - ('Chr2', 211, 14490)
5 - ('Chr3', 1, 700)
4 - ('Chr3', 701, 1401)
6 - ('Chr3', 1401, 5880)
8 - ('Chr3', 5882, 7630)
7 - ('Chr3', 7631, 8540)
9 - ('Chr4', 1, 490)
12 - ('Chr4', 493, 1608)
10 - ('Chr4', 1611, 5600)
11 - ('Chr5', 1, 2240)
13 - ('Chr5', 2241, 5172)
```

The numerical labels of the blocks represent the order of the blocks in the OTHER genome. For example, the block labelled "12" is located between blocks 9 and 10. This is due to a rearrangement event: in the reference genome it was the 12th block, located between blocks 11 and 13.
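To work with these files programmatically, a small parser can turn each `n - (chromosome, start, end)` line into a tuple. It can then flag the places where neighbouring labels are not consecutive, which is where the block order differs from the other genome. This is a hypothetical sketch: `parse_blocks` and `out_of_order` are not part of ReciMap, and the parser assumes exactly the line format shown above. The parser should be given the lines of only one section (for example, the "Order of blocks in genome ..." section), because both sections of the file use the same labels. The labels carry no orientation, so an inversion that leaves a block in its original position would not be flagged by this check.

```python
import re
from typing import List, Optional, Tuple

# One block line, e.g. "12 - ('Chr4', 493, 1608)" or "n - ('Chr1', 1, 700)".
BLOCK_RE = re.compile(r"^(\d+|n) - \('([^']+)', (\d+), (\d+)\)$")

# (label, chromosome, start, end); label is None for blocks marked "n".
Block = Tuple[Optional[int], str, int, int]

def parse_blocks(lines: List[str]) -> List[Block]:
    """Parse block lines; lines that do not match (e.g. section titles) are skipped."""
    blocks = []
    for line in lines:
        m = BLOCK_RE.match(line.strip())
        if m:
            label = None if m.group(1) == "n" else int(m.group(1))
            blocks.append((label, m.group(2), int(m.group(3)), int(m.group(4))))
    return blocks

def out_of_order(blocks: List[Block]) -> List[Tuple[int, int]]:
    """Return adjacent (label, next_label) pairs whose labels are not consecutive,
    ignoring blocks whose synteny could not be established ("n")."""
    labels = [b[0] for b in blocks if b[0] is not None]
    return [(a, b) for a, b in zip(labels, labels[1:]) if b != a + 1]
```

On the second section of the example above, `out_of_order` flags the pairs (9, 12) and (12, 10) around the displaced block 12, along with the swapped pairs 5/4 and 8/7 and their neighbours.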

The other file ("blocksreferencegenome.txt") will be in the same format, but with the order of the blocks in the rearranged genome taken as the "correct" (numerical) order, followed by the order in which those synteny blocks appear in the reference genome.

The blocks labelled "n" are blocks whose synteny could not be established. This is often due to duplication events, which do not leave a clear border. Block ('Chr2', 1, 210) in this example could not be identified as syntenic between the genomes. This is due to the method used to identify synteny blocks, combined with the fact that the borders delimiting this block resulted from an inversion of the block itself (see the "Method" section in my thesis for more information on the inner workings of this pipeline).

For more information on the rearrangement borders, the pipeline also leaves an output file called "output.txt", which gives more detail on each identified rearrangement border.

Owner

  • Login: casper-schutte
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite as below."
authors:
  - family-names: Schutte
    given-names: Casper
    orcid: "https://orcid.org/0000-0003-4245-6842"

title: "ReciMap"
version: 1.0
# doi: (no value given)
# date-released: (no value given)
