raccoonmask

Nextflow pipeline wrapping Repeat Modeler & Masker, TE Trimmer, and MCHelper

https://github.com/emilytrybulec/raccoonmask

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.5%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: emilytrybulec
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 39.6 MB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed 8 months ago
Metadata Files
Readme License Citation

README.md

Introduction

emilytrybulec/raccoonMask is a bioinformatics pipeline that takes a finished genome and performs repeat analysis. It produces a masked genome (.fasta), files containing the coordinates of regions identified as repeats (.bed) for further manual curation, and images of TE Trimmer output (.pdf). The pipeline runs the following steps:

  1. Repeat Modeler BuildDatabase
  2. Repeat Modeler
  3. TE Trimmer
  4. Repeat Masker
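As a quick sanity check on the repeat coordinates (.bed) the pipeline produces, you can compute how many bases are covered by repeats after merging overlapping intervals. This is a minimal sketch that assumes standard BED layout (tab-separated, 0-based half-open coordinates, with chromosome, start and end in the first three columns); the exact columns raccoonMask writes are an assumption, and `merged_repeat_bases` is a hypothetical helper, not part of the pipeline.

```python
from collections import defaultdict


def merged_repeat_bases(bed_lines):
    """Return total bases covered by repeats, merging overlapping intervals.

    Hypothetical helper. Assumes standard BED: tab-separated, 0-based
    half-open [start, end), chrom/start/end in the first three columns.
    """
    intervals = defaultdict(list)
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        intervals[chrom].append((int(start), int(end)))

    total = 0
    for spans in intervals.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlapping or touching interval: extend the current run.
                cur_end = max(cur_end, end)
            else:
                # Gap: close the current run and start a new one.
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


if __name__ == "__main__":
    example = [
        "chr1\t100\t200\tL1",
        "chr1\t150\t300\tL1",
        "chr1\t500\t550\tAluY",
        "chr2\t0\t10\t(CA)n",
    ]
    print(merged_repeat_bases(example))
```

Merging first avoids double-counting bases where repeat annotations overlap, which a plain sum of `end - start` would do.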

Usage

[!NOTE] If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow.

First, go through nextflow.config and configure the pipeline to your needs. Each option can be modified to change which programs run and which command-line options they use.

nextflow.config:

```config
params {

    // Input options
    te_trimmer                 = false
    repeat_masker              = true
    cons_thr                   = 0.5

    soft_mask                  = true

    species                    = null
    genome_fasta               = null
    consensus_fasta            = null

}
```
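The `soft_mask` option presumably chooses between soft masking, where repeat bases are kept but written in lowercase (the behavior of RepeatMasker's `-xsmall` option), and hard masking, where repeat bases are replaced with N. How raccoonMask passes this setting to RepeatMasker is an assumption here; the hypothetical helpers below only illustrate the two conventions on a toy sequence.

```python
def apply_mask(seq, intervals, soft=True):
    """Mask repeat intervals in a sequence (hypothetical helper).

    Soft masking lowercases repeat bases; hard masking replaces them with N.
    Intervals are 0-based, half-open (start, end) pairs.
    """
    # Start from an all-uppercase copy so only the given intervals are masked.
    bases = list(seq.upper())
    for start, end in intervals:
        for i in range(max(start, 0), min(end, len(bases))):
            bases[i] = bases[i].lower() if soft else "N"
    return "".join(bases)


def masked_fraction(seq, soft=True):
    """Fraction of bases that are masked (hypothetical helper).

    Soft mode counts lowercase bases; hard mode counts N/n, which also
    includes any assembly gaps that were already N before masking.
    """
    if not seq:
        return 0.0
    if soft:
        masked = sum(1 for base in seq if base.islower())
    else:
        masked = sum(1 for base in seq if base in "Nn")
    return masked / len(seq)


if __name__ == "__main__":
    seq = "ACGTACGTACGT"
    repeats = [(2, 6)]
    print(apply_mask(seq, repeats, soft=True))   # ACgtacGTACGT
    print(apply_mask(seq, repeats, soft=False))  # ACNNNNGTACGT
    print(masked_fraction(apply_mask(seq, repeats, soft=True)))
```

Soft masking keeps the underlying sequence available to downstream tools, whereas hard masking discards it.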

Next, create a params.yaml file that supplies values for the parameters left as null in nextflow.config. At a minimum, this file must contain the path to your genome (genome_fasta) and your preferred output directory name (outdir). Optionally, a path to consensus sequences (consensus_fasta) and a RepeatMasker species (species) can be supplied.

params.yaml:

```yaml
genome_fasta: "/core/projects/colossalanalyses/Finished_Genomes_for_Annotation/BayDuikerCDO11_5Jan2023_RaconR3.fasta"
outdir: "bay_duiker_softmask"
species: "cow"
```
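If you generate params.yaml from a script (for example, one file per genome), each parameter goes on its own top-level line. This is a minimal sketch using only the Python standard library: it relies on JSON scalars (strings, numbers, true/false, null) also being valid YAML scalars, so `json.dumps` handles quoting. `to_params_yaml` is a hypothetical helper, and its required-key check mirrors the stated minimum of genome_fasta and outdir.

```python
import json


def to_params_yaml(params):
    """Render a flat dict of pipeline parameters as YAML (hypothetical helper).

    Raises ValueError if genome_fasta or outdir is missing or empty.
    """
    required = ("genome_fasta", "outdir")
    missing = [key for key in required if not params.get(key)]
    if missing:
        raise ValueError(f"missing required parameter(s): {', '.join(missing)}")
    # json.dumps quotes strings and maps True/False/None to true/false/null.
    return "".join(f"{key}: {json.dumps(value)}\n" for key, value in params.items())


if __name__ == "__main__":
    params = {
        "genome_fasta": "/path/to/genome.fasta",  # hypothetical path
        "outdir": "bay_duiker_softmask",
        "species": "cow",
    }
    print(to_params_yaml(params), end="")
```

Writing the result to params.yaml then gives a file in the same shape as the example above.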

Now, you can run the pipeline using:

```bash
nextflow pull emilytrybulec/raccoonMask
nextflow run emilytrybulec/raccoonMask \
    -profile <docker/singularity/.../institute> \
    -params-file params.yaml
```
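On a cluster it can be convenient to launch the same pull and run commands from a Python driver script. This is a minimal sketch, assuming `nextflow` is on your PATH; `build_run_command` and `run_pipeline` are hypothetical helpers, and `dry_run=True` only prints the commands so you can check them before anything runs.

```python
import shlex
import subprocess

PIPELINE = "emilytrybulec/raccoonMask"


def build_run_command(profile, params_file="params.yaml", pipeline=PIPELINE):
    """Assemble the `nextflow run` argument list (hypothetical helper)."""
    return ["nextflow", "run", pipeline, "-profile", profile,
            "-params-file", params_file]


def run_pipeline(profile, params_file="params.yaml", dry_run=True):
    """Pull the pipeline, then run it; with dry_run=True only print commands."""
    commands = [
        ["nextflow", "pull", PIPELINE],
        build_run_command(profile, params_file),
    ]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the driver if nextflow exits with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline("singularity")  # dry run: prints the two commands only
```

Passing arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.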

Xanadu users: please refer to the example script.

Running TEtrimmer:

TEtrimmer is currently run from a clone of the TEtrimmer Git repository located in the assets folder. Users who want to run TEtrimmer must create its conda environment before running the pipeline, following the TEtrimmer usage directions:

```bash
conda create --name TEtrimmer
conda activate TEtrimmer
conda install -c conda-forge mamba
mamba install bioconda::tetrimmer
# Confirm the installation
TEtrimmer --help
```

The pipeline will automatically activate this conda environment when running TEtrimmer.
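Because the pipeline activates the TEtrimmer environment itself, a missing environment presumably only surfaces once the TEtrimmer step starts. A pre-flight check can catch this earlier. This is a minimal sketch, assuming `conda` is on your PATH and the environment is named `TEtrimmer`; it parses the JSON printed by `conda env list --json`, whose `envs` key lists environment paths. `has_conda_env` and `check_tetrimmer_env` are hypothetical helpers.

```python
import json
import os
import subprocess


def has_conda_env(envs_json, name="TEtrimmer"):
    """Return True if `conda env list --json` output lists an env called name."""
    envs = json.loads(envs_json).get("envs", [])
    # Each entry is an environment prefix path; its last component is the name.
    return any(os.path.basename(os.path.normpath(path)) == name for path in envs)


def check_tetrimmer_env(name="TEtrimmer"):
    """Ask conda for its environments and report whether name exists."""
    result = subprocess.run(["conda", "env", "list", "--json"],
                            capture_output=True, text=True, check=True)
    return has_conda_env(result.stdout, name)
```

Calling `check_tetrimmer_env()` before `nextflow run` with `te_trimmer = true` lets you create the environment first instead of waiting for a failed task.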

Credits

emilytrybulec/raccoonMask was originally written by Emily Trybulec with the help of Jessica Storer.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Citations

If you use emilytrybulec/raccoonMask for your analysis, please cite this Git repository.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

Owner

  • Login: emilytrybulec
  • Kind: user

Citation (CITATIONS.md)

# emilytrybulec/repeat_curation: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

GitHub Events

Total
  • Watch event: 2
  • Push event: 203
  • Create event: 1
Last Year
  • Watch event: 2
  • Push event: 203
  • Create event: 1

Dependencies

modules/nf-core/fastqc/meta.yml cpan
modules/nf-core/repeatmodeler/builddatabase/meta.yml cpan
modules/nf-core/repeatmodeler/repeatmodeler/meta.yml cpan
subworkflows/nf-core/utils_nextflow_pipeline/meta.yml cpan
subworkflows/nf-core/utils_nfcore_pipeline/meta.yml cpan
subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml cpan
modules/nf-core/fastqc/environment.yml conda
  • fastqc 0.12.1.*
modules/nf-core/repeatmodeler/builddatabase/environment.yml conda
  • repeatmodeler 2.0.5.*
modules/nf-core/repeatmodeler/repeatmodeler/environment.yml conda
  • repeatmodeler 2.0.5.*