raccoonmask
Nextflow pipeline wrapping RepeatModeler, RepeatMasker, TEtrimmer, and MCHelper
Repository
Basic Info
- Host: GitHub
- Owner: emilytrybulec
- License: mit
- Language: Python
- Default Branch: main
- Size: 39.6 MB
Statistics
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Introduction
emilytrybulec/raccoonMask is a bioinformatics pipeline that takes a finished genome and performs repeat analysis. It produces a masked genome (.fasta), files containing the coordinates of regions identified as repeats (.bed) for further manual curation, and plots of TEtrimmer output (.pdf).
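The `.bed` output lends itself to a quick summary before manual curation. As a minimal sketch, assuming the files follow the standard BED convention (tab-separated, 0-based start, exclusive end) and that intervals do not overlap, the Python snippet below totals the annotated repeat bases per sequence. The function name `repeat_bases_per_sequence` and the sample records are illustrative and not part of the pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable


def repeat_bases_per_sequence(bed_lines: Iterable[str]) -> Dict[str, int]:
    """Sum interval lengths (end - start) per sequence name.

    Assumes standard BED columns: chrom, start (0-based), end (exclusive).
    Overlapping intervals are not merged, so overlaps would be counted twice.
    """
    totals: Dict[str, int] = defaultdict(int)
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and optional header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        totals[chrom] += int(end) - int(start)
    return dict(totals)


# Hypothetical example records, not real pipeline output.
example = [
    "scaffold_1\t100\t250\tLINE/L1",
    "scaffold_1\t400\t450\tSINE/tRNA",
    "scaffold_2\t0\t1000\tLTR/ERV1",
]
print(repeat_bases_per_sequence(example))
```

To use it on a real file, pass an open file handle instead of the example list, for instance `repeat_bases_per_sequence(open("repeats.bed"))`, where `repeats.bed` stands in for whichever BED file the run produced.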
Usage
[!NOTE] If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow.
First, go through nextflow.config to configure the pipeline to your needs. Each option can be modified to change which programs run and which command-line options they receive.
nextflow.config:
```config
params {
    // Input options
    te_trimmer      = false
    repeat_masker   = true
    cons_thr        = 0.5
    soft_mask       = true
    species         = null
    genome_fasta    = null
    consensus_fasta = null
}
```
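To illustrate what the `soft_mask` option is understood to control: soft-masking conventionally rewrites repeat regions in lowercase, while hard-masking replaces them with `N`. The sketch below applies both to a toy sequence in Python. The `mask_sequence` helper and the intervals are hypothetical and only mimic the convention; they are not how the pipeline or RepeatMasker implements masking.

```python
from typing import Iterable, Tuple


def mask_sequence(seq: str, intervals: Iterable[Tuple[int, int]], soft: bool = True) -> str:
    """Mask 0-based, end-exclusive intervals of `seq`.

    soft=True  -> repeat bases become lowercase (sequence is preserved)
    soft=False -> repeat bases become 'N' (sequence is discarded)
    """
    bases = list(seq.upper())
    for start, end in intervals:
        # Clamp each interval to the sequence bounds.
        for i in range(max(start, 0), min(end, len(bases))):
            bases[i] = bases[i].lower() if soft else "N"
    return "".join(bases)


toy = "ACGTACGTACGT"
repeats = [(2, 5), (8, 10)]  # hypothetical repeat coordinates
print(mask_sequence(toy, repeats, soft=True))
print(mask_sequence(toy, repeats, soft=False))
```

Soft-masking keeps the underlying bases available to downstream tools, which is why it is often preferred when the masked genome feeds into annotation.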
Next, create a params.yaml file to supply values in place of the null settings. At a minimum, this file must contain the path to your genome and your preferred output directory name. Optionally, a consensus path and a RepeatMasker species flag can be supplied.
params.yaml:
```yaml
# Plain key: value pairs; no enclosing params { } block is needed in YAML
genome_fasta : "/core/projects/colossalanalyses/Finished_Genomes_for_Annotation/BayDuikerCDO11_5Jan2023_RaconR3.fasta"
outdir : "bay_duiker_softmask"
species : "cow"
```
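If you prefer to generate the parameter file programmatically, the following Python sketch writes one using only the standard library. It relies on Nextflow's `-params-file` option also accepting JSON; the `write_params` helper, the file name `params.json`, and the placeholder paths are assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Optional


def write_params(path: str, genome_fasta: str, outdir: str,
                 species: Optional[str] = None,
                 consensus_fasta: Optional[str] = None) -> dict:
    """Write a params file containing only the values that were supplied."""
    params = {"genome_fasta": genome_fasta, "outdir": outdir}
    # Optional settings are left out so the nextflow.config defaults apply.
    if species is not None:
        params["species"] = species
    if consensus_fasta is not None:
        params["consensus_fasta"] = consensus_fasta
    Path(path).write_text(json.dumps(params, indent=2) + "\n")
    return params


# Hypothetical paths and names, for illustration only.
written = write_params("params.json", "/path/to/genome.fasta",
                       "my_species_softmask", species="cow")
print(written)
```

The resulting file would then be passed as `-params-file params.json` in place of the YAML file.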
Now, you can run the pipeline using:
```bash
nextflow pull emilytrybulec/raccoonMask
nextflow run emilytrybulec/raccoonMask \
    -profile <docker/singularity/.../institute> \
    -params-file params.yaml
```
Xanadu users: please refer to the example script.
Running TEtrimmer:
TEtrimmer is currently run from a clone of its Git repository located in the assets folder. Users who would like to run TEtrimmer must create the conda environment before running the pipeline, in accordance with the TEtrimmer usage directions.
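```bash
# Create the environment and activate it so the packages below install into it
conda create --name TEtrimmer
conda activate TEtrimmer
conda install -c conda-forge mamba
mamba install bioconda::tetrimmer
# Confirm the installation
TEtrimmer --help
```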
The pipeline will automatically activate the conda environment when running TEtrimmer.
Credits
emilytrybulec/raccoonMask was originally written by Emily Trybulec with the help of Jessica Storer.
Contributions and Support
If you would like to contribute to this pipeline, please see the contributing guidelines.
Citations
If you use emilytrybulec/raccoonMask for your analysis, please cite this Git repository.
This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
Owner
- Login: emilytrybulec
- Kind: user
- Repositories: 1
- Profile: https://github.com/emilytrybulec
Citation (CITATIONS.md)
# emilytrybulec/repeat_curation: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.
GitHub Events
Total
- Watch event: 2
- Push event: 203
- Create event: 1
Last Year
- Watch event: 2
- Push event: 203
- Create event: 1
Dependencies
- fastqc 0.12.1.*
- repeatmodeler 2.0.5.*