mosquito_pipeline
Pipeline for analysis of Anopheles mosquitoes
Repository
Basic Info
- Host: GitHub
- Owner: sophiemoss
- Language: Jupyter Notebook
- Default Branch: main
- Size: 4.04 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
mosquito_pipeline
Pipeline for whole genome sequence analysis of Anopheles mosquitoes.
Step 1: Acquire FASTQ reads for each sample and use fastq2matrix to generate BAM and VCF files per sample.
Step 2: Compute basic statistics on the samples to check their quality.
Step 3: Build a genomics database VCF from the samples you judge to be of good enough quality to keep.
Step 4: Filter the genomics database VCF to retain only good-quality SNPs.
Step 5: Conduct principal component analysis (PCA) on the filtered VCF.
Step 6: Plot the PCA using the R script.
Step 7: Calculate and plot admixture.
Step 8: Create a maximum likelihood tree.
Step 9: Calculate FST (using Python and a Jupyter notebook).
Step 10: Calculate genetic diversity metrics: nucleotide diversity and Tajima's D.
Step 11: Selection: calculate Garud's H12 statistic (using Python and a Jupyter notebook).
Step 12: Selection: calculate iHS (using Python and a Jupyter notebook).
Step 13: Selection: calculate XP-EHH (using Python and a Jupyter notebook).
Step 14: Use DELLY to analyse structural variants.
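As an illustration of the statistic named in Step 11, here is a minimal sketch of Garud's H12 for a single window of haplotypes. It uses only the Python standard library and is not necessarily how the notebooks in this repository compute it. The function name `garud_h12` and the example window are hypothetical. H12 merges the two most common haplotype frequencies before squaring, so it rises when one or two haplotypes dominate a window.

```python
from collections import Counter


def garud_h12(haplotypes):
    """Return Garud's H12 for one window of haplotypes.

    `haplotypes` is a list of hashable haplotypes, for example strings
    such as "0101" where each character is the allele at one SNP.
    (Hypothetical helper for illustration only.)
    """
    if not haplotypes:
        raise ValueError("at least one haplotype is required")

    n = len(haplotypes)
    counts = Counter(haplotypes)

    # Haplotype frequencies, most common first.
    freqs = sorted((c / n for c in counts.values()), reverse=True)

    p1 = freqs[0]
    p2 = freqs[1] if len(freqs) > 1 else 0.0

    # H12 = (p1 + p2)^2 + sum of squared frequencies of the remaining haplotypes.
    return (p1 + p2) ** 2 + sum(p ** 2 for p in freqs[2:])


# Made-up window of 8 haplotypes across 4 SNPs.
window = ["0101", "0101", "0101", "0110", "0110", "1111", "0000", "0011"]
h12 = garud_h12(window)
print(h12)  # 0.4375
```

In a genome scan this calculation would be repeated over sliding windows of SNPs along each chromosome, and peaks in H12 would be inspected as candidate sweep regions.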
Other scripts:
calculate_n50.py can be used to calculate the N50 of a set of sequences, for example the contigs of a reference genome.
check_sex.py can be used to check the sex of mosquito samples using the ratio of coverage between the X chromosome and an autosome.
chromocoverage.py and createcoverageplotssambambageneric.py are part of basicstatistics and can be used to assess the coverage of sequence data across the genome, including visualising this in plots.
generateadmixbarplot_colours.R is an R script called by 7.admixture.sh.
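For readers unfamiliar with the N50 statistic mentioned above, the following is a minimal sketch of the calculation. It is not the contents of calculate_n50.py; the helper names `fasta_lengths` and `n50` and the example lengths are hypothetical. N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length.

```python
def fasta_lengths(lines):
    """Return the length of each record in FASTA-formatted lines.

    `lines` is any iterable of strings, such as an open file handle.
    (Hypothetical helper for illustration only.)
    """
    lengths = []
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header starts a new record; store the previous one if any.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("at least one sequence length is required")

    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2

    # Walk from the longest sequence down until half the total is covered.
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Made-up contig lengths: total 290, half 145, reached at the second contig.
example = [80, 70, 50, 40, 30, 20]
print(n50(example))  # 70
```

To apply this to a reference genome, the lengths could be read with `fasta_lengths(open(path))`, where `path` is the FASTA file of interest, and then passed to `n50`.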
Owner
- Login: sophiemoss
- Kind: user
- Repositories: 1
- Profile: https://github.com/sophiemoss
Citation (citation.cff)
cff-version: 1.0.0
message: "If you use this pipeline, please cite it as below."
authors:
  - family-names: "MOSS"
    given-names: "S"
    orcid: "https://orcid.org/0000-0003-2843-9085"
title: "mosquito_pipeline"
version: 1.0.0
date-released: 2025-03-14
url: "https://github.com/sophiemoss/mosquito_pipeline"
GitHub Events
Total
- Push event: 3
- Public event: 1
- Pull request event: 2
Last Year
- Push event: 3
- Public event: 1
- Pull request event: 2