mosaiqc-public
SmartCleaning (now called MosaiQC) is a script developed by Thiago Peixoto Leal to clean genotyping array data and perform quality control (QC) on it using PLINK. It automates each step of the cleaning pipeline. If you are interested in using SmartCleaning, contact the LDGH/Mosaico Translational Genomics team (carolina160195@hotmail.com).
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.7%) to scientific vocabulary
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
MosaiQC
MosaiQC automates the complete LDGH cleaning pipeline, which includes the following steps:
- Remove variants on chromosome 0 (unplaced variants)
- Remove duplicate data
- Remove missing data
- Infer individual sex
- Remove A|T and C|G (strand-ambiguous) variants
- Remove variants that are 100% heterozygous
- Annotate the variants with dbSNP IDs
- LiftOver (convert coordinates between genome versions)
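The A|T and C|G filter targets strand-ambiguous SNPs. Their two alleles are complementary, so the allele pair looks the same whichever strand the array reports. Below is a minimal sketch of how such variants could be flagged from a PLINK .bim file, which has six whitespace-separated columns: chromosome, variant ID, genetic position, base-pair position, allele 1 and allele 2. The function name ambiguous_variant_ids is hypothetical and is not taken from MosaiQC. The sketch also assumes the resulting ID list would be handed to PLINK.

```python
from typing import Iterable, List

# Complementary base pairs: an A/T or C/G SNP reads identically on both strands.
STRAND_AMBIGUOUS = {frozenset("AT"), frozenset("CG")}


def ambiguous_variant_ids(bim_lines: Iterable[str]) -> List[str]:
    """Return IDs of A/T and C/G variants from PLINK .bim lines (hypothetical helper)."""
    ids = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        variant_id, a1, a2 = fields[1], fields[4].upper(), fields[5].upper()
        if frozenset((a1, a2)) in STRAND_AMBIGUOUS:
            ids.append(variant_id)
    return ids


if __name__ == "__main__":
    example_bim = [
        "1\trs1\t0\t1000\tA\tT",  # ambiguous (A/T)
        "1\trs2\t0\t2000\tA\tG",  # not ambiguous
        "1\trs3\t0\t3000\tG\tC",  # ambiguous (C/G)
    ]
    print(ambiguous_variant_ids(example_bim))  # ['rs1', 'rs3']
```

You could write the returned IDs to a file, one per line, and pass that file to `plink --bfile <prefix> --exclude <file> --make-bed --out <new prefix>`.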
Users: Mateus Gouveia (Rotimi's group, NHGRI/NIH), Esteban Parra's group (University of Toronto), Marilia Scliar (Mayana Zatz's group, University of São Paulo), and the Ignacio Mata Lab at the Genomic Medicine Institute (Cleveland Clinic, OH, US)
Requirements
The SmartCleaning script is implemented in Python. In addition, the following program is required:
- PLINK
Files Required
database
File describing the databases to be cleaned. This file contains five tab-separated (\t) columns:
- Population name (later used in the final file name)
- PLINK bfile input location (the file prefix, without the .bed/.bim/.fam extension)
- First chromosome in the file
- Last chromosome in the file
- Genome version of the input file
Example:
```
POP Grupo Caminho ChrInicial ChrFinal build
M1 /home/thiago/TestesABC/SUDANextraido 1 1 37
```
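The database file can be read row by row into typed records. Below is a minimal sketch under a few stated assumptions. First, it assumes the file starts with a header line, as in the example. That example header lists six names while the data row has five values, so the sketch simply skips the first line rather than matching names. Second, it assumes chromosome codes and genome versions are plain integers. The class and function names are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, TextIO


@dataclass
class DatabaseEntry:
    """One row of the database file (field names are hypothetical)."""
    population: str
    bfile: str
    first_chr: int
    last_chr: int
    build: int


def read_database(handle: TextIO, has_header: bool = True) -> List[DatabaseEntry]:
    """Parse the tab-separated database file; assumes numeric chromosome codes."""
    rows = list(csv.reader(handle, delimiter="\t"))
    if has_header:
        rows = rows[1:]  # skip the column-name line shown in the example
    entries = []
    for row in rows:
        fields = [f.strip() for f in row]
        if not any(fields):
            continue  # ignore blank lines
        if len(fields) != 5:
            raise ValueError(f"expected 5 tab-separated columns, got {len(fields)}: {row}")
        pop, bfile, first, last, build = fields
        entries.append(DatabaseEntry(pop, bfile, int(first), int(last), int(build)))
    return entries


if __name__ == "__main__":
    example = (
        "POP\tGrupo\tCaminho\tChrInicial\tChrFinal\tbuild\n"
        "M1\t/home/thiago/TestesABC/SUDANextraido\t1\t1\t37\n"
    )
    for entry in read_database(io.StringIO(example)):
        print(entry)
```

Raising on a wrong column count surfaces formatting mistakes early, before any PLINK step runs on a misread path.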
reference
File with the paths to the reference files. This file contains two tab-separated (\t) columns:
- Name of the reference dataset
- Path to the reference file

Example:
```
DBSNP /media/thiago/Data/Unificado/Ref/dbSNPHG38HG37lastversionchr*corrected.txt.gz
KGP /media/thiago/Data/Unificado/Ref/HumanOmni5Exome-4-v1-1-B-auxilliary-file.txt
```
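The reference file is a simple name-to-path table. Below is a minimal sketch that reads it into a dict. It also includes a hypothetical helper that expands the "*" in the DBSNP path into one path per chromosome. Treating "*" as a per-chromosome placeholder is an assumption based only on the file name in the example. Both function names are illustrative, not MosaiQC's.

```python
import io
from typing import Dict, List, TextIO


def read_path_table(handle: TextIO) -> Dict[str, str]:
    """Read a two-column 'name<TAB>path' file into a dict (hypothetical helper)."""
    table = {}
    for line_no, line in enumerate(handle, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected 2 tab-separated columns")
        name, path = parts
        table[name.strip()] = path.strip()
    return table


def per_chromosome_paths(pattern: str, first_chr: int, last_chr: int) -> List[str]:
    """Substitute each chromosome number for '*' in a path (assumed naming convention)."""
    return [pattern.replace("*", str(c)) for c in range(first_chr, last_chr + 1)]


if __name__ == "__main__":
    refs = read_path_table(io.StringIO(
        "DBSNP\t/media/thiago/Data/Unificado/Ref/dbSNPHG38HG37lastversionchr*corrected.txt.gz\n"
        "KGP\t/media/thiago/Data/Unificado/Ref/HumanOmni5Exome-4-v1-1-B-auxilliary-file.txt\n"
    ))
    print(per_chromosome_paths(refs["DBSNP"], 1, 2))
```

The same reader also works for the programs file, because that file uses the same two-column layout.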
programs
File with the paths to the programs used. This file contains two tab-separated (\t) columns:
- Name of the program
- Path to the program

Example:
```
plink /home/thiago/Programs/plink
```
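The cleaning steps ultimately call the PLINK binary listed in this file. Below is a minimal sketch of how the "Remove chr 0" step could be expressed as a PLINK 1.9 command. It assumes the programs file has already been read into a name-to-path dict. The flags --bfile, --not-chr, --make-bed and --out are standard PLINK 1.9 options. The helper name and the output prefix are hypothetical, and MosaiQC may assemble its commands differently.

```python
import shlex
from typing import Dict, List


def remove_chr0_command(programs: Dict[str, str], bfile: str, out: str) -> List[str]:
    """Build a PLINK 1.9 command that drops variants on chromosome 0 (hypothetical helper)."""
    plink = programs["plink"]  # path taken from the programs file
    return [plink, "--bfile", bfile, "--not-chr", "0", "--make-bed", "--out", out]


if __name__ == "__main__":
    cmd = remove_chr0_command(
        {"plink": "/home/thiago/Programs/plink"},
        "/home/thiago/TestesABC/SUDANextraido",
        "/tmp/SUDANextraido_nochr0",  # hypothetical output prefix
    )
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True) would execute it if PLINK is installed
```

Building the command as a list rather than a single string avoids shell-quoting problems when paths contain spaces.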
Parameters
Mandatory Parameters
```
--programs or -p     File with the path to the programs used
--reference or -r    File with the path of reference files
--database or -d     File with the description of the databases
--folder or -f       Folder to store the intermediary files and output files
```
Optional Parameters
--change or -c Change the genome version to the one chosen by the user (e.g., -c 37)
--chromosomeX or -x Run the sex-chromosome steps. If you use this option, you must also provide the genome version (see: https://www.cog-genomics.org/plink/1.9/data#split_x)
--steps or -s Steps to be performed and their respective parameters
--lastautossomal or -l Number of the last autosomal chromosome (default: 22)
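The --chromosomeX option refers to PLINK's --split-x, which splits the X chromosome pseudo-autosomal region into a separate chromosome using a genome-build code. Below is a sketch of mapping a numeric genome version to the build codes PLINK 1.9 accepts for --split-x (b36, b37, b38). It assumes the version is given as a number, like the value passed to -c in the execution example. The helper and table names are hypothetical.

```python
from typing import List

# Build codes accepted by PLINK 1.9 --split-x, keyed by numeric genome version.
SPLIT_X_BUILD_CODES = {36: "b36", 37: "b37", 38: "b38"}


def split_x_args(genome_version: int) -> List[str]:
    """Return PLINK arguments for --split-x (hypothetical helper)."""
    try:
        code = SPLIT_X_BUILD_CODES[genome_version]
    except KeyError:
        raise ValueError(f"no --split-x build code for genome version {genome_version}") from None
    return ["--split-x", code]


if __name__ == "__main__":
    print(split_x_args(37))
```

In PLINK 1.9, --split-x is used together with --make-bed, so these arguments would be appended to a command that writes a new fileset.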
Execution example
```
python3 main.py -d database.txt -r reference.txt -p programas.txt -c 37 -f /media/DadosCongelados/DadosLimpos/
```
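The mandatory and optional parameters listed above map naturally onto Python's argparse module. Below is a sketch of an equivalent command-line parser. It assumes -c, -x and -l take integer values and -s takes a single string, because the format of the step list is not described here. The parser mirrors the documented flags but is not MosaiQC's actual main.py.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="MosaiQC command-line interface (sketch)")
    # Mandatory parameters
    parser.add_argument("-p", "--programs", required=True, help="File with the path to the programs used")
    parser.add_argument("-r", "--reference", required=True, help="File with the path of reference files")
    parser.add_argument("-d", "--database", required=True, help="File with the description of the databases")
    parser.add_argument("-f", "--folder", required=True, help="Folder for intermediary and output files")
    # Optional parameters (value types are assumptions)
    parser.add_argument("-c", "--change", type=int, help="Target genome version, e.g. 37")
    parser.add_argument("-x", "--chromosomeX", type=int, help="Genome version for the sex-chromosome steps")
    parser.add_argument("-s", "--steps", help="Steps to run and their parameters (format assumed to be a string)")
    parser.add_argument("-l", "--lastautossomal", type=int, default=22, help="Last autosomal chromosome")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    args = parse(["-d", "database.txt", "-r", "reference.txt", "-p", "programas.txt",
                  "-c", "37", "-f", "/media/DadosCongelados/DadosLimpos/"])
    print(args)
```

Marking the four mandatory options as required=True makes argparse print a usage error and exit when any of them is missing.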
Owner
- Name: Lab. de Diversidade Genética Humana (LDGH)
- Login: ldgh
- Kind: organization
- Location: Institute of Biological Sciences - UFMG (Brasil)
- Website: http://ldgh.com.br/
- Repositories: 5
- Profile: https://github.com/ldgh
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Leal
    given-names: Thiago Peixoto
    orcid: https://orcid.org/0000-0002-5829-6452
title: "MosaiQC"
version:
date-released: