mosaiqc-public

SmartCleaning (now called MosaiQC) is a script developed by Thiago Peixoto Leal to clean and perform quality control (QC) of genotyping array data using PLINK. It automates the individual steps of the pipeline. If you are interested in using it, contact the LDGH/Mosaico Translational Genomics team (carolina160195@hotmail.com).

https://github.com/ldgh/mosaiqc-public

Repository

Basic Info
  • Host: GitHub
  • Owner: ldgh
  • Default Branch: main
  • Homepage:
  • Size: 23.4 KB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 5 years ago · Last pushed over 2 years ago

README.md

MosaiQC

MosaiQC automates the complete LDGH cleaning pipeline, which includes the following steps:

- Remove chromosome 0
- Remove duplicate data
- Remove missing data
- Infer individual sex
- Remove A|T and C|G variants
- Remove variants that are 100% heterozygous
- Annotate the variants with dbSNP IDs
- LiftOver
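As an illustration of the "Remove A|T and C|G variants" step, the sketch below flags strand-ambiguous variants from lines in PLINK's `.bim` layout (chromosome, variant ID, cM position, bp position, allele 1, allele 2). It is not MosaiQC's own code: the function name `ambiguous_variant_ids` and the sample records are hypothetical, and how the pipeline actually performs this step is not described here.

```python
# Sketch only: a hypothetical helper, not taken from the MosaiQC source.

# Allele pairs whose strand cannot be resolved from the alleles alone.
AMBIGUOUS_PAIRS = {frozenset("AT"), frozenset("CG")}


def ambiguous_variant_ids(bim_lines):
    """Return the IDs of variants whose two alleles are A/T or C/G.

    Each line is expected in PLINK .bim layout:
    chromosome, variant ID, cM position, bp position, allele 1, allele 2.
    """
    ids = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        variant_id = fields[1]
        allele_1 = fields[4].upper()
        allele_2 = fields[5].upper()
        if frozenset((allele_1, allele_2)) in AMBIGUOUS_PAIRS:
            ids.append(variant_id)
    return ids


# Made-up records for demonstration only.
example_bim = [
    "1\trs1\t0\t1000\tA\tT",
    "1\trs2\t0\t2000\tA\tG",
    "1\trs3\t0\t3000\tG\tC",
]
print(ambiguous_variant_ids(example_bim))
```

A list produced this way could be written to a text file, one ID per line, and passed to PLINK's `--exclude` option.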

Users: Mateus Gouveia (Rotimi's group, NHGRI/NIH), Esteban Parra group (University of Toronto), Marilia Scliar (Mayana Zatz group, University of Sao Paulo), Ignacio Mata Lab at the Genomic Medicine Institute (Cleveland Clinic, OH, US)

Requirements

MosaiQC is implemented in Python. The following external program is also required:

- PLINK

Files Required

database

File describing the databases to be cleaned. It contains five tab-separated (\t) columns:

- Population name (later used in the final file name)
- Location of the input PLINK bfile
- First chromosome in the file
- Last chromosome in the file
- Genome version of the input file

Example:

```
POP Grupo Caminho ChrInicial ChrFinal build
M1 /home/thiago/TestesABC/SUDANextraido 1 1 37
```
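A minimal sketch of how a file in this five-column layout could be read. The names `DatabaseEntry` and `parse_database` are hypothetical and not part of MosaiQC. The sketch assumes that rows without exactly five tab-separated fields, or with non-numeric chromosome columns (such as a header row), can be skipped.

```python
from typing import List, NamedTuple


class DatabaseEntry(NamedTuple):
    """One row of the database file (hypothetical structure)."""
    population: str
    bfile: str
    first_chr: int
    last_chr: int
    build: str


def parse_database(text: str) -> List[DatabaseEntry]:
    """Parse tab-separated database rows, skipping header-like lines."""
    entries = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) != 5:
            continue  # assumption: not a data row
        population, bfile, first_chr, last_chr, build = fields
        if not (first_chr.isdigit() and last_chr.isdigit()):
            continue  # assumption: chromosome columns are numeric
        entries.append(
            DatabaseEntry(population, bfile, int(first_chr), int(last_chr), build)
        )
    return entries


sample = "M1\t/home/thiago/TestesABC/SUDANextraido\t1\t1\t37\n"
entries = parse_database(sample)
print(entries[0].population, entries[0].build)
```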

reference

File listing the paths to the reference files. It contains two tab-separated (\t) columns:

- Name of the reference dataset
- Path to the reference file

Example:

```
DBSNP /media/thiago/Data/Unificado/Ref/dbSNPHG38HG37lastversionchr*corrected.txt.gz
KGP /media/thiago/Data/Unificado/Ref/HumanOmni5Exome-4-v1-1-B-auxilliary-file.txt
```

programs

File listing the paths to the programs used. It contains two tab-separated (\t) columns:

- Name of the program
- Path to the program

Example:

```
plink /home/thiago/Programs/plink
```
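The reference and programs files share the same two-column layout, so a single reader could serve both. The sketch below is an assumption about how such a file might be loaded, not MosaiQC's implementation; `parse_name_path_file` is a hypothetical name.

```python
def parse_name_path_file(text):
    """Map each name in the first column to the path in the second column."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, separator, path = line.partition("\t")
        if not separator:
            raise ValueError(f"expected two tab-separated columns: {line!r}")
        mapping[name.strip()] = path.strip()
    return mapping


programs = parse_name_path_file("plink\t/home/thiago/Programs/plink\n")
print(programs["plink"])
```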

Parameters

Mandatory Parameters

```
--programs or -p    File with the path to the programs used
--reference or -r   File with the path of reference files
--database or -d    File with the description of the databases
--folder or -f      Folder to store the intermediary files and output files
```

Optional Parameters

- --change or -c: Change the genome version to the one chosen by the user
- --chromosomeX or -x: Run the sex-chromosome steps. If you select this option, you must provide the genome version (see: https://www.cog-genomics.org/plink/1.9/data#split_x)
- --steps or -s: Steps to be performed and their respective parameters
- --lastautossomal or -l: Number of the last autosomal chromosome (default 22)
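The mandatory and optional parameters above can be sketched with `argparse` as follows. This is an illustration, not the actual `main.py`. The value types are assumptions: `--change`, `--chromosomeX`, and `--steps` are treated as plain strings, because their exact syntax is not specified here.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (sketch only)."""
    parser = argparse.ArgumentParser(
        description="Sketch of the MosaiQC command-line options"
    )
    # Mandatory parameters
    parser.add_argument("-p", "--programs", required=True,
                        help="File with the path to the programs used")
    parser.add_argument("-r", "--reference", required=True,
                        help="File with the path of reference files")
    parser.add_argument("-d", "--database", required=True,
                        help="File with the description of the databases")
    parser.add_argument("-f", "--folder", required=True,
                        help="Folder for intermediate and output files")
    # Optional parameters (value types are assumptions)
    parser.add_argument("-c", "--change",
                        help="Target genome version")
    parser.add_argument("-x", "--chromosomeX",
                        help="Genome version for the sex-chromosome steps")
    parser.add_argument("-s", "--steps",
                        help="Steps to perform and their parameters")
    parser.add_argument("-l", "--lastautossomal", type=int, default=22,
                        help="Number of the last autosomal chromosome")
    return parser


args = build_parser().parse_args([
    "-d", "database.txt", "-r", "reference.txt", "-p", "programas.txt",
    "-c", "37", "-f", "/media/DadosCongelados/DadosLimpos/",
])
print(args.change, args.lastautossomal)
```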

Execution example

```
python3 main.py -d database.txt -r reference.txt -p programas.txt -c 37 -f /media/DadosCongelados/DadosLimpos/
```

Owner

  • Name: Lab. de Diversidade Genética Humana (LDGH)
  • Login: ldgh
  • Kind: organization
  • Location: Institute of Biological Sciences - UFMG (Brasil)

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Leal
    given-names: Thiago Peixoto
    orcid: https://orcid.org/0000-0002-5829-6452
title: "MosaiQC"
version: 
date-released:
