https://github.com/ahmedmoustafa/genetic-ancestry

Genetic Ancestry


Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (5.0%) to scientific vocabulary

Keywords

ancestry diversity ethnicity-analysis genomics ggplot pca plink population-genetics population-structure principal-component-analysis single-nucleotide-polymorphisms
Last synced: 5 months ago

Repository

Genetic Ancestry

Basic Info
  • Host: GitHub
  • Owner: ahmedmoustafa
  • License: cc0-1.0
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 13.9 MB
Statistics
  • Stars: 1
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed about 1 year ago
Metadata Files
Readme License

README

Genetic Ancestry

[Image: Ethnic Groups (source: Council of State Archivists)]

A detailed workflow for computing and visualizing a principal component analysis (PCA) of Single Nucleotide Polymorphism (SNP) genotypes. The workflow uses a curated set of 10,000 SNPs, predefined by GRAF, that serve as ancestry markers. PLINK computes the PCA eigenvectors and eigenvalues.
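PLINK's `--pca` derives principal components from a variance-standardized genetic relationship matrix. The NumPy sketch below illustrates that idea on a samples-by-SNPs matrix of alternate-allele dosages (0, 1, 2). It is not PLINK's implementation and it ignores missing genotypes; the function name `genotype_pca` and the simulated two-population data are hypothetical.

```python
import numpy as np


def genotype_pca(dosages, n_components=2):
    """PCA of a samples x SNPs matrix of alternate-allele dosages (0, 1, 2)."""
    x = np.asarray(dosages, dtype=float)
    p = x.mean(axis=0) / 2.0                      # alternate-allele frequency per SNP
    keep = (p > 0) & (p < 1)                      # drop monomorphic SNPs (zero variance)
    x, p = x[:, keep], p[keep]
    z = (x - 2 * p) / np.sqrt(2 * p * (1 - p))    # variance-standardized genotypes
    grm = z @ z.T / z.shape[1]                    # samples x samples relationship matrix
    eigvals, eigvecs = np.linalg.eigh(grm)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], eigvecs[:, order]      # columns are per-sample PC scores


# Hypothetical example: two simulated populations of 20 samples at 200 SNPs
rng = np.random.default_rng(0)
freqs_a = rng.uniform(0.1, 0.9, 200)
freqs_b = np.clip(freqs_a + rng.normal(0, 0.3, 200), 0.05, 0.95)
genotypes = np.vstack([
    rng.binomial(2, freqs_a, size=(20, 200)),
    rng.binomial(2, freqs_b, size=(20, 200)),
])
eigvals, pcs = genotype_pca(genotypes)
print(eigvals)
print(pcs[:, 0])  # PC1 scores, expected to separate the two simulated populations
```

Eigen-decomposing the samples-by-samples matrix, rather than the SNP-by-SNP covariance, keeps the problem small whenever there are fewer samples than SNPs.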

Workflow Overview

  1. Fingerprinting SNP Extraction: Extract GRAF’s 10,000 curated SNPs from the dbSNP database.
  2. Data Cleaning: Ensure the extracted SNPs are exclusively biallelic (performed in the same notebook as step 1).
  3. SNP Retrieval from the 1000 Genomes Project: Extract the genotypes at the 10,000 fingerprinting positions from the 1000 Genomes Project’s VCF files.
  4. PCA Computation: Compute the PCA eigenvectors and eigenvalues using PLINK.
  5. PCA Visualization: Plot the principal components in R to highlight the relationships between samples.
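Steps 2 and 3 boil down to keeping only biallelic SNPs at the fingerprint positions and turning each genotype into an alternate-allele dosage. The pure-Python sketch below shows that filtering on plain VCF text. It is not the notebooks' code; the helper name `biallelic_dosages`, the example records, and the assumption that every record's FORMAT column contains `GT` are illustrative.

```python
BASES = {"A", "C", "G", "T"}


def biallelic_dosages(vcf_lines, positions):
    """Map (chrom, pos) -> alternate-allele dosage per sample for biallelic SNPs.

    Only records whose (CHROM, POS) is in `positions`, whose REF is a single
    base, and whose ALT is exactly one single base are kept; multiallelic
    sites, indels and monomorphic records (ALT ".") are skipped.
    """
    result = {}
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # meta-information (##) and column header (#CHROM) lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        if (chrom, pos) not in positions:
            continue
        if ref not in BASES or alt not in BASES:
            continue  # not a biallelic SNP
        gt_index = fields[8].split(":").index("GT")  # position of GT in FORMAT
        dosages = []
        for sample in fields[9:]:
            alleles = sample.split(":")[gt_index].replace("|", "/").split("/")
            if "." in alleles:
                dosages.append(None)  # missing genotype
            else:
                dosages.append(sum(allele == "1" for allele in alleles))
        result[(chrom, pos)] = dosages
    return result


# Hypothetical three-sample VCF excerpt
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "1\t1000\trs1\tA\tG\t.\tPASS\t.\tGT\t0|0\t0|1\t1|1",
    "1\t2000\trs2\tC\tT,G\t.\tPASS\t.\tGT\t0|1\t0|2\t1|1",
    "1\t3000\trs3\tT\tC\t.\tPASS\t.\tGT:DP\t1|0:10\t./.:0\t0|0:12",
]
result = biallelic_dosages(vcf, {("1", 1000), ("1", 2000), ("1", 3000)})
print(result)
# {('1', 1000): [0, 1, 2], ('1', 3000): [1, None, 0]}
```

For bgzip-compressed VCFs, the lines from `gzip.open(path, "rt")` can be passed in directly. The chromosome names in `positions` must follow the file's own convention (for example `1` versus `chr1`).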

[Figure: Populations PCA]

Owner

  • Name: Ahmed Moustafa
  • Login: ahmedmoustafa
  • Kind: user
  • Location: Egypt
  • Company: American University in Cairo

Bioinformatics and Genomics Data Scientist

GitHub Events

Total
  • Push event: 5
Last Year
  • Push event: 5