gwaswa

https://github.com/unicorn-23/gwaswa

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (4.3%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: unicorn-23
Language: Python
Default Branch: main
Size: 75.5 MB

Statistics

Stars: 1
Watchers: 1
Forks: 0
Open Issues: 1
Releases: 0

Created about 3 years ago · Last pushed over 2 years ago

Metadata Files

Readme Citation

GwasWA Manual

GwasWA: A GWAS One-stop Analysis Platform from WGS Data to Variant Effect Assessment

GwasWA Installation Guide
General Parameters
WGS Data Processing
GWAS Data Pre-processing
Association Analysis
- Association analysis, --step association
- Select significant variant, --step selectsnp
Assessment of Variant Functional Effect
- Variant effect assessment, --step assess
Quick Start

GwasWA Installation Guide

To install GwasWA, follow these steps:

conda env create -f environment.yml

conda activate pipe

pip install -r requirements.txt

To access GwasWA globally, add the GwasWA folder to the environment variable:

export PATH="/path/to/gwaswa:$PATH"

Append this line to the end of the ~/.bashrc file, then execute the following command for the change to take effect:

source ~/.bashrc
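As a quick sanity check after updating PATH, the following minimal sketch uses Python's standard `shutil.which` to test whether an executable can be resolved. The command name `gwaswa` comes from the examples in this manual; the helper name `is_on_path` is hypothetical and not part of GwasWA.

```python
import shutil


def is_on_path(executable: str) -> bool:
    """Return True if `executable` can be resolved through the PATH variable."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    # "gwaswa" is the command name used throughout this manual.
    if is_on_path("gwaswa"):
        print("gwaswa found on PATH")
    else:
        print("gwaswa not found; check the export line in ~/.bashrc")
```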

General Parameters

--version: Retrieve the current version of the tool.
-o, --output <path>: Directory for the output files. The default value is the current directory.
--nosave: Prevent the tool from saving intermediate files.

WGS Data Processing

Download sequence data, `--step downloadsra`

To download sequence data, utilize the following commands and their respective parameters:

--sra <str>: Download SRA files based on specified SRA accessions. Separate multiple accessions by spaces.
--sralist <filename>: Download SRA files listed in a text file such as srr_list.txt. Each line in the file is one SRA accession.
--nThrds <int>: Number of simultaneous downloads to be initiated.

gwaswa --step downloadsra --sra SRR1111111 [SRR2222222 ...]

gwaswa --step downloadsra --sralist srr_list.txt

The downloaded SRA files associated with the specified SRA accession(s) will be stored in the gwaswaOutput/wgs/sra directory. In case of download or integrity verification failures, the file err_sra_log.txt will be generated in the gwaswaOutput/wgs/sra directory to track failed SRA accessions.
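The accession list expected by `--sralist` can be prepared with a few lines of Python. The sketch below is illustrative only: the helper `write_sra_list` is hypothetical, and the accession pattern (SRR, ERR, or DRR followed by digits) is an assumption about what counts as a valid run accession, not something GwasWA is documented to enforce.

```python
import re
from pathlib import Path

# Assumed pattern for run accessions: SRR/ERR/DRR followed by digits.
RUN_ACCESSION = re.compile(r"^[SED]RR\d+$")


def write_sra_list(accessions, path):
    """Write one accession per line, skipping invalid or repeated entries.

    Returns the accessions that were written, in their original order.
    """
    kept = []
    for accession in accessions:
        accession = accession.strip()
        if RUN_ACCESSION.match(accession) and accession not in kept:
            kept.append(accession)
    Path(path).write_text("".join(a + "\n" for a in kept))
    return kept
```

Calling `write_sra_list(["SRR1770413", "SRR1111111"], "srr_list.txt")` would produce a file that can be passed to `--sralist`.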

Convert SRA to FASTQ, `--step sratofastq`

To convert SRA files to FASTQ format, employ the following command with the respective parameters:

--sradir <path>: Directory containing the SRA files to be converted into FASTQ format.
--nThrds <int>: Number of simultaneous conversions into FASTQ format, including compression into .gz files.

gwaswa --step sratofastq --sradir gwaswaOutput/wgs/sra

Once the conversion of the input SRA files in the designated directory is completed, the resulting FASTQ files will be stored in the gwaswaOutput/wgs/raw directory in compressed format.

FASTQ quality control, `--step readsqc`

To perform quality control on FASTQ files, utilize the following command with the associated parameters:

--rawfastqdir <path>: Directory containing FASTQ files for quality control.
--quality <int>: The default value is 20. Sets the Phred quality threshold; low-quality bases at the 3' end are trimmed based on this threshold.
--phred <str>: Choose the Illumina version for quality scoring. Options:
- phred33: Default. For Illumina 1.9+ using ASCII 33 quality scores.
- phred64: For Illumina 1.5 using ASCII 64 quality scores.
--length <int>: The default value is 20. Sets a length threshold. Reads below this threshold after quality control will be rejected.
--stringency <int>: The default value is 1. Minimum overlap, in bases, with the adapter (linker) sequence required for trimming; shorter adapter matches are left at the end of the read.
--error <float>: The default value is 0.1. Specifies the maximum allowable error rate.
--nThrds <int>: Number of concurrent quality control processes for FASTQ files, including compression into .fq.gz format.

gwaswa --step readsqc --rawfastqdir gwaswaOutput/wgs/raw

Once the quality control process is completed, the cleaned FASTQ files will be stored in the gwaswaOutput/wgs/clean directory.
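To make the `--quality`, `--phred`, and `--length` parameters concrete, here is a simplified sketch of 3' quality trimming. It uses the running-sum rule known from BWA and cutadapt; whether GwasWA's underlying tool applies exactly this rule is an assumption, and all function names are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    offset=33 corresponds to phred33 and offset=64 to phred64.
    """
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(sequence, quality_string, cutoff=20, offset=33):
    """Trim low-quality bases from the 3' end with a running-sum rule.

    Walking from the 3' end, accumulate (cutoff - score), remember the
    position where the sum is largest, and stop once the sum turns negative.
    """
    scores = phred_scores(quality_string, offset)
    running, best, cut = 0, 0, len(scores)
    for i in range(len(scores) - 1, -1, -1):
        running += cutoff - scores[i]
        if running < 0:
            break
        if running > best:
            best, cut = running, i
    return sequence[:cut], quality_string[:cut]


def passes_length(sequence, min_length=20):
    """Reads shorter than the length threshold are rejected."""
    return len(sequence) >= min_length
```

With the default cutoff of 20, a read whose last two bases have Phred score 2 loses exactly those two bases, while a read of uniformly high quality is returned unchanged.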

Quality evaluation, `--step qualityevaluation`

To assess the quality of FASTQ files, utilize the subsequent command alongside its specific parameters:

--fastqdir <path>: Directory containing the FASTQ files needing quality evaluation.
--nThrds <int>: Number of threads allocated for quality evaluation.

gwaswa --step qualityevaluation --fastqdir gwaswaOutput/wgs/clean

The quality evaluation report, generated using fastqc and multiqc, will be stored in the gwaswaOutput/wgs/qualityEvaluation directory.

Download & index reference genome, `--step downloadref`

To download and index the reference genome, use the following commands with their respective parameters:

--accession <str>: Use this to provide an NCBI Reference sequence accession if you don't have a local reference genome file available. This will download the reference genome sequence.

gwaswa --step downloadref --accession GCF_000001735.4

--taxon <str>: If you don't have a local reference genome file, you can provide an NCBI Taxonomy ID or taxonomy name to download the reference genome sequence.

gwaswa --step downloadref --taxon 3702

--refgenome <filename>: Use this parameter to create an index if you already have a local reference genome file available.

gwaswa --step downloadref --refgenome example/ref.fa.gz

The reference genome sequence is downloaded and stored in the gwaswaOutput/wgs/ref directory, and the reference genome index file is stored in the same directory as the reference genome.

Alignment reference genome, `--step align`

To align reads to the reference genome, use the following command with its associated parameters:

--cleanfastqdir <path>: Directory containing the FASTQ files after quality control.
--alignalgorithm <str>: Choice of alignment algorithm.
- mem: Default. Recommended for read lengths in the range of 70bp-1Mbp.
- bwasw: More sensitive for reads with frequent gaps, suitable for reads typically 70bp-1Mbp in length.
- backtrack: Recommended for reads less than 100bp.
--refgenome <filename>: Local reference genome file to be used for alignment.
--nThrds <int>: Number of FASTQ files to be aligned simultaneously.

gwaswa --step align --cleanfastqdir gwaswaOutput/wgs/clean --refgenome gwaswaOutput/wgs/ref/ref.fa

After the FASTQ files in the input directory are aligned to the reference genome, the resulting BAM files will be generated in the gwaswaOutput/wgs/align directory.

BAM files processing, `--step dealbam`

To process BAM files, use the following command with its associated parameters:

--bamdir <path>: Directory containing each BAM file.
--refgenome <filename>: Local reference genome file.
--delPCR: Removal of PCR duplicates.
--nThrds <int>: Number of BAM files to be processed simultaneously.

gwaswa --step dealbam --bamdir gwaswaOutput/wgs/align --refgenome gwaswaOutput/wgs/ref/ref.fa

Upon processing the BAM files in the input directory, tasks such as sorting, PCR duplicate removal, and index building will be performed. The resulting sample_marked.bam and sample_marked.bam.bai files will be generated in the gwaswaOutput/wgs/processed directory.

Variant detection, `--step detect`

To detect variants, use the following command along with its associated parameters:

--processedbamdir <path>: Directory containing each processed BAM file.
--refgenome <filename>: Local reference genome file.
--nThrds <int>: Number of simultaneous BAM files for variant detection.

gwaswa --step detect --processedbamdir gwaswaOutput/wgs/processed --refgenome gwaswaOutput/wgs/ref/ref.fa

Upon detecting the variants in the BAM files within the input directory, the resulting sample_g.vcf file and its index will be generated in the gwaswaOutput/wgs/gvcf directory.

Jointgenotype, `--step jointgenotype`

To perform joint genotyping, use the following command along with its associated parameters:

--gvcfdir <path>: Directory containing each gVCF file.
--refgenome <filename>: Local reference genome file.
--nThrds <int>: Number of gVCF files split by chromosome simultaneously.

gwaswa --step jointgenotype --gvcfdir gwaswaOutput/wgs/gvcf --refgenome gwaswaOutput/wgs/ref/ref.fa

The joint genotyping process involves several steps:

Dividing gVCF Files by Chromosome: Initially, each sample's gVCF file in the input directory is split by chromosome and stored in the gwaswaOutput/wgs/gvcf_chr directory.
Merging Samples by Chromosome: Next, all samples are merged by chromosome, generating chrN_g.vcf and its index file in the gwaswaOutput/wgs/vcf directory.
Re-alignment of reference genome file: Each chrN_g.vcf file is re-aligned to obtain the chrN.vcf file.
Final Merging for Genotyping: The chrN.vcf files are then merged to generate genotype.vcf and its index files, stored in the gwaswaOutput/wgs/vcf directory.

VCF quality control, `--step vcfqc`

To conduct VCF quality control, use the following command along with its associated parameters:

--vcffile <filename>: Specifies the VCF file containing variant genotype information.
Hard filtering for SNPs:
- --snpQUAL <float>: The default value is 30.0. This parameter represents the variant quality value, which measures the reliability of the variant based on the QUAL field in the VCF.
- --snpQD <float>: The default value is 2.0. QD (SNPQualByDepth) is the ratio of the variant quality value divided by the depth of coverage.
- --snpMQ <float>: The default value is 40.0. MQ (RMSMappingQuality) describes the degree of dispersion of the quality value of the alignment, rather than just the average value.
- --snpFS <float>: The default value is 60.0. FS (FisherStrand) is derived from the p-value of Fisher's test and describes strand specificity for reads containing variants and reads containing reference sequence bases during sequencing or alignment.
- --snpSOR <float>: The default value is 3.0. SOR (StrandOddsRatio) is calculated using the symmetric odds ratio test, corrected for strand specificity.
- --snpMQRankSum <float>: The default value is -12.5. The MappingQualityRankSumTest is used to assess whether the mapping qualities of the reads supporting the reference allele and the alternate allele are significantly different for SNP positions.
- --snpReadPosRankSum <float>: The default value is -8.0. The Read Position Rank Sum Test for SNPs evaluates the differences in the position of the reads supporting the reference versus the alternate allele.
Hard filtering for indels:
- --indelQUAL <float>: The default value is 30.0.
- --indelQD <float>: The default value is 2.0.
- --indelFS <float>: The default value is 60.0.
- --indelSOR <float>: The default value is 3.0.
- --indelMQRankSum <float>: The default value is -12.5.
- --indelReadPosRankSum <float>: The default value is -8.0.

gwaswa --step vcfqc --vcffile gwaswaOutput/wgs/vcf/genotype.vcf --refgenome gwaswaOutput/wgs/ref/ref.fa

The genotype.vcf file undergoes quality control, generating genotype_filter.vcf and its index file, which are stored in the gwaswaOutput/wgs/vcf directory.
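The hard-filtering thresholds can be read as a set of per-annotation tests. The sketch below applies the SNP defaults listed above to a dictionary of annotation values. The direction of each comparison follows the usual GATK hard-filtering convention (for example QD below 2.0 fails, FS above 60.0 fails); that GwasWA applies the same directions is an assumption, and the function name is hypothetical.

```python
# Default SNP thresholds copied from the parameter list above.
SNP_DEFAULTS = {
    "QUAL": 30.0, "QD": 2.0, "MQ": 40.0, "FS": 60.0,
    "SOR": 3.0, "MQRankSum": -12.5, "ReadPosRankSum": -8.0,
}

# Assumed direction of each test: "lt" fails when the value is below the
# threshold, "gt" fails when it is above (the usual GATK convention).
DIRECTION = {
    "QUAL": "lt", "QD": "lt", "MQ": "lt", "FS": "gt",
    "SOR": "gt", "MQRankSum": "lt", "ReadPosRankSum": "lt",
}


def failed_filters(annotations, thresholds=SNP_DEFAULTS):
    """Return the names of the annotations that fail their threshold.

    Annotations missing from the record are skipped rather than failed.
    """
    failed = []
    for name, limit in thresholds.items():
        value = annotations.get(name)
        if value is None:
            continue
        too_low = DIRECTION[name] == "lt" and value < limit
        too_high = DIRECTION[name] == "gt" and value > limit
        if too_low or too_high:
            failed.append(name)
    return failed
```

A variant with an empty result passes every test; any non-empty result lists the reasons it would be filtered.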

GWAS Data Pre-processing

Genotype imputation, `--step impute`

To perform genotype imputation, utilize the following command with its associated parameters:

--genotypefile <filename>: The VCF file containing variant genotype information.
--nMem <str>: Maximum memory footprint.
--nThrds <int>: Number of threads used for genotype imputation.

gwaswa --step impute --genotypefile gwaswaOutput/wgs/vcf/genotype_filter.vcf.gz

Upon executing this command, the input VCF file will be imputed with genotypes, and the resulting genotype.vcf.gz file will be generated in the gwaswaOutput/gwas/transvcf directory.

Convert VCF to bfiles, `--step transvcf`

To convert VCF to bfiles, utilize the following command with its associated parameters:

--genotypefile <filename>: VCF file containing variant genotype information.
--phenotypefile <filename>: The phenotype file comprises three columns: sample ID, family ID, and phenotype value (separated by spaces).

gwaswa --step transvcf --genotypefile gwaswaOutput/gwas/transvcf/genotype.vcf.gz --phenotypefile pheno.txt

This command executes the conversion process, generating bfiles stored in the gwaswaOutput/gwas/transvcf directory. The bfiles include BIM, FAM, and BED files, while the phenotype file is added to the FAM file.
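Before running the conversion, it can help to check that the phenotype file really has the three space-separated columns described above. The following sketch assumes numeric phenotype values and no header line; both are assumptions, and `parse_phenotypes` is a hypothetical helper.

```python
def parse_phenotypes(lines):
    """Parse lines of 'sample_id family_id phenotype' into a dictionary.

    Returns {(sample_id, family_id): phenotype as float}. Blank lines are
    skipped; lines without exactly three fields raise ValueError.
    """
    phenotypes = {}
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 3:
            raise ValueError(f"line {number}: expected 3 columns, got {len(fields)}")
        sample_id, family_id, value = fields
        phenotypes[(sample_id, family_id)] = float(value)
    return phenotypes
```

Passing an open file handle works as well as a list of strings, for example `parse_phenotypes(open("pheno.txt"))`.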

GWAS quality control, `--step gwasqc`

To perform GWAS quality control, use the following command with its associated parameters:

--bfiledir <path>: Directory containing the bfiles.
--atgc: Retains only ATGC alleles.
--snpmiss <float>: The default value is 0.2. Excludes SNPs with high missingness among subjects.
--indmiss <float>: The default value is 0.2. Excludes individuals with a high rate of missing genotypes.
--maf <float>: The default value is 0.05. Sets the minor allele frequency (MAF) threshold and filters out SNPs with a MAF below it.
--hwe <str>: The default value is 1e-6. Filters out SNPs deviating from Hardy-Weinberg equilibrium in the control group.
--hweall <str>: The default value is 1e-6. Filters out SNPs deviating from Hardy-Weinberg equilibrium across all samples.
--indep <str>: Utilized for Linkage Disequilibrium (LD) pruning, specifying the window size, step, and variance inflation factor. For instance, --indep 50 5 2 would mean a window size of 50 SNPs, a step of 5 SNPs, and a variance inflation factor of 2.
--indepPairwise <str>: Applied for LD-based SNP pruning using pairwise LD calculation. Specifying the window size, step, and paired r2 threshold.
--indepPairphase <str>: This parameter is also used for LD-based SNP pruning, but it specifically considers phased haplotype data.
--heterozygosity <float>: The default value is 3. Exclude individuals with high or low heterozygosity.
--checksex: Checks for discrepancies in sex assignment.
--rmproblemsex: Deletes individuals with problematic sex assignments.
--imputesex: Imputes sex based on genotype information.

gwaswa --step gwasqc --bfiledir gwaswaOutput/gwas/transvcf

After running this command, the bfiles within the input directory go through the quality control process, and all quality-controlled bfiles and intermediate files are stored in the gwaswaOutput/gwas/qc directory.
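The `--snpmiss` and `--maf` filters rest on two simple per-SNP statistics. The sketch below computes them for genotypes coded as 0, 1, or 2 copies of one allele, with `None` marking a missing call. The coding scheme, the inclusive comparisons, and the function names are assumptions made for illustration.

```python
def snp_missing_rate(genotypes):
    """Fraction of individuals with a missing genotype (None) at one SNP."""
    return sum(g is None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as 0, 1, 2 copies of one allele.

    Missing calls (None) are ignored; a SNP with no calls gets MAF 0.0.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def keep_snp(genotypes, snpmiss=0.2, maf=0.05):
    """Keep a SNP when missingness is at most snpmiss and MAF is at least maf."""
    return (snp_missing_rate(genotypes) <= snpmiss
            and minor_allele_frequency(genotypes) >= maf)
```

A monomorphic SNP, where every individual carries the same genotype, has a MAF of 0 and is dropped by any positive `--maf` threshold.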

Population structure analysis, `--step pca`

To conduct population structure analysis, use the following command with its associated parameters:

--cleanbfiledir <path>: Directory containing the bfiles.
--pcanum <int>: The default value is 6. The number of principal components for analysis.
--groupnum <int>: Number of populations for analysis. If not specified, the group number with the lowest CV error among 2-20 groups is chosen.

gwaswa --step pca --groupnum 3 --cleanbfiledir gwaswaOutput/gwas/gwasqc

Upon execution, the input bfiles undergo population structure analysis and PCA, resulting in the generation of pca.eigenval and pca.eigenvec files containing PCA eigenvalues and eigenvectors. Additionally, it produces diagrams illustrating the principal component analysis (pca.png) and population structure (admixture.png). All these files are stored in the gwaswaOutput/gwas/pca directory.

Principal component analysis chart.

Population structure chart.
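The eigenvalues in pca.eigenval can be turned into the share of variance carried by each reported component. This sketch assumes the file holds one eigenvalue per line, which is the usual PLINK layout but is not confirmed for GwasWA; the function names are hypothetical.

```python
def read_eigenvalues(text):
    """Parse eigenvalues from text that holds one number per line."""
    return [float(line) for line in text.splitlines() if line.strip()]


def variance_explained(eigenvalues):
    """Share of the summed eigenvalues carried by each principal component."""
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]
```

Note that the shares are relative to the components written to the file (6 by default with `--pcanum`), not to the total genetic variance.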

Kinship analysis, `--step kinship`

To conduct kinship analysis, use the following command with its associated parameter:

--cleanbfiledir <path>: Directory containing the bfiles.

gwaswa --step kinship --cleanbfiledir gwaswaOutput/gwas/gwasqc

Upon execution, the input bfiles undergo kinship analysis, resulting in the generation of kinship.txt and the kinship.png diagram. These files are stored in the gwaswaOutput/gwas/kinship directory.

Kinship analysis chart.

Association Analysis

Association analysis, `--step association`

To conduct association analysis, use the following command with its associated parameters:

--cleanbfiledir <path>: Directory containing the bfiles.
Association analysis model, optional:
- --lm: Generalized linear model.
  
  gwaswa --step association --cleanbfiledir gwaswaOutput/gwas/gwasqc --lm
- --lmm: Mixed linear model.
  - --pcafile <filename>: Optionally provide the PCA result file as a covariate.
  - --kinshipfile <filename>: Optionally provide the kinship result file as a covariate. If not provided, it will be automatically generated.
    
    gwaswa --step association --cleanbfiledir gwaswaOutput/gwas/gwasqc --lmm --pcafile gwaswaOutput/gwas/pca/pca.eigenvec

Upon execution, the association analysis generates a result.assoc.txt file containing information for each variant site. Additionally, it creates graphical representations of the analysis, including a Manhattan plot and a QQ plot. These files are stored in the gwaswaOutput/gwas/association directory.

Manhattan plot.

QQ plot.
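For readers who want to see what a QQ plot compares, the sketch below pairs each observed p-value with the quantile expected under a uniform null distribution. The plotting position (i - 0.5) / n is one common convention and an assumption here; the positions GwasWA uses for its own plot are not documented in this manual.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10 p-value pairs for a QQ plot.

    The i-th smallest of n p-values is paired with the uniform quantile
    (i - 0.5) / n. All p-values must be greater than zero.
    """
    ordered = sorted(p_values)
    n = len(ordered)
    points = []
    for rank, p in enumerate(ordered, start=1):
        expected = -math.log10((rank - 0.5) / n)
        observed = -math.log10(p)
        points.append((expected, observed))
    return points
```

Points lying well above the diagonal at the high end indicate variants that are more significant than expected by chance.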

Select significant variant, `--step selectsnp`

To select significant variants, use the following command with its associated parameters:

--assocfile: Association analysis result file.
--pvaluelimit <str>: The default value is 1e-7. Filters out SNPs whose p-value is greater than the specified limit.

gwaswa --step selectsnp --assocfile gwaswaOutput/gwas/association/lm/result.assoc.txt --pvaluelimit 1e-7

Executing this command generates a snps.txt file that contains the significantly associated SNPs. This file is stored in the gwaswaOutput/gwas/selectsnp directory.
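The selection itself amounts to a threshold on the p-value column. The sketch below shows the idea on a tab-separated table with a header row. The column names `rs` and `p_wald` and the tab delimiter are assumptions about the layout of result.assoc.txt, and the demo rows are made up for illustration.

```python
import csv
import io


def select_significant(handle, p_column="p_wald", limit=1e-7):
    """Yield rows of a tab-separated table whose p-value is at most `limit`."""
    for row in csv.DictReader(handle, delimiter="\t"):
        if float(row[p_column]) <= limit:
            yield row


if __name__ == "__main__":
    # Made-up rows for illustration only; not real association results.
    demo = "rs\tp_wald\nsnpA\t3e-9\nsnpB\t0.04\n"
    for row in select_significant(io.StringIO(demo)):
        print(row["rs"], row["p_wald"])
```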

Assessment of Variant Functional Effect

Variant effect assessment, `--step assess`

To assess the functional effects of variants, utilize the following parameters:

--snpfile <filename>: Input a VCF file containing variants. Each line in the file represents a variant, specifying the chromosome number, position, variant name, reference allele, and alternative allele. For instance, 16 57025062 rs11644125 C T.

--species <str>: Target species name for the analysis.

gwaswa --step assess --species homo_sapiens --snpfile example.vcf

Upon execution, the input variants are evaluated, generating an assessment_Summary.html file. This file is stored in the gwaswaOutput/assessment directory.
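The five-field variant line described above is easy to produce or check programmatically. This sketch parses and re-emits such a line, assuming whitespace-separated fields; `Variant`, `parse_variant_line`, and `format_variant_line` are hypothetical names.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    name: str
    ref: str
    alt: str


def parse_variant_line(line):
    """Parse 'chrom pos name ref alt' separated by whitespace."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    chrom, pos, name, ref, alt = fields
    return Variant(chrom, int(pos), name, ref, alt)


def format_variant_line(variant):
    """Write a variant back as one space-separated line."""
    return " ".join(str(field) for field in variant)
```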

Quick Start

WGS data Processing

This guide offers a systematic approach to processing WGS data using the E. coli dataset SRR1770413 as an example.

Download sequencing data, `--step downloadsra`

gwaswa --step downloadsra --sra SRR1770413 --output coli

The SRR1770413.sra file will be stored in the coli/gwaswaOutput/wgs/sra directory.

Convert SRA to FASTQ, `--step sratofastq`

gwaswa --step sratofastq --sradir coli/gwaswaOutput/wgs/sra --output coli

The converted files will be stored in the compressed format in the coli/gwaswaOutput/wgs/raw directory.

FASTQ quality control, `--step readsqc`

gwaswa --step readsqc --rawfastqdir coli/gwaswaOutput/wgs/raw --output coli

The quality-controlled FASTQ files will be stored in the coli/gwaswaOutput/wgs/clean directory in compressed .fq.gz format.

Quality evaluation, `--step qualityevaluation`

gwaswa --step qualityevaluation --fastqdir coli/gwaswaOutput/wgs/clean --output coli

The quality evaluation results will be saved in the coli/gwaswaOutput/wgs/qualityEvaluation directory.

Download & index reference genome, `--step downloadref`

gwaswa --step downloadref --accession GCF_000005845.2 --output coli

The reference genome and its index will be stored in the coli/gwaswaOutput/wgs/ref directory.

Alignment of reference genome, `--step align`

gwaswa --step align --cleanfastqdir coli/gwaswaOutput/wgs/clean --refgenome coli/gwaswaOutput/wgs/ref/ref.fa --output coli

The alignment results will be stored in the coli/gwaswaOutput/wgs/align directory.

BAM files processing, `--step dealbam`

gwaswa --step dealbam --bamdir coli/gwaswaOutput/wgs/align --refgenome coli/gwaswaOutput/wgs/ref/ref.fa --output coli

The resulting processed BAM files are stored in the coli/gwaswaOutput/wgs/processed directory.

Variant detection, `--step detect`

gwaswa --step detect --processedbamdir coli/gwaswaOutput/wgs/processed --refgenome coli/gwaswaOutput/wgs/ref/ref.fa --output coli

After detection, sample_g.vcf and its index file are generated in the coli/gwaswaOutput/wgs/gvcf directory.

Jointgenotype, `--step jointgenotype`

gwaswa --step jointgenotype --gvcfdir coli/gwaswaOutput/wgs/gvcf --refgenome coli/gwaswaOutput/wgs/ref/ref.fa --output coli

The resulting genotype.vcf and its index file are stored in the coli/gwaswaOutput/wgs/vcf directory.

VCF quality control, `--step vcfqc`

gwaswa --step vcfqc --vcffile coli/gwaswaOutput/wgs/vcf/genotype.vcf --refgenome coli/gwaswaOutput/wgs/ref/ref.fa --output coli

The resulting genotype_filter.vcf and its index file are stored in the coli/gwaswaOutput/wgs/vcf directory.
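The WGS steps above always run in the same order, so they can be assembled by a small script. The sketch below only builds the argument lists, using the flags and the coli/gwaswaOutput directory layout shown in this guide; it does not run anything. The function name and its parameters are hypothetical.

```python
import shlex


def wgs_quickstart_commands(output="coli", accession="SRR1770413",
                            ref_accession="GCF_000005845.2"):
    """Build the WGS quick start commands as argument lists, in order."""
    wgs = f"{output}/gwaswaOutput/wgs"
    ref = f"{wgs}/ref/ref.fa"
    steps = [
        ["downloadsra", "--sra", accession],
        ["sratofastq", "--sradir", f"{wgs}/sra"],
        ["readsqc", "--rawfastqdir", f"{wgs}/raw"],
        ["qualityevaluation", "--fastqdir", f"{wgs}/clean"],
        ["downloadref", "--accession", ref_accession],
        ["align", "--cleanfastqdir", f"{wgs}/clean", "--refgenome", ref],
        ["dealbam", "--bamdir", f"{wgs}/align", "--refgenome", ref],
        ["detect", "--processedbamdir", f"{wgs}/processed", "--refgenome", ref],
        ["jointgenotype", "--gvcfdir", f"{wgs}/gvcf", "--refgenome", ref],
        ["vcfqc", "--vcffile", f"{wgs}/vcf/genotype.vcf", "--refgenome", ref],
    ]
    return [["gwaswa", "--step", step, *args, "--output", output]
            for step, *args in steps]


if __name__ == "__main__":
    for argv in wgs_quickstart_commands():
        # Print only; pass argv to subprocess.run(argv, check=True) to execute.
        print(shlex.join(argv))
```

Keeping the commands as lists rather than strings avoids shell quoting problems if they are later handed to `subprocess.run`.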

GWAS analysis

This section covers GWAS data processing and association analysis using data from [^1]. The parameters are configured according to [^2].

Convert VCF to bfiles, `--step transvcf`

gwaswa --step transvcf --genotypefile gwaswa/example/genotype.vcf.gz --phenotypefile gwaswa/example/pheno.txt --output example

The output bfiles are stored in the example/gwaswaOutput/gwas/transvcf directory, including BIM, FAM, and BED files.

GWAS quality control, `--step gwasqc`

gwaswa --step gwasqc --bfiledir example/gwaswaOutput/gwas/gwasqc --snpmiss 0.2 --indmiss 0.2 --output example

gwaswa --step gwasqc --bfiledir example/gwaswaOutput/gwas/gwasqc --snpmiss 0.02 --indmiss 0.02 --output example

gwaswa --step gwasqc --bfiledir example/gwaswaOutput/gwas/gwasqc --checksex --output example

gwaswa --step gwasqc --bfiledir example/gwaswaOutput/gwas/gwasqc --imputesex --output example

gwaswa --step gwasqc --bfiledir example/gwaswaOutput/gwas/gwasqc --maf 0.05 --output example

gwaswa --step gwasqc --bfiledir example/gwaswaOutput/gwas/gwasqc --hwe 1e-6 --output example

gwaswa --step gwasqc --bfiledir example/gwaswaOutput/gwas/gwasqc --hweall 1e-10 --output example

gwaswa --step gwasqc --bfiledir example/gwaswaOutput/gwas/gwasqc --indepPairwise 50 5 0.2 --output example

The processed bfiles, as well as intermediate quality control files, are stored in the example/gwaswaOutput/gwas/qc directory.

Population structure analysis, `--step pca`

gwaswa --step pca --groupnum 3 --cleanbfiledir example/gwaswaOutput/gwas/gwasqc --output example

The results of the population structure analysis are stored in the example/gwaswaOutput/gwas/pca directory.

Kinship analysis, `--step kinship`

gwaswa --step kinship --cleanbfiledir example/gwaswaOutput/gwas/gwasqc --output example

The kinship analysis results are stored in the example/gwaswaOutput/gwas/kinship directory.

Association analysis, `--step association`

gwaswa --step association --cleanbfiledir example/gwaswaOutput/gwas/gwasqc --lm --output example

The results of the association analysis are stored in the example/gwaswaOutput/gwas/association directory.

Select significant variants, `--step selectsnp`

gwaswa --step selectsnp --assocfile example/gwaswaOutput/gwas/association/lm/result.assoc.txt --pvaluelimit 1e-5 --output example

The filtered results are stored in the example/gwaswaOutput/gwas/selectsnp directory.

Assessment of variant effect

The human non-coding variant rs11644125 is used as an example.

Variant effect assessment, `--step assess`

gwaswa --step assess --species homo_sapiens --snpfile gwaswa/example/rs11644125.vcf --output assess

Assessment of variant functional effects is stored in the assess/gwaswaOutput/gwas/assessment directory.

[^1]: Jiang K, Yang Z, Cui W, et al. An exome-wide association study identifies new susceptibility loci for age of smoking initiation in African- and European-American populations[J]. Nicotine and Tobacco Research, 2019, 21(6): 707-713.

[^2]: Marees A T, de Kluiver H, Stringer S, et al. A tutorial on conducting genomewide association studies: Quality control and statistical analysis[J]. International journal of methods in psychiatric research, 2018, 27(2): e1608.

Owner

Name: unicorn-23
Login: unicorn-23
Kind: user

Repositories: 1
Profile: https://github.com/unicorn-23

GitHub Events

Total

Watch event: 1

Last Year

Watch event: 1

Dependencies

requirements.txt pypi

astroid ==2.15.0
autopep8 ==1.6.0
binaryornot ==0.4.4
certifi ==2022.12.7
colorlog ==6.7.0
colormath ==3.0.0
dill ==0.3.6
docstring-to-markdown ==0.12
flake8 ==6.0.0
google-pasta ==0.2.0
isort ==5.12.0
kipoi-conda ==0.1.6
kipoi-utils ==0.7.7
lazy-object-proxy ==1.9.0
lzstring ==1.0.4
mccabe ==0.7.0
packaging ==23.0
platformdirs ==3.1.1
pluggy ==1.0.0
protobuf ==4.21.12
pyasn1 ==0.4.8
pyasn1-modules ==0.2.7
pycodestyle ==2.10.0
pydocstyle ==6.2.3
pyflakes ==3.0.1
pylint ==2.17.1
pyrle ==0.0.35
python-lsp-jsonrpc ==1.0.0
python-lsp-server ==1.7.1
pytoolconfig ==1.2.5
rope ==1.7.0
snowballstemmer ==2.2.0
sorted-nearest ==0.0.37
spectra ==0.0.11
text-unidecode ==1.3
toml ==0.10.2
tomlkit ==0.11.6
ujson ==5.7.0
whatthepatch ==1.0.4
wrapt ==1.15.0
yapf ==0.32.0
zstandard ==0.19.0

environment.yml conda

absl-py 1.4.0
admixture 1.3.0
aiohttp 3.8.4
aiosignal 1.3.1
argcomplete 3.0.8
argh 0.27.2
argtable2 2.13
arrow 1.2.3
astunparse 1.6.3
async-timeout 4.0.2
attrs 21.4.0
beautifulsoup4 4.11.2
binaryornot 0.4.4
biopython 1.81
blast 2.12.0
blinker 1.6.2
brotli 1.0.9
brotli-bin 1.0.9
brotlipy 0.7.0
bwa 0.7.17
bzip2 1.0.8
c-ares 1.18.1
ca-certificates 2022.12.7
cached-property 1.5.2
cached_property 1.5.2
cachetools 5.3.0
certifi 2022.12.7
cffi 1.14.6
chardet 5.1.0
charset-normalizer 2.1.1
click 8.1.3
clustalo 1.2.4
clustalw 2.1
colorama 0.4.6
coloredlogs 15.0.1
colorlog 6.7.0
colormath 3.0.0
contourpy 1.0.7
cookiecutter 2.1.1
cryptography 39.0.0
curl 7.87.0
cutadapt 4.4
cycler 0.11.0
deprecation 2.1.0
dnaio 0.10.0
ensembl-vep 109.3
entrez-direct 16.2
expat 2.5.0
fastqc 0.12.1
flatbuffers 2.0.8
font-ttf-dejavu-sans-mono 2.37
fontconfig 2.14.2
fonttools 4.39.3
freetype 2.12.1
frozenlist 1.3.3
future 0.18.3
gast 0.4.0
gatk4 4.3.0.0
gcta 1.93.2beta
gemma 0.98.3
gettext 0.21.1
gffutils 0.11.1
giflib 5.2.1
google-auth 2.17.3
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
grpc-cpp 1.47.1
grpcio 1.47.1
h5py 3.8.0
hdf5 1.14.0
htslib 1.16
humanfriendly 10.0
icu 70.1
idna 3.4
importlib-metadata 6.6.0
importlib-resources 5.12.0
importlib_resources 5.12.0
isa-l 2.30.0
jinja2 3.1.2
jinja2-time 0.2.0
joblib 1.2.0
jpeg 9e
keras 2.10.0
keras-preprocessing 1.1.2
kipoi 0.8.6
kipoi-conda 0.1.6
kipoi-utils 0.7.7
kipoiseq 0.7.1
kiwisolver 1.4.4
krb5 1.20.1
lcms2 2.15
lerc 4.0.0
libabseil 20220623.0
libaec 1.0.6
libblas 3.9.0
libbrotlicommon 1.0.9
libbrotlidec 1.0.9
libbrotlienc 1.0.9
libcblas 3.9.0
libcurl 7.87.0
libcxx 16.0.2
libdb 6.2.32
libdeflate 1.17
libedit 3.1.20191231
libev 4.33
libexpat 2.5.0
libffi 3.3
libgfortran 5.0.0
libgfortran5 12.2.0
libiconv 1.17
libidn2 2.3.4
liblapack 3.9.0
libnghttp2 1.51.0
libopenblas 0.3.21
libpng 1.6.39
libprotobuf 3.21.12
libsqlite 3.40.0
libssh2 1.10.0
libtiff 4.5.0
libunistring 0.9.10
libwebp-base 1.3.0
libxcb 1.13
libzlib 1.2.13
llvm-openmp 16.0.2
lzstring 1.0.4
mafft 7.520
markdown 3.4.3
markdown-it-py 2.2.0
markupsafe 2.1.2
matplotlib 3.7.1
matplotlib-base 3.7.1
mdurl 0.1.0
multidict 6.0.4
multiqc 1.14
munkres 1.0.7
muscle 5.1
mysql-connector-c 6.1.11
natsort 8.3.1
ncbi-datasets-cli 14.15.0
ncls 0.0.66
ncurses 6.3
networkx 3.1
numpy 1.24.3
oauthlib 3.2.2
openblas 0.3.21
openjdk 11.0.15
openjpeg 2.5.0
openssl 1.1.1t
opt_einsum 3.3.0
packaging 23.1
paml 4.10.6
pandas 2.0.1
patsy 0.5.3
pbzip2 1.1.13
pcre 8.45
perl 5.32.1
perl-algorithm-diff 1.1903
perl-archive-tar 2.40
perl-base 2.23
perl-bio-asn1-entrezgene 1.73
perl-bio-coordinate 1.007001
perl-bio-db-hts 3.01
perl-bio-featureio 1.6.905
perl-bio-samtools 1.43
perl-bio-searchio-hmmer 1.7.3
perl-bio-tools-phylo-paml 1.7.3
perl-bio-tools-run-alignment-clustalw 1.7.4
perl-bio-tools-run-alignment-tcoffee 1.7.4
perl-bioperl 1.7.8
perl-bioperl-core 1.7.8
perl-bioperl-run 1.007003
perl-business-isbn 3.007
perl-business-isbn-data 20210112.006
perl-capture-tiny 0.48
perl-carp 1.38
perl-class-data-inheritable 0.09
perl-common-sense 3.75
perl-compress-raw-bzip2 2.201
perl-compress-raw-zlib 2.202
perl-constant 1.33
perl-data-dumper 2.183
perl-db_file 1.858
perl-dbd-mysql 4.046
perl-dbi 1.643
perl-devel-stacktrace 2.04
perl-digest-hmac 1.04
perl-digest-md5 2.58
perl-encode 3.19
perl-encode-locale 1.05
perl-exception-class 1.45
perl-exporter 5.72
perl-exporter-tiny 1.002002
perl-extutils-makemaker 7.70
perl-file-listing 6.15
perl-file-slurp-tiny 0.004
perl-file-sort 1.01
perl-file-spec 3.48_01
perl-getopt-long 2.54
perl-html-parser 3.81
perl-html-tagset 3.20
perl-http-cookies 6.10
perl-http-daemon 6.16
perl-http-date 6.05
perl-http-message 6.36
perl-http-negotiate 6.01
perl-io-compress 2.201
perl-io-html 1.004
perl-io-socket-ssl 2.074
perl-io-string 1.08
perl-io-tty 1.16
perl-io-zlib 1.14
perl-ipc-run 20200505.0
perl-json 4.10
perl-json-xs 2.34
perl-libwww-perl 6.67
perl-libxml-perl 0.08
perl-list-moreutils 0.430
perl-list-moreutils-xs 0.430
perl-lwp-mediatypes 6.04
perl-mime-base64 3.16
perl-net-http 6.22
perl-net-ssleay 1.92
perl-ntlm 1.09
perl-parent 0.236
perl-pathtools 3.75
perl-perlio-gzip 0.20
perl-scalar-list-utils 1.62
perl-sereal 4.019
perl-sereal-decoder 4.025
perl-sereal-encoder 4.025
perl-set-intervaltree 0.12
perl-socket 2.027
perl-sub-uplevel 0.2800
perl-test-deep 1.130
perl-test-differences 0.69
perl-test-exception 0.43
perl-test-harness 3.44
perl-test-most 0.38
perl-test-warn 0.36
perl-text-csv 2.01
perl-text-diff 1.45
perl-time-local 1.30
perl-timedate 2.33
perl-tree-dag_node 1.32
perl-try-tiny 0.31
perl-types-serialiser 1.01
perl-uri 5.12
perl-url-encode 0.03
perl-www-robotrules 6.02
perl-xml-dom 1.46
perl-xml-dom-xpath 0.14
perl-xml-parser 2.44
perl-xml-regexp 0.04
perl-xml-xpathengine 0.14
pigz 2.6
pillow 9.4.0
pip 23.1.2
platformdirs 3.5.0
plink 1.90b6.21
pooch 1.7.0
protobuf 4.21.12
pthread-stubs 0.4
pyasn1 0.4.8
pyasn1-modules 0.2.7
pycparser 2.21
pyfaidx 0.7.2.1
pygments 2.15.1
pyjwt 2.6.0
pyopenssl 23.1.1
pyparsing 3.0.9
pyranges 0.0.120
pyrle 0.0.35
pysocks 1.7.1
python 3.9.0
python-dateutil 2.8.2
python-flatbuffers 23.1.21
python-isal 1.1.0
python-slugify 8.0.1
python-tzdata 2023.3
python_abi 3.9
pytz 2023.3
pyu2f 0.1.5
pyvcf3 1.0.3
pyyaml 6.0
re2 2022.06.01
readline 8.2
related 0.7.3
requests 2.29.0
requests-oauthlib 1.3.1
rich 13.3.5
rich-click 1.6.1
rsa 4.9
samtools 1.6
scikit-learn 1.0.2
scipy 1.10.1
seaborn 0.12.2
seaborn-base 0.12.2
setuptools 67.7.2
simplejson 3.19.1
singledispatch 3.6.1
six 1.16.0
snappy 1.1.10
sorted_nearest 0.0.37
soupsieve 2.3.2.post1
spectra 0.0.11
sqlite 3.40.0
statsmodels 0.13.5
t-coffee 12.00.7fb08c2
tabulate 0.9.0
tensorboard 2.10.1
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorflow 2.10.0
tensorflow-base 2.10.0
tensorflow-estimator 2.10.0
tensorflow-hub 0.12.0
termcolor 2.3.0
text-unidecode 1.3
threadpoolctl 3.1.0
tinydb 4.7.1
tk 8.6.12
tornado 6.3
tqdm 4.65.0
trim-galore 0.6.10
typing-extensions 4.5.0
typing_extensions 4.5.0
tzdata 2023c
unicodedata2 15.0.0
unidecode 1.3.6
unzip 6.0
urllib3 1.26.15
viennarna 2.5.1
werkzeug 2.3.2
wget 1.20.3
wheel 0.40.0
wrapt 1.15.0
xopen 1.7.0
xorg-libxau 1.0.9
xorg-libxdmcp 1.1.3
xz 5.2.6
yaml 0.2.5
yarl 1.9.1
zipp 3.15.0
zlib 1.2.13
zstandard 0.19.0
zstd 1.5.2
