ReLERNN

Recombination Landscape Estimation using Recurrent Neural Networks

https://github.com/kr-colab/ReLERNN

Keywords

deep-learning population-genomics recombination recurrent-neural-networks

Last synced: 7 months ago · JSON representation

Repository

Recombination Landscape Estimation using Recurrent Neural Networks

Basic Info

Host: GitHub
Owner: kr-colab
License: mit
Language: Python
Default Branch: master
Homepage:
Size: 1.02 MB

Statistics

Stars: 59
Watchers: 6
Forks: 11
Open Issues: 5
Releases: 1

Topics

deep-learning population-genomics recombination recurrent-neural-networks

Created almost 7 years ago · Last pushed about 1 year ago

Metadata Files

Readme License

ReLERNN

Recombination Landscape Estimation using Recurrent Neural Networks

====================================================================

ReLERNN uses deep learning to infer the genome-wide landscape of recombination from as few as four individually sequenced chromosomes, or from allele frequencies inferred by pooled sequencing. This repository contains the code and instructions required to run ReLERNN, and includes example files to ensure everything is working properly. The manuscript detailing ReLERNN can be found here.

Recommended installation on linux

Install tensorflow 2 on your system. Directions can be found here. You will also need to install the CUDA toolkit and CuDNN. ReLERNN requires the use of a CUDA-Enabled NVIDIA GPU. The current version of ReLERNN has been successfully tested with tensorflow/2.2.0, cudatoolkit/10.1.243, and cudnn/7.6.5.

Further dependencies for ReLERNN can be installed with pip. This is done with the following commands:

bash $ git clone https://github.com/kr-colab/ReLERNN.git $ cd ReLERNN $ pip install .

It should be as simple as that.

Installing `CUDA`

We are asked often about installing CUDA and the NVIDIA requirements. This can be quite finicky depending on your hardware setup, but many users have had luck installing the tensorflow/cuda requirements using mamba with the following recipe

```bash $ mamba create -n relearnn-1.0.0 -c conda-forge -c nvidia python=3.10 tensorflow=2.15.0 cuda-toolkit h5py -y

then install ReLERNN as above

$ git clone https://github.com/kr-colab/ReLERNN.git $ cd ReLERNN $ pip install . ```

Testing ReLERNN

An example VCF file (5 contigs; 10 haploid chromosomes) and a shell script for running ReLERNN's four modules is located in $/ReLERNN/examples. To test the functionality of ReLERNN simply use the following commands:

bash $ cd examples $ ./example_pipeline.sh

Provided everything worked as planned, $ReLERNN/examples/example_output/ should be populated with a few directories along with the files: example.PREDICT.txt and example.PREDICT.BSCORRECT.txt. The latter is the finalized output file with your recombination rate predictions and estimates of uncertainty.

The above example took 57 seconds to complete on a Xeon machine using four CPUs and one NVIDIA 2070 GPU. Note that the parameters used for this example were designed only to test the success of the installation, not to make accurate predictions. Please use the guidelines below for the best results when analyzing real data.

You can now test the functionality of ReLERNN for use with pool-seq data by using the following commands:

bash $ cd examples $ ./example_pipeline_pool.sh

Estimating a recombination landscape from individually sequenced chromosomes

The ReLERNN pipeline is executed using four commands: ReLERNN_SIMULATE, ReLERNN_TRAIN, ReLERNN_PREDICT, and the optional ReLERNN_BSCORRECT (see the Method flow diagram).

Before running ReLERNN

ReLERNN takes as input a VCF file of biallelic variants. Users should use appropriate QC techniques (filtering low-quality variants, etc.) and remove non-biallelic variants before running ReLERNN. Small contigs (<< 250 SNPs) should not be included in the genome file --genome, though these do not need to be removed from the VCF. ReLERNN also requires that the number of sampled chromosomes is identical across all contigs, and VCFs should be filtered accordingly. Hemizygous chromosomes or haploid samples in an otherwise diploid dataset should ideally be run separately using a separate VCF. It is possible to treat hemizygous chromosomes as "diploids with missing data" using the --forceDiploid option, however this is not recommended. It is now possible to run ReLERNN on VCFs with missing genotypes (coded as a .).

If you want to make predictions based on equilibrium simulations, you can skip ahead to executing ReLERNN_SIMULATE. While ReLERNN is generally robust to demographic model misspecification, prediction accuracy may potentially be improved by simulating the training set under a demographic history that accurately matches that of your sample. ReLERNN optionally takes the output files from three popular demographic history inference programs (stairwayplot_v1, SMC++, and MSMC), and simulates a training set under these histories. Note: for SMC++ use the .csv output (option -c in SMC++). It is up to the user to perform the proper due diligence to ensure that the population size histories reported by these programs are sound. In our opinion, unless you know exactly how these programs work and you expect your data to represent a history dramatically different from equilibrium, you are better off skipping this step and training ReLERNN on equilibrium simulations. Once you have run one of the demographic history inference programs listed above, you simply provide the raw output file from that program to ReLERNN_SIMULATE using the --demographicHistory option.

Step 1) ReLERNN_SIMULATE

ReLERNN_SIMULATE reads your VCF file and splits it by chromosome. The chromosomes to be evaluated must be specified by providing a BED file of said positions using the --genome argument. A BED-formatted accessibility mask (with non-overlapping ascending windows) may be optionally provided using the --mask option. Use the --phased or --unphased flag to train using phased or unphased genotypes (the default is unphased). It is required that the VCF file use the extension .vcf. The prefix of that file will serve as the prefix used for all output files (e.g. running ReLERNN on the file population7.vcf will generate the result file population7.PREDICT.txt). It is strongly recommended that you use the default setting for --maxWinSize, larger values can cause training to fail and smaller values can result in lower accuracy. Users are required to provide an estimate of the per-base mutation rate for your sample, along with an estimate for generation time (in years). If you previously ran one of the demographic history inference programs listed above, just use the same values that you used for them. This is also where you will point to the output from said program, using --demographicHistory. If you are not simulating under an inferred history, simply do not include this option. Importantly, you can also set a value for the maximum recombination rate to be simulated using --upperRhoThetaRatio. If you have an a priori estimate for an upper bound to the ratio of rho to theta go ahead and set this here. Keep in mind that higher values will dramatically slow the coalescent simulations. We recommend using the default number of train/test/validation simulation examples, but if you want to simulate more examples, go right ahead. ReLERNN_SIMULATE then uses msprime to simulate 100k training examples and 1k validation and test examples. All output files will be generated in subdirectories within the path provided to --projectDir. It is required that you use the same projectDir for all four ReLERNN commands. If you want to run ReLERNN of multiple populations/taxa, you can run them independently using a unique projectDir for each. This step is simulation heavy and runtimes will strongly depend on the inferred population size.

The complete list of arguments used in ReLERNN_SIMULATE is found below: ``` ReLERNN_SIMULATE -h

usage: ReLERNN_SIMULATE [-h] [-v VCF] [-g GENOME] [-m MASK] [-d OUTDIR] [-n DEM] [-u MU] [-l GENTIME] [-r UPRTR] [-t NCPU] [-s SEED] [--phased] [--unphased] [--forceDiploid] [--phaseError PHASEERROR] [--maxWinSize WINSIZEMX] [--maskThresh MASKTHRESH] [--nTrain NTRAIN] [--nVali NVALI] [--nTest NTEST]

optional arguments: -h, --help show this help message and exit -v VCF, --vcf VCF Filtered and QC-checked VCF file. Important: Every row must correspond to a biallelic SNP with no missing data! -g GENOME, --genome GENOME BED-formatted (i.e. zero-based) file corresponding to chromosomes and positions to consider -m MASK, --mask MASK BED-formatted file corresponding to inaccessible bases -d OUTDIR, --projectDir OUTDIR Directory for all project output. NOTE: the same projectDir must be used for all functions of ReLERNN -n DEM, --demographicHistory DEM Output file from either stairwayplot, SMC++, or MSMC -u MU, --assumedMu MU Assumed per-base mutation rate -l GENTIME, --assumedGenTime GENTIME Assumed generation time (in years) -r UPRTR, --upperRhoThetaRatio UPRTR Assumed upper bound for the ratio of rho to theta -t NCPU, --nCPU NCPU Number of CPUs to use (defaults to total available cores) -s SEED, --seed SEED Random seed --phased VCF file is phased --unphased VCF file is unphased --forceDiploid Treats all samples as diploids with missing data (bad idea; see README) --phaseError PHASEERROR Fraction of bases simulated with incorrect phasing --maxWinSize WINSIZEMX Max number of sites per window to train on. Important: too many sites causes problems in training --maskThresh MASKTHRESH Discard windows where >= maskThresh percent of sites are inaccessible --nTrain NTRAIN Number of training examples to simulate --nVali NVALI Number of validation examples to simulate --nTest NTEST Number of test examples to simulate ```

Step 2) ReLERNN_TRAIN

ReLERNN_TRAIN takes the simulations created by ReLERNN_SIMULATE and uses them to train a recurrent neural network. Again, we recommend using the defaults for --nEpochs and --nValSteps, but if you would like to do more training, feel free. To set the GPU to be used for machines with multiple dedicated GPUs use --gpuID (e.g. if running an analysis on two populations simultaneously, set --gpuID 0 for the first population and --gpuID 1 for the second). ReLERNN_TRAIN outputs some basic metrics of the training results for you, generating the figure $/projectDir/networks/vcfprefix.pdf. The default value of -nCPU is 1 for this step, as this is often produces the shortest training times per epoch (depending on missing data and the mask). Feel free to test training times using multiple cores, and set -nCPU to whatever works best for your data/machine.

The complete list of arguments used in ReLERNN_TRAIN is found below: ``` ReLERNN_TRAIN -h

usage: ReLERNN_TRAIN [-h] [-d OUTDIR] [--nEpochs NEPOCHS] [-t NCPU] [-s SEED] [--nValSteps NVALSTEPS] [--gpuID GPUID]

optional arguments: -h, --help show this help message and exit -d OUTDIR, --projectDir OUTDIR Directory for all project output. NOTE: the same projectDir must be used for all functions of ReLERNN -t NCPU, --nCPU NCPU Number of CPUs to use (defaults to 1) -s SEED, --seed SEED Random seed --nEpochs NEPOCHS Number of epochs to train over --nValSteps NVALSTEPS Number of validation steps --gpuID GPUID Identifier specifying which GPU to use ```

Step 3) ReLERNN_PREDICT

ReLERNN_PREDICT now takes the same VCF file you used in ReLERNN_SIMULATE and predicts per-base recombination rates in non-overlapping windows across the genome. The output file of predictions will be created as $/projectDir/vcfprefix.PREDICT.txt. It is important to note that the window size used for predictions might be different for different chromosomes. A complete list of the window sizes used for each chromosome can be found in third column of $/projectDir/networks/windowSizes.txt. Use the optional --minSites argument to exclude windows with fewer than the desired number of SNPs. If you are not interested in estimating confidence intervals around the predictions, your ReLERNN analysis is now finished. If you are getting OOM errors at this step you can try setting --batchSizeOverride to a value significantly less than the total number of windows along a chromosome (found in the last column of $/projectDir/networks/windowSizes.txt).

The complete list of arguments used in ReLERNN_PREDICT is found below: ``` ReLERNN_PREDICT -h

usage: ReLERNN_PREDICT [-h] [-v VCF] [-d OUTDIR] [--minSites MINS] [--gpuID GPUID] [--batchSizeOverride BSO] [-s SEED]

optional arguments: -h, --help show this help message and exit -v VCF, --vcf VCF Filtered and QC-checked VCF file. Important: Every row must correspond to a biallelic SNP with no missing data! -d OUTDIR, --projectDir OUTDIR Directory for all project output. NOTE: the same projectDir must be used for all functions of ReLERNN --phased VCF file is phased --unphased VCF file is unphased --minSites MINS Minimum number of SNPs in a genomic window required to return a prediction --gpuID GPUID Identifier specifying which GPU to use --batchSizeOverride BSO Batch size to use for low memory applications -s SEED, --seed SEED Random seed

```

Optional Step 4) ReLERNN_BSCORRECT

However, you might want to have an idea of the uncertainty around your predictions. This is where ReLERNN_BSCORRECT comes in. ReLERNN_BSCORRECT generates 95% confidence intervals around each prediction, and additionally attempts to correct for systematic bias (see Materials and Methods). It does this by simulated a set of --nReps examples at each of nSlice recombination rate bins. It then uses the network that was trained in ReLERNN_TRAIN and estimates the distribution of predictions around each know recombination rate. The result is both an estimate of uncertainty, and a prediction that has been slightly corrected to account for biases in how the network predicts in this area of parameter space. The resulting file is created as $/projectDir/vcfprefix.PREDICT.BSCORRECT.txt, and is formatted similarly to $/projectDir/vcfprefix.PREDICT.txt, with the addition of columns for the low and high 95CI bounds. Note that this step is simulation heavy and runtimes can be slow.

The complete list of arguments used in ReLERNN_BSCORRECT is found below: ``` ReLERNN_BSCORRECT -h

usage: ReLERNN_BSCORRECT [-h] [-d OUTDIR] [-t NCPU] [-s SEED] [--gpuID GPUID] [--nSlice NSLICE] [--nReps NREPS]

optional arguments: -h, --help show this help message and exit -d OUTDIR, --projectDir OUTDIR Directory for all project output. NOTE: the same projectDir must be used for all functions of ReLERNN -t NCPU, --nCPU NCPU Number of CPUs to use (defaults to total available cores) -s SEED, --seed SEED Random seed --gpuID GPUID Identifier specifying which GPU to use --nSlice NSLICE Number of recombination rate bins to simulate over --nReps NREPS Number of simulations per step ```

Estimating a recombination landscape from pool-seq data

Similar to the directions above, the ReLERNN pipeline for pool-seq data is executed using four commands: ReLERNN_SIMULATE_POOL, ReLERNN_TRAIN_POOL, ReLERNN_PREDICT_POOL, and the optional ReLERNN_BSCORRECT.

Before running ReLERNN

ReLERNN for pool-seq analyses takes as input a file of genomic positions and allele frequencies (herein a 'POOLFILE'; see example file).

Similar to ReLERNN for individually sequenced chromosomes, if you want to make predictions based on equilibrium simulations, you can skip ahead to executing ReLERNN_SIMULATE_POOL. While ReLERNN is generally robust to demographic model misspecification, prediction accuracy may potentially be improved by simulating the training set under a demographic history that accurately matches that of your sample. ReLERNN optionally takes the raw output files from three popular demographic history inference programs (stairwayplot_v1, SMC++, and MSMC), and simulates a training set under these histories. It is up to the user to perform the proper due diligence to ensure that the population size histories reported by these programs are sound. In our opinion, unless you know exactly how these programs work and you expect your data to represent a history dramatically different from equilibrium, you are better off skipping this step and training ReLERNN on equilibrium simulations. Once you have run one of the demographic history inference programs listed above, you simply provide the raw output file from that program to ReLERNNSIMULATEPOOL using the --demographicHistory option.

Step 1) ReLERNNSIMULATEPOOL

ReLERNN_SIMULATE_POOL reads your POOLFILE and splits it by chromosome. The number of chromosomes in the pool must be specified using the --sampleDepth argument. The genomic chromosomes to be evaluated must be specified by providing a BED file of said positions using the --genome argument. A BED-formatted accessibility mask (with non-overlapping ascending windows) may be optionally provided using the --mask option. It is required that the POOLFILE use the extension .pool. The prefix of that file will serve as the prefix used for all output files (e.g. running ReLERNN on the file population7.pool will generate the result file population7.PREDICT.txt). It is strongly recommended that you use the default setting for --maxSites, larger values can cause training to fail and smaller values can result in lower accuracy. Users are required to provide an estimate of the per-base mutation rate for your sample, along with an estimate for generation time (in years). If you previously ran one of the demographic history inference programs listed above, just use the same values that you used for them. This is also where you will point to the output from said program, using --demographicHistory. If you are not simulating under an inferred history, simply do not include this option. Importantly, you can also set a value for the maximum recombination rate to be simulated using --upperRhoThetaRatio. If you have an a priori estimate for an upper bound to the ratio of rho to theta go ahead and set this here. Keep in mind that higher values will dramatically slow the coalescent simulations. We recommend using the default number of train/test/validation simulation examples, but if you want to simulate more examples, go right ahead. ReLERNN_SIMULATE_POOL then uses msprime to simulate 100k training examples and 1k validation and test examples. All output files will be generated in subdirectories within the path provided to --projectDir. It is required that you use the same projectDir for all four ReLERNN commands. If you want to run ReLERNN of multiple populations/taxa, you can run them independently using a unique projectDir for each. This step is simulation heavy and runtimes will strongly depend on the inferred population size.

The complete list of arguments used in ReLERNN_SIMULATE_POOL is found below: ``` ReLERNNSIMULATEPOOL -h

usage: ReLERNNSIMULATEPOOL [-h] [-p POOL] [--sampleDepth SAMD] [-g GENOME] [-m MASK] [-d OUTDIR] [-n DEM] [-u MU] [-l GENTIME] [-r UPRTR] [-t NCPU] [-s SEED] [--maxSites WINSIZEMX] [--maskThresh MASKTHRESH] [--nTrain NTRAIN] [--nVali NVALI] [--nTest NTEST]

optional arguments: -h, --help show this help message and exit -p POOL, --pool POOL Filtered and QC-checked POOL file. --sampleDepth SAMD Number of chromosomes in pool -g GENOME, --genome GENOME BED-formatted (i.e. zero-based) file corresponding to chromosomes and positions to consider -m MASK, --mask MASK BED-formatted file corresponding to inaccessible bases -d OUTDIR, --projectDir OUTDIR Directory for all project output. NOTE: the same projectDir must be used for all functions of ReLERNN -n DEM, --demographicHistory DEM Output file from either stairwayplot, SMC++, or MSMC -u MU, --assumedMu MU Assumed per-base mutation rate -l GENTIME, --assumedGenTime GENTIME Assumed generation time (in years) -r UPRTR, --upperRhoThetaRatio UPRTR Assumed upper bound for the ratio of rho to theta -t NCPU, --nCPU NCPU Number of CPUs to use (defaults to total available cores) -s SEED, --seed SEED Random seed --maxSites WINSIZEMX Max number of sites per window to train on. Important: too many sites causes problems in training --maskThresh MASKTHRESH Discard windows where >= maskThresh percent of sites are inaccessible --nTrain NTRAIN Number of training examples to simulate --nVali NVALI Number of validation examples to simulate --nTest NTEST Number of test examples to simulate ```

Step 2) ReLERNNTRAINPOOL

ReLERNN_TRAIN_POOL takes the simulations created by ReLERNN_SIMULATE_POOL and uses them to train a recurrent neural network. The only difference here is that the mean read depth of the pool must be specified using the --readDepth argument. You can also specify a minor allele frequency threshold (--maf), if a similar threshold was used to generate your POOLFILE. Again, we recommend using the defaults for --nEpochs and --nValSteps, but if you would like to do more training, feel free. To set the GPU to be used for machines with multiple dedicated GPUs use --gpuID (e.g. if running an analysis on two populations simultaneously, set --gpuID 0 for the first population and --gpuID 1 for the second). ReLERNN_TRAIN_POOL outputs some basic metrics of the training results for you, generating the figure $/projectDir/networks/poolprefix.pdf. The default value of -nCPU for this step is the max number of available cores, as training on pooled data with a single core can be very slow.

The complete list of arguments used in ReLERNN_TRAIN_POOL is found below: ``` ReLERNNTRAINPOOL -h

usage: ReLERNNTRAINPOOL [-h] [-d OUTDIR] [--readDepth SEQD] [--maf MAF] [--nEpochs NEPOCHS] [--nValSteps NVALSTEPS] [-t NCPU] [-s SEED] [--gpuID GPUID]

optional arguments: -h, --help show this help message and exit -d OUTDIR, --projectDir OUTDIR Directory for all project output. NOTE: the same projectDir must be used for all functions of ReLERNN --readDepth SEQD Mean read depth of the pool --maf MAF discard simulated sites with allele frequencies < maf --nEpochs NEPOCHS Number of epochs to train over --nValSteps NVALSTEPS Number of validation steps -t NCPU, --nCPU NCPU Number of CPUs to use (defaults to total available cores) -s SEED, --seed SEED Random seed --gpuID GPUID Identifier specifying which GPU to use ```

Step 3) ReLERNNPREDICTPOOL

ReLERNN_PREDICT_POOL now takes the same POOL file you used in ReLERNN_SIMULATE_POOL and predicts per-base recombination rates in non-overlapping windows across the genome. The output file of predictions will be created as $/projectDir/poolprefix.PREDICT.txt. It is important to note that the window size used for predictions might be different for different chromosomes. A complete list of the window sizes used for each chromosome can be found in third column of $/projectDir/networks/windowSizes.txt. Use the optional --minSites argument to exclude windows with fewer than the desired number of SNPs. If you are not interested in estimating confidence intervals around the predictions, your ReLERNN analysis is now finished. If you are getting OOM errors at this step you can try setting --batchSizeOverride to a value significantly less than the total number of windows along a chromosome (found in the last column of $/projectDir/networks/windowSizes.txt).

The complete list of arguments used in ReLERNN_PREDICT_POOL is found below: ``` ReLERNNPREDICTPOOL -h

usage: ReLERNN_PREDICT [-h] [-p POOL] [-d OUTDIR] [--minSites MINS] [--batchSizeOverride BSO] [--gpuID GPUID] [-s SEED]

optional arguments: -h, --help show this help message and exit -p POOL, --pool POOL Filtered and QC-checked POOL file. -d OUTDIR, --projectDir OUTDIR Directory for all project output. NOTE: the same projectDir must be used for all functions of ReLERNN --minSites MINS Minimum number of SNPs in a genomic window required to return a prediction --batchSizeOverride BSO Batch size to use for low memory applications --gpuID GPUID Identifier specifying which GPU to use -s SEED, --seed SEED Random seed ```

Optional Step 4) ReLERNN_BSCORRECT

This step is exactly the same as in ReLERNN for individually sequenced chromosomes (above).

The complete list of arguments used in ReLERNN_BSCORRECT is found below: ``` ReLERNN_BSCORRECT -h

usage: ReLERNN_BSCORRECT [-h] [-d OUTDIR] [-t NCPU] [-s SEED] [--gpuID GPUID] [--nSlice NSLICE] [--nReps NREPS]

optional arguments: -h, --help show this help message and exit -d OUTDIR, --projectDir OUTDIR Directory for all project output. NOTE: the same projectDir must be used for all functions of ReLERNN -t NCPU, --nCPU NCPU Number of CPUs to use (defaults to total available cores) -s SEED, --seed SEED Random seed --gpuID GPUID Identifier specifying which GPU to use --nSlice NSLICE Number of recombination rate bins to simulate over --nReps NREPS Number of simulations per step ```

Owner

Name: Kern-Ralph CoLab
Login: kr-colab
Kind: organization
Location: University of Oregon

Website: http://kr-colab.github.io
Repositories: 23
Profile: https://github.com/kr-colab

Repositories associated with the collaborary research group of Andrew Kern and Peter Ralph

GitHub Events

Total

Issues event: 24
Watch event: 10
Issue comment event: 76
Push event: 2
Fork event: 1

Last Year

Issues event: 24
Watch event: 10
Issue comment event: 76
Push event: 2
Fork event: 1

Committers

Last synced: over 2 years ago

All Time

Total Commits: 100
Total Committers: 7
Avg Commits per committer: 14.286
Development Distribution Score (DDS): 0.26

Past Year

Commits: 1
Committers: 1
Avg Commits per committer: 1.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Jeffrey Adrion	j**n@i**u	74
Jeff Adrion	j**n@t**r	9
Andrew Kern	7****n	8
Jared Galloway	j**7@g**m	5
Jeffrey Adrion	j**n@u**u	2
Jeff Adrion	j**n@n**r	1
Jeffrey Adrion	j**n@J**l	1

Committer Domains (Top 20 + Academic)

n244.cm.cluster: 1 uoregon.edu: 1 talapas-ln1.cm.cluster: 1 indiana.edu: 1

Issues and Pull Requests

Last synced: 7 months ago

All Time

Total issues: 50
Total pull requests: 19
Average time to close issues: 5 months
Average time to close pull requests: 4 days
Total issue authors: 34
Total pull request authors: 3
Average comments per issue: 4.22
Average comments per pull request: 0.47
Merged pull requests: 17
Bot issues: 0
Bot pull requests: 2

Past Year

Issues: 12
Pull requests: 0
Average time to close issues: about 2 months
Average time to close pull requests: N/A
Issue authors: 9
Pull request authors: 0
Average comments per issue: 4.83
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

View more stats

Top Authors

Issue Authors

jiangzy26 (5)
Joyce121 (5)
tanu06 (3)
jbruxaux (3)
xingguowu (3)
stsmall (2)
mbursell (2)
zy66-su (1)
VarIr (1)
irarun (1)
VaiaPatta1985 (1)
jradrion (1)
andreaswallberg (1)
dylandebaun (1)
PandoraMagis (1)

ReLERNN

Science Score: 46.0%

Keywords

Repository

Basic Info

Statistics

Topics

Metadata Files

README.md

ReLERNN

Recombination Landscape Estimation using Recurrent Neural Networks

Recommended installation on linux

Installing CUDA

then install ReLERNN as above

Testing ReLERNN

Estimating a recombination landscape from individually sequenced chromosomes

Before running ReLERNN

Step 1) ReLERNN_SIMULATE

Step 2) ReLERNN_TRAIN

Step 3) ReLERNN_PREDICT

Optional Step 4) ReLERNN_BSCORRECT

Estimating a recombination landscape from pool-seq data

Before running ReLERNN

Step 1) ReLERNNSIMULATEPOOL

Step 2) ReLERNNTRAINPOOL

Step 3) ReLERNNPREDICTPOOL

Optional Step 4) ReLERNN_BSCORRECT

Owner

GitHub Events

Total

Last Year

Committers

All Time

Past Year

Top Committers

Committer Domains (Top 20 + Academic)

Issues and Pull Requests

All Time

Past Year

Top Authors

Issue Authors

Pull Request Authors

Top Labels

Issue Labels

Pull Request Labels

Dependencies

Installing `CUDA`