https://github.com/bartongroup/sm_varalign

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.3%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: bartongroup
  • License: mit
  • Language: Python
  • Default Branch: master
  • Size: 11.9 MB
Statistics
  • Stars: 2
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created almost 7 years ago · Last pushed almost 2 years ago
Metadata Files
Readme License

README.md

VarAlign

VarAlign is a Python module that aggregates human genetic variation over the columns of a multiple sequence alignment.
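To make the idea of aggregating over alignment columns concrete, here is a minimal sketch. It is not VarAlign's implementation or API: the function names, the toy alignment, and the variant positions are all hypothetical and chosen only for illustration. Each variant is given as a residue number within its own sequence, mapped to an alignment column by skipping gap characters, and then counted per column.

```python
from collections import Counter


def column_map(aligned_seq, start=1):
    """Map residue numbers of one aligned sequence to 0-based alignment columns.

    Gap characters ('-' and '.') occupy a column but do not consume a residue number.
    """
    mapping = {}
    residue = start
    for col, char in enumerate(aligned_seq):
        if char not in "-.":
            mapping[residue] = col
            residue += 1
    return mapping


def aggregate_variants(alignment, variants):
    """Count variants per alignment column.

    alignment: dict of sequence id -> aligned sequence string (hypothetical input shape)
    variants:  dict of sequence id -> list of variant residue numbers (hypothetical input shape)
    """
    counts = Counter()
    for seq_id, positions in variants.items():
        mapping = column_map(alignment[seq_id])
        for pos in positions:
            if pos in mapping:
                counts[mapping[pos]] += 1
    return counts


# Toy data, invented for this example.
alignment = {
    "seqA": "MK-TLV",
    "seqB": "MRATL-",
}
variants = {"seqA": [2, 3], "seqB": [2, 5]}

column_counts = aggregate_variants(alignment, variants)
print(sorted(column_counts.items()))
```

In this toy case residue 2 of both sequences falls in the same column, so that column collects two variants even though the residues differ.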

Installing

Installing VarAlign (uses Conda)

```sh
# Download
$ git clone https://github.com/stuartmac/VarAlign.git

# Set up conda environment with all requirements
# You may need to add Bioconda channels:
$ conda config --add channels defaults
$ conda config --add channels bioconda
$ conda config --add channels conda-forge

$ cd VarAlign
$ conda env create -f environment.yml
$ source activate varalign-env-py3

# Install VarAlign
$ pip install .

# Run tests (some will probably fail, in particular ProIntVar isn't installed yet!)
$ python -m unittest discover tests
```

Enabling structural analysis

*requires ProIntVar and Arpeggio

Arpeggio needs to go in a separate environment because it requires Python 2.
```sh
# Set up environment with Arpeggio dependencies
$ conda create -n arpeggio python=2 pip numpy biopython openbabel
$ source activate arpeggio

# Get patched Arpeggio
# Remember to leave the VarAlign folder if you're following this in order!
$ git clone https://bitbucket.org/biomadeira/arpeggio
$ cd arpeggio
$ python arpeggio.py -h
```

Install and configure ProIntVar. (NB. ProIntVar requires Python 3.)
```sh
# Install ProIntVar (https://github.com/bartongroup/ProIntVar)
$ source activate varalign-env-py3
$ git clone https://github.com/bartongroup/ProIntVar.git
$ cd ProIntVar

# Patch ProIntVar and install
$ git apply /path/to/VarAlign/ProIntVar.patch
$ pip install .

# Configure ProIntVar
$ ProIntVar-config-setup prointvar_config.ini

# *** Edit the following values in prointvar_config.ini ***
# arpeggio_bin = /path/to/arpeggio/arpeggio.py
# python_exe = /path/to/anaconda/envs/arpeggio/bin/python
# python_path = /path/to/anaconda/envs/arpeggio/python/lib/site-packages/
$ ProIntVar-config-load prointvar_config.ini

# Check it works (rerun VarAlign tests)
# If you ran the tests earlier you may see a FileExists error for .../VarAlign/tests/tmp, delete this and try again.
$ cd path/to/VarAlign/
$ python -m unittest discover tests
```

Configuration

VarAlign reads key paths and parameters from a configuration file, which must be set up before you run the pipeline. A config file in the execution directory takes priority, letting you keep parameters beside results, but a global config file can also be used.
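The lookup order described above can be sketched as follows. This is an illustration of the "working directory first, then global" rule only, not VarAlign's actual loading code: the helper name `find_config` and the idea of a single global directory are assumptions made for the example.

```python
import os
import tempfile


def find_config(filename, working_dir, global_dir):
    """Return the first existing config path, preferring the working directory.

    Hypothetical helper: returns None if neither location has the file.
    """
    for directory in (working_dir, global_dir):
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


# Demonstrate the priority rule with throwaway directories.
with tempfile.TemporaryDirectory() as global_dir, tempfile.TemporaryDirectory() as work_dir:
    global_cfg = os.path.join(global_dir, "config.txt")
    open(global_cfg, "w").close()

    # Only the global file exists, so it is the one that is picked up.
    only_global = find_config("config.txt", work_dir, global_dir)
    global_wins_without_local = only_global == global_cfg

    # Once a local file exists, it shadows the global one.
    local_cfg = os.path.join(work_dir, "config.txt")
    open(local_cfg, "w").close()
    with_local = find_config("config.txt", work_dir, global_dir)
    local_wins = with_local == local_cfg

print(global_wins_without_local, local_wins)
```

Keeping the local file beside the results means a rerun from that directory picks up the same settings, regardless of what the global file says at the time.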

Setting up a config file in the working directory
```sh
$ cd /path/to/desired/working/dir/

# Get a copy of the template config file shipped with VarAlign
$ cp /path/to/VarAlign/varalign/config.txt ./

# Edit the settings as you require

# Testing that the new values are correctly loaded by VarAlign
$ python
>>> from varalign.config import defaults
>>> defaults.gnomad
'./sampleswissprotPF00001.18_full.vcf.gz'
```

Run the pipeline

I recommend you download a Pfam alignment that has at least a few human sequences and then try:

varalign --species HUMAN <YOUR_ALIGNMENT>
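Before running, it can be useful to check that a downloaded alignment really contains human sequences. The sketch below counts them in Stockholm-format text, assuming sequence names follow the UniProt entry-name convention with a species suffix (for example `PROT1_HUMAN/1-10`). That naming is an assumption about your download, not something VarAlign is documented here to require, and the function name and toy alignment are hypothetical.

```python
def count_species_sequences(stockholm_text, species="HUMAN"):
    """Count distinct sequences whose entry name ends with '_<species>'.

    Skips blank lines, '#' annotation lines and the '//' terminator.
    A set is used so interleaved alignments, which repeat names, are not double counted.
    """
    names = set()
    for line in stockholm_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line == "//":
            continue
        name = line.split()[0]          # e.g. PROT1_HUMAN/1-10
        entry = name.split("/")[0]      # e.g. PROT1_HUMAN
        if entry.endswith("_" + species):
            names.add(name)
    return len(names)


# Toy alignment with invented names and residues.
example = """# STOCKHOLM 1.0
#=GF ID toy_family
PROT1_HUMAN/1-10    MKTLVAGQWE
PROT1_BOVIN/1-10    MKTLVAGQWD
PROT2_HUMAN/5-14    MRTLIAGEWE
//
"""

n_human = count_species_sequences(example)
print(n_human)
```

If your alignment was downloaded with accession-based names instead, this check will report zero even when human sequences are present.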

Known issues on our cluster

AACon is bundled with VarAlign and needs Java to run. Some nodes on our cluster are missing Java, so you may wish to install java-jdk in the conda environment.

I have gnomAD downloaded under my home directory in .../NOBACK/resources/gnomad/; set this up in config.txt if you like.
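A quick way to find out whether the current node has Java before starting a run is to look for the executable on `PATH`. This is a small standalone check using the standard library, not part of VarAlign; the helper name is hypothetical.

```python
import shutil


def executable_on_path(name):
    """Return True if an executable called `name` can be found on PATH."""
    return shutil.which(name) is not None


has_java = executable_on_path("java")
if has_java:
    print("java found on PATH")
else:
    print("java not found; consider installing java-jdk in the conda environment")
```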

Owner

  • Name: Geoff Barton's Computational Biology Group
  • Login: bartongroup
  • Kind: organization
  • Location: Dundee, Scotland, UK

GitHub Events

Total
  • Watch event: 1
  • Push event: 5
Last Year
  • Watch event: 1
  • Push event: 5

Dependencies

requirements.txt pypi
  • biopython *
  • matplotlib >=2.0.2,<=2.1.1
  • numpy *
  • pandas ==0.20.3
  • proteofav *
  • pysam *
  • pyvcf *
  • requests *
  • requests_cache >=0.4.13
  • scikit-learn *
  • scipy *
  • seaborn *
  • tqdm *
setup.py pypi
environment.yml pypi
  • proteofav *