sierra-local
sierra-local: A lightweight standalone application for drug resistance prediction - Published in JOSS (2019)
Science Score: 93.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 8 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: Links to: joss.theoj.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ✓ JOSS paper metadata: Published in Journal of Open Source Software
Repository
Retrieve HIVdb algorithm as XML and apply locally to HIV sequences
Basic Info
- Host: GitHub
- Owner: PoonLab
- License: gpl-3.0
- Language: Python
- Default Branch: master
- Size: 76 MB
Statistics
- Stars: 10
- Watchers: 8
- Forks: 7
- Open Issues: 8
- Releases: 7
Metadata Files
README.md
sierra-local
sierra-local is a Python 3 implementation of the Stanford University HIV Drug Resistance Database (HIVdb) Sierra web service for generating drug resistance predictions from HIV-1 sequence data. This Python package enables laboratories to run this prediction algorithm without needing to transmit patient data over the network, and confers full control over data provenance and security.
Rationale
The Stanford HIVdb algorithm is a widely used method for predicting the drug resistance phenotype of an HIV-1 infection based on its genetic sequence, specifically the complete or partial sequence of the genomic regions encoding the primary targets of modern antiretroviral therapy. Prediction of HIV-1 drug resistance is an important component in the routine clinical management of HIV-1 infection, because it is faster and more cost-effective than directly measuring drug resistance by culturing virus isolates in the laboratory. The HIVdb algorithm is essentially a rules-based classifier that is actively maintained and released to the public domain in the ASI (Algorithm Specification Interface) exchange format, demonstrating a laudable commitment by the HIVdb developers to open-source research and clinical practice.
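To make the idea of a rules-based classifier concrete, here is a minimal Python sketch under simplifying assumptions: for each drug, the penalty scores of the mutations observed in a sequence are summed, and the total is mapped onto HIVdb's five resistance levels (susceptible below 10, potential low-level 10 to 14, low-level 15 to 29, intermediate 30 to 59, high-level 60 or more). The penalty values and the function names `total_score` and `resistance_level` are hypothetical, invented for illustration and not taken from sierra-local or any HIVdb release; the real ASI rules also include combination scores for sets of mutations.

```python
# A toy illustration of rules-based scoring in the style of HIVdb.
# ASSUMPTION: these penalty values are invented for illustration only;
# real values (and mutation-combination rules) come from the ASI XML file.
HYPOTHETICAL_PENALTIES = {
    "EFV": {"K103N": 60, "V106M": 60, "E138K": 0},
    "ETR": {"K103N": 0, "V106M": 0, "E138K": 10},
}

# (lowest total score for this level, level label), checked top-down.
LEVELS = [
    (60, "High-level resistance"),
    (30, "Intermediate resistance"),
    (15, "Low-level resistance"),
    (10, "Potential low-level resistance"),
]


def total_score(drug, mutations, penalties=HYPOTHETICAL_PENALTIES):
    """Sum the penalty scores of the observed mutations for one drug.

    Mutations without an entry in the drug's table contribute 0.
    """
    table = penalties[drug]
    return sum(table.get(mutation, 0) for mutation in mutations)


def resistance_level(score):
    """Map a total penalty score onto a resistance level label."""
    for cutoff, label in LEVELS:
        if score >= cutoff:
            return label
    return "Susceptible"  # totals below 10, including negative totals
```

Under these made-up values, `total_score("EFV", ["K103N"])` returns 60, which `resistance_level` maps to "High-level resistance", while negative totals fall through to "Susceptible".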
The HIVdb algorithm is usually accessed through a web service hosted at Stanford University (Sierra). While this is a convenient format for many clinical laboratories, it requires a network connection and the transmission of potentially sensitive patient-derived data to a remote server. Transmitting sequence data over the web may present a bottleneck for laboratories located at sites that are geographically distant from the host server, or where network traffic is prone to service disruptions. Furthermore, the use of HIV-1 sequence data in criminal cases raises significant issues around data privacy.
Our objective was to build a lightweight, open-source Python implementation of the HIVdb algorithm for processing data on a local computer without sending any data over the network. During the development of sierra-local, the maintainers of Sierra released the source code for their web service under a permissive free software license (GPL v3.0). We were thrilled that the HIVdb developers elected to release their server code, but we remained committed to completing sierra-local so that the HIV research and clinical communities can process their own data without needing to install and maintain an Apache server, build an SQL database, or install a sizeable number of software dependencies.
Dependencies
We tried to minimize dependencies:
- Python 3 (tested on Python 3.9.0 and Python 3.10.9)
- Python modules (used by updater.py script):
  - requests
- NucAmino v0.1.3 or later (included with the package).
- Post-Align is the new alignment program and requires the following dependencies (included with the package as well):
  - Cython==0.29.32
  - more-itertools==9.1.0
  - orjson==3.9.1
  - types-setuptools==67.8.0.0
  - minimap2
Installation
Setting up Sierra-Local
On a Linux system, you can install sierra-local as follows:
```console
git clone http://github.com/PoonLab/sierra-local
cd sierra-local
sudo python3 setup.py install
```
Note that you need super-user privileges to install the package by this method. For more detailed instructions, please refer to the document INSTALL.md that should be located in the root directory of this Python package.
Alternatively, you can install with pip, which doesn't need sudo.
```console
git clone http://github.com/PoonLab/sierra-local
cd sierra-local
pip install --user .
```
Using sierra-local
Command-line interface (CLI)
Before running, we recommend using the sierralocal/updater.py script to update the data files associated with this repository to the most recent versions available from hivfacts. Note that this command requires the requests package listed above. More information about this script is provided below.
```console
(sierra) will@dyn172-30-75-11 sierra-local % python3 sierralocal/updater.py
Downloading the latest HIVDB XML File
Updated HIVDB XML into /Users/will/projects/sierra-local/sierralocal/data/HIVDB_9.8.xml
Downloading the latest file to determine apobec
Updated apobecs file to /Users/will/projects/sierra-local/sierralocal/data/apobecs.csv
Downloading the latest file to determine is unusual
Updated is unusual file to /Users/will/projects/sierra-local/sierralocal/data/rx-all_subtype-all.csv
Downloading the latest file to determine SDRM mutations
Updated SDRM mutations file to /Users/will/projects/sierra-local/sierralocal/data/sdrms_hiv1.csv
Downloading the latest file to determine mutation type
Updated mutation type file to /Users/will/projects/sierra-local/sierralocal/data/mutation-type-pairs_hiv1.csv
Downloading the latest APOBEC DRMS File
Updated APOBEC DRMs into /Users/will/projects/sierra-local/sierralocal/data/apobec_drms.json
Downloading the latest subtype genotype property File
Updated reference fasta to /Users/will/projects/sierra-local/sierralocal/data/genotype-properties.csv
Downloading the latest subtype reference fasta file
Updated reference fasta to /Users/will/projects/sierra-local/sierralocal/data/genotype-references.fasta
```
To run a quick example, use the following sequence of commands:
```console
art@Jesry:~/git/sierra-local$ python3 scripts/retrieve_hivdb_data.py RT RT.fa
art@Jesry:~/git/sierra-local$ sierralocal RT.fa
searching path /root/miniconda3/envs/py395/lib/python3.10/site-packages/sierralocal/data/HIVDB*.xml
searching path /root/miniconda3/envs/py395/lib/python3.10/site-packages/sierralocal/data/apobec*.json
HIVdb version 9.4
Aligning using post-align
Aligned RT.fa
100 sequences found in file RT.fa.
Writing JSON to file RT_results.json
Time elapsed: 19.796 seconds (5.1555 it/s)
```
To switch between Post-Align (the default) and NucAmino, use the -alignment option; passing nuc runs NucAmino instead:
```console
will@Jesry:~/sierra-local# sierralocal RT.fa -alignment nuc
searching path /root/miniconda3/envs/py395/lib/python3.10/site-packages/sierralocal/data/HIVDB*.xml
searching path /root/miniconda3/envs/py395/lib/python3.10/site-packages/sierralocal/data/apobec*.json
HIVdb version 9.4
Found NucAmino binary /root/miniconda3/envs/py395/lib/python3.10/site-packages/sierralocal/bin/nucamino-linux-amd64
Aligned RT.fa
100 sequences found in file RT.fa.
Writing JSON to file RT_results.json
Time elapsed: 4.1417 seconds (25.45 it/s)
```
retrieve_hivdb_data.py is a Python script that we provide to download small samples of HIV-1 sequence data from the Stanford HIVdb database. In this case, we retrieved 100 reverse transcriptase (RT) sequences and processed them with the sierra-local pipeline. By default, the results are written to the file [FASTA basename]_results.json:
```console
art@Jesry:~/git/sierra-local$ head RT_results.json
[
  {
    "inputSequence": {
      "header": "U54771.CM240.CRF01_AE.0"
    },
    "subtypeText": "CRF01_AE",
    "validationResults": [],
    "alignedGeneSequences": [
      {
        "firstAA": 1,
```
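Because the results file is plain JSON, it can be inspected with Python's standard json module. The sketch below assumes only the fields visible in the head output above (a list of records, each with inputSequence.header and subtypeText); the helper names `default_output_path` and `summarize_results` are hypothetical, and `default_output_path` reproduces only the documented [FASTA basename]_results.json naming convention, not the directory the file is written to.

```python
import json
import os


def default_output_path(fasta_path):
    """Return the documented default name: [FASTA basename]_results.json.

    Only the file name is derived; where sierra-local writes the file
    is not modelled here.
    """
    base, _ext = os.path.splitext(os.path.basename(fasta_path))
    return base + "_results.json"


def summarize_results(json_path):
    """Return (header, subtype) pairs from a sierra-local results file.

    Uses only fields shown in the example output above:
    inputSequence.header and subtypeText.
    """
    with open(json_path) as handle:
        records = json.load(handle)  # the file holds a JSON list of records
    return [
        (record["inputSequence"]["header"], record.get("subtypeText", ""))
        for record in records
    ]
```

For the run above, `summarize_results(default_output_path("RT.fa"))` would list each of the 100 sequence headers with its reported subtype, for example `("U54771.CM240.CRF01_AE.0", "CRF01_AE")` for the first record.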
We can also specify a different ASI (XML) file, for example one representing another version of the HIVdb algorithm, to reprocess the same data, and use -o to save the output to another file:
```console
(sierra) will@dyn172-30-75-11 sierra-local % sierralocal -xml sierralocal/data/HIVDB_9.8.xml RT.fa -o RT-v9.8.json
searching path /Users/will/miniconda3/envs/sierra/lib/python3.9/site-packages/sierralocal/data/apobec_drms.json
HIVdb version 9.8
Aligning using post-align
Aligned RT.fa
100 sequences found in file RT.fa.
Writing JSON to file RT-v9.8.json
Time elapsed: 9.3831 seconds (10.709 it/s)
```
We find that switching versions of the algorithm from 8.5 to 8.7 changes the resistance scores for these data, most notably by introducing scores for a new drug, doravirine (DOR). In addition, after excluding DOR, two of the 100 cases were scored differently:
```console
art@Jesry:~/git/sierra-local$ python3 scripts/json2csv.py RT_results.json RT-v8.7.json.csv
art@Jesry:~/git/sierra-local$ python3 scripts/json2csv.py RT-v8.5.json RT-v8.5.json.csv
art@Jesry:~/git/sierra-local$ R
```
```R
> v5 <- read.csv('RT-v8.5.json.csv')
> v7 <- read.csv('RT-v8.7.json.csv')
> v7 <- v7[,-which(names(v7)=='DOR')]
> temp <- sapply(1:nrow(v5), function(i) any(v5[i,] != v7[i,]))
> which(temp)
[1] 23 63
> v5[23,]
                name  subtype ABC AZT D4T DDI FTC LMV TDF EFV ETR NVP RPV
23 Y14503.BCF13.O.22  Group O  15 -10 -10  10  60  60 -10  50  45  95  65
> v7[23,]
                name  subtype ABC AZT D4T DDI FTC LMV TDF EFV ETR NVP RPV
23 Y14503.BCF13.O.22  Group O  15 -10 -10  10  60  60 -10  50  55 105  75
> v5[63,]
                name subtype ABC AZT D4T DDI FTC LMV TDF EFV ETR NVP RPV
63 AF102332.A11.B.62       B  90 115 115  90  80  80  60   0   0   0   0
> v7[63,]
                name subtype ABC AZT D4T DDI FTC LMV TDF EFV ETR NVP RPV
63 AF102332.A11.B.62       B  90 115 115  90  80  80  60   0  10  10  10
```
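For readers who prefer Python to R, the same comparison can be sketched with the standard csv module. This assumes that both CSV files produced by json2csv.py have a header row and list the sequences in the same order (as the R code above also assumes); `read_scores` and `differing_rows` are hypothetical helper names. Unlike the R version, which compares columns by position after dropping DOR, this sketch compares columns by name, so an added drug column cannot shift the comparison.

```python
import csv


def read_scores(path):
    """Read a json2csv.py output file into a list of row dictionaries."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def differing_rows(old_rows, new_rows, ignore=("DOR",)):
    """Return 1-based row numbers whose shared columns differ.

    Columns named in `ignore` (e.g. a drug added in the newer algorithm
    version) are skipped, mirroring the R code that drops DOR.
    Rows are paired by position, so both files must list the
    sequences in the same order.
    """
    changed = []
    for i, (old, new) in enumerate(zip(old_rows, new_rows), start=1):
        shared = [key for key in old if key in new and key not in ignore]
        if any(old[key] != new[key] for key in shared):
            changed.append(i)
    return changed
```

Calling `differing_rows(read_scores("RT-v8.5.json.csv"), read_scores("RT-v8.7.json.csv"))` should report the same 1-based row numbers as `which(temp)` in the R session.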
To specify your own JSON file for APOBEC DRMs, pass the -json option followed by the path to your file:
```console
(sierra) will@dyn172-30-75-11 sierra-local % sierralocal RT.fa -json sierralocal/data/apobec_drms.c9583ac2.json
searching path /Users/will/miniconda3/envs/sierra/lib/python3.9/site-packages/sierralocal/data/HIVDB*.xml
HIVdb version 9.4
Aligning using post-align
Aligned RT.fa
100 sequences found in file RT.fa.
Writing JSON to file RT_results.json
Time elapsed: 9.3442 seconds (10.751 it/s)
```
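If you want to drive the command-line interface from a Python workflow, one option is to assemble the argument list and pass it to `subprocess.run`. The sketch below only uses the options demonstrated in this README (-xml, -json, -o and -alignment); the function name `build_sierralocal_command` is hypothetical, and actually running the command assumes sierralocal is installed and on your PATH. Since the examples above place options both before and after the FASTA file, the sketch simply appends them after it.

```python
def build_sierralocal_command(fasta, xml=None, apobec_json=None,
                              output=None, alignment=None):
    """Assemble an argument list for the sierralocal CLI.

    Only options shown in this README are supported:
    -xml (ASI file), -json (APOBEC DRMs file), -o (output file)
    and -alignment (e.g. "nuc" to use NucAmino instead of Post-Align).
    """
    cmd = ["sierralocal", fasta]
    if xml is not None:
        cmd += ["-xml", xml]
    if apobec_json is not None:
        cmd += ["-json", apobec_json]
    if output is not None:
        cmd += ["-o", output]
    if alignment is not None:
        cmd += ["-alignment", alignment]
    return cmd
```

For instance, `subprocess.run(build_sierralocal_command("RT.fa", alignment="nuc"), check=True)` would reproduce the NucAmino run shown earlier.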
As a Python module
If you have downloaded the package source to your computer, you can also run sierra-local as a Python module from the root directory of the package. In the following example, we are calling the main function of sierra-local from an interactive Python session:
```console
(sierra) will@dyn172-30-75-11 sierra-local % git clone http://github.com/PoonLab/sierra-local
(sierra) will@dyn172-30-75-11 sierra-local % cd sierra-local
(sierra) will@dyn172-30-75-11 sierra-local % python3
Python 3.9.18 | packaged by conda-forge | (main, Dec 23 2023, 16:35:41)
[Clang 16.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from sierralocal.main import sierralocal
>>> sierralocal('RT.fa', 'RT.json')
searching path /Users/will/projects/sierra-local/sierralocal/data/HIVDB*.xml
searching path /Users/will/projects/sierra-local/sierralocal/data/apobec_drms.json
HIVdb version 9.8
Aligning using post-align
Aligned RT.fa
100 sequences found in file RT.fa.
Writing JSON to file RT.json
(100, 0.0769047737121582)
```
Note that this doesn't require any `sudo` privileges.
Subtyping
Currently, we do not support the subtyping function present in sierrapy. However, a skeleton of this script is located in /sierralocal/deprecated/subtyper.py. We do not recommend using this feature without modification, as it is not maintained or tested. If you still want to try it, you can manually enable it by setting the do_subtype values to True in sierralocal/nucaminohook.py and importing the subtyper class from the subtyper script.
Updating the algorithm and other data files
The Stanford HIVdb database regularly updates its resistance genotyping algorithm and publishes the associated ASI2 XML file in their GitHub repository, hivfacts. In previous versions of sierra-local, we used Python to automatically query this website and download the newest version if it was not already present on the user's computer. However, subsequent changes to the Stanford HIVdb website meant that users would have to install several additional dependencies for Python to locate the required files. As a result, we decided to make the updater.py script an optional step of the pipeline.
Running the updater.py script manually retrieves the most recent versions of the ASI2 and other mutation data files from the HIVdb webserver (see the example output under Command-line interface above).
Of course, it would be simpler to download these files yourself from hivfacts, but in some applications there may be a benefit to automating this step.
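The "download only if not already present" check described above can be sketched as a version comparison against the local data directory. This is a minimal illustration, not the logic used by sierra-local or updater.py: it assumes algorithm files are named HIVDB_<major>.<minor>.xml (as in HIVDB_9.8.xml above) and that the latest version string is obtained separately, for example by checking hivfacts by hand. The names `local_versions` and `needs_update` are hypothetical.

```python
import glob
import os
import re

# ASSUMPTION: algorithm files follow the HIVDB_<major>.<minor>.xml naming
# seen in the examples above (e.g. HIVDB_9.8.xml); other names are ignored.
_VERSION_RE = re.compile(r"HIVDB_(\d+)\.(\d+)\.xml$")


def local_versions(data_dir):
    """Return sorted (major, minor) versions of HIVDB_*.xml files in data_dir."""
    versions = []
    for path in glob.glob(os.path.join(data_dir, "HIVDB_*.xml")):
        match = _VERSION_RE.search(os.path.basename(path))
        if match:
            versions.append((int(match.group(1)), int(match.group(2))))
    return sorted(versions)


def needs_update(data_dir, latest):
    """Return True if `latest` (e.g. "9.8") is newer than every local file.

    Also returns True when no HIVDB_*.xml file is present at all.
    """
    major, minor = (int(part) for part in latest.split("."))
    have = local_versions(data_dir)
    return not have or have[-1] < (major, minor)
```

Versions are compared as integer tuples rather than strings, so that a hypothetical 9.10 correctly sorts after 9.8.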
About Us
This project was developed at the Poon lab within the Department of Pathology and Laboratory Medicine, Schulich School of Medicine and Dentistry, Western University, London, Ontario. Development of sierra-local was supported in part by a grant from the Canadian Institutes of Health Research (PJT-156178).
If you use sierra-local for your work, please cite the following paper:
- sierra-local: A lightweight standalone application for drug resistance prediction. Jasper C Ho, Garway T Ng, Mathias Renaud, Art FY Poon, (2019). Journal of Open Source Software, 4(33), 1186, https://doi.org/10.21105/joss.01186
If you want to reference the validation of sierra-local on HIV-1 pol data, please cite the following preprint:
- sierra-local: A lightweight standalone application for secure HIV-1 drug resistance prediction. Jasper C Ho, Garway T Ng, Mathias Renaud, Art FY Poon. bioRxiv 393207; doi: https://doi.org/10.1101/393207
Owner
- Name: PoonLab
- Login: PoonLab
- Kind: organization
- Repositories: 20
- Profile: https://github.com/PoonLab
JOSS Publication
sierra-local: A lightweight standalone application for drug resistance prediction
Authors
Department of Pathology and Laboratory Medicine, Western University, London, ON, Canada
Department of Pathology and Laboratory Medicine, Western University, London, ON, Canada
Department of Pathology and Laboratory Medicine, Western University, London, ON, Canada
Tags
bioinformatics HIV/AIDS drug resistance sequence analysis clinical virology
GitHub Events
Total
- Create event: 3
- Release event: 1
- Issues event: 9
- Watch event: 4
- Delete event: 1
- Issue comment event: 30
- Push event: 17
- Pull request review event: 1
- Pull request review comment event: 1
- Pull request event: 7
- Fork event: 3
Last Year
- Create event: 3
- Release event: 1
- Issues event: 9
- Watch event: 5
- Delete event: 1
- Issue comment event: 32
- Push event: 17
- Pull request review event: 1
- Pull request review comment event: 1
- Pull request event: 8
- Fork event: 3
Committers
Last synced: 5 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Art Poon | a****n@g****m | 109 |
| WilliamZekaiWang | W****g@g****m | 49 |
| jzpero | j****3@u****a | 31 |
| jzpero | j****o@g****m | 24 |
| Tammy Ng | t****2@u****a | 22 |
| WilliamZekaiWang | 1****g | 21 |
| GopiGugan | g****n@u****a | 13 |
| MathiasRenaud | m****5@g****m | 9 |
| SandeepThokala | i****8@g****m | 5 |
| Jasper Ho | j****o@g****m | 2 |
| William Zekai Wang | w****l@W****l | 1 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 98
- Total pull requests: 23
- Average time to close issues: 6 months
- Average time to close pull requests: 12 days
- Total issue authors: 18
- Total pull request authors: 4
- Average comments per issue: 3.0
- Average comments per pull request: 0.43
- Merged pull requests: 19
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 8
- Pull requests: 8
- Average time to close issues: about 20 hours
- Average time to close pull requests: 5 days
- Issue authors: 6
- Pull request authors: 2
- Average comments per issue: 0.63
- Average comments per pull request: 0.38
- Merged pull requests: 5
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- ArtPoon (47)
- jzpero (13)
- Kanyerezi30 (10)
- SandeepThokala (7)
- schorlton-bugseq (3)
- aguang (2)
- erick-dorlass (2)
- azneto (2)
- WilliamZekaiWang (2)
- MathiasRenaud (2)
- marcbennedbaek (1)
- ghost (1)
- nedjoni (1)
- LilyAnderssonLee (1)
- GopiGugan (1)
Pull Request Authors
- WilliamZekaiWang (17)
- ArtPoon (2)
- schorlton-bugseq (2)
- GopiGugan (2)
Packages
- Total packages: 1
Total downloads:
- pypi 180 last-month
- Total docker downloads: 29
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 6
- Total maintainers: 3
pypi.org: sierralocal
Local execution of HIVdb algorithm
- Homepage: https://github.com/PoonLab/sierra-local
- Documentation: https://sierralocal.readthedocs.io/
- License: GNU General Public License v3 (GPLv3)
Latest release: 0.4.1
published 5 months ago