homolig
Homolig is a series of tools designed to compare the physiochemical properties of TCRs and BCRs. Created by Alexander Girgis, Emily Davis-Marcisak, and Theron Palmer.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (15.4%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 0
- Watchers: 3
- Forks: 1
- Open Issues: 2
- Releases: 2
Metadata Files
README.md
What is Homolig?
Homolig is a Python package for the physiochemical comparison of immune receptor and epitope sequences. It has two modules:
1. Compute pairwise distances between sequences, assign clusters, and write a UMAP.
2. Generate descriptive statistics on the physiochemical properties of selected sequences.
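To make the idea of a pairwise comparison concrete, here is a minimal sketch of how an N x N similarity matrix can be filled in for a small set of CDR3 sequences. It is only an illustration: `toy_similarity` is a hypothetical stand-in (the fraction of identical residues at matching positions) and is not Homolig's metric, and the second sequence is an invented one-residue variant of the first.

```python
from itertools import combinations_with_replacement


def toy_similarity(a: str, b: str) -> float:
    """Toy score: fraction of positions with identical residues.

    Hypothetical stand-in for a real physiochemical metric; positions are
    compared left to right and the count is divided by the longer length.
    """
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def pairwise_matrix(seqs):
    """Return a symmetric N x N matrix of similarity scores."""
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    # Score each unordered pair once, then mirror it across the diagonal.
    for i, j in combinations_with_replacement(range(n), 2):
        score = toy_similarity(seqs[i], seqs[j])
        matrix[i][j] = score
        matrix[j][i] = score
    return matrix


# The first and third sequences come from the example input table;
# the second is an invented variant used only for illustration.
seqs = ["CASSAGTSPTDTQYF", "CASSLGTSPTDTQYF", "CAVMDSSYKLIF"]
sim = pairwise_matrix(seqs)

for seq, row in zip(seqs, sim):
    print(seq, [round(value, 3) for value in row])
```

A matrix of this shape is the kind of object that clustering and UMAP steps consume; Homolig's own scores come from its alignment matrices rather than from this toy function.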
Getting Started
Installation
The steps below are required to use the Homolig clustering module, which is written in Python 3.
1. Clone the repository:
```bash
git clone https://github.com/FertigLab/Homolig
```
2. If you use Docker, you may build a local image from the provided Dockerfile. Alternatively, install Homolig within a virtual environment. A global installation is not recommended because Homolig depends on specific versions of several packages.
```bash
apt install python3-venv # if not already installed
cd Homolig
python3 -m venv ./homolig-venv
source ./homolig-venv/bin/activate
python3 -m pip install .
```
3. Now try using Homolig to cluster the example immune repertoires. If successful, output files from this test script will populate the repository parent directory.
```bash
sudo chmod +x ./tests/testcode-python-clustering-module.sh # if you lack permission to execute the test script
./tests/testcode-python-clustering-module.sh
```
Documentation
Homolig uses the IMGT/V-QUEST reference directory release 202214-2 (05 April 2022).
File format
The input file is a comma-separated file containing the TCR or BCR CDR3 amino acid sequence and the variable gene name in IMGT format. See example inputs in ./test-data/.
For cases where paired alpha and beta chain information is available:
| CDR3.beta.aa | TRBV | CDR3.alpha.aa | TRAV |
| --- | --- | --- | --- |
| CASSAGTSPTDTQYF | TRBV6-4*01 | CAVMDSSYKLIF | TRAV1-2*01 |
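As an illustration of the paired-chain layout above, the following sketch writes one record to CSV using only the Python standard library. The column names are taken from the table; the V gene names are written with the IMGT allele separator (`*`), which is an assumption about the intended notation, and `write_paired_input` is a hypothetical helper, not part of Homolig.

```python
import csv
import io

# Column names as shown in the paired alpha/beta example table.
FIELDNAMES = ["CDR3.beta.aa", "TRBV", "CDR3.alpha.aa", "TRAV"]

rows = [
    {
        "CDR3.beta.aa": "CASSAGTSPTDTQYF",
        "TRBV": "TRBV6-4*01",  # assumed IMGT gene*allele notation
        "CDR3.alpha.aa": "CAVMDSSYKLIF",
        "TRAV": "TRAV1-2*01",  # assumed IMGT gene*allele notation
    },
]


def write_paired_input(records, handle):
    """Write paired-chain records as comma-separated text with a header row."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(records)


# Write to an in-memory buffer so the sketch runs without touching disk.
buffer = io.StringIO()
write_paired_input(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file instead, pass a handle opened with `open(path, "w", newline="")`, and compare the result against the examples in ./test-data/ before running Homolig on it.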
Basic Usage: Pairwise Distances
The recommended way to run Module 1 (pairwise similarity scores) is through the wrapper homolig_wrapper.py, located in ./homolig/.
```bash
python3 $WRAPPERDIR/homolig_wrapper.py -h
usage: homolig_wrapper.py [-h] [-i INPUT] [-s SEQ] [-c CHAINS] [-m METRIC] [-sp SPECIES] [-mode MODE] [-i2 INPUT2] [-o OUTPUT] [-v VERBOSE] [-g SAVE_GERMLINE] [-si SAVE_REFORMATTED_INPUT]

options:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input .csv file
  -s SEQ, --seq SEQ     Sequence type. May be one of [tcr, bcr, seq].
  -c CHAINS, --chains CHAINS
                        Chain locus. May be one of [alpha, beta, heavy, light]. Can be omitted if --seq == 'seq'.
  -m METRIC, --metric METRIC
                        Distance matrix used in comparisons. Default is aadist.
  -sp SPECIES, --species SPECIES
                        Species for which to query V gene sequences. Default is human.
  -mode MODE, --mode MODE
                        Either 'pairwise' sequence comparison or 'axb' between two sequence groups.
  -i2 INPUT2, --input2 INPUT2
                        Second sequence group with which to compare first file.
  -o OUTPUT, --output OUTPUT
                        Desired output file path/filename. Defaults to input file directory.
  -v VERBOSE, --verbose VERBOSE
                        Specify verbosity during execution.
  -g SAVE_GERMLINE, --save_germline SAVE_GERMLINE
                        Save CDR alignments separately during execution.
  -si SAVE_REFORMATTED_INPUT, --save_reformatted_input SAVE_REFORMATTED_INPUT
                        Save input file after renaming V genes (may be useful in post-analysis).
```
Valid values for the metric argument (-m) correspond to the delimited files in the directory ./homolig/data/align-matrices/. This currently means the 13 matrices used in our manuscript, but users may supply any similarity matrix that follows the same file format.
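When the wrapper is called from a script rather than typed by hand, the flags from the help text above can be assembled programmatically. The sketch below uses only flags that appear in that help text; the helper name `build_wrapper_command`, the example paths, and the rule that `axb` mode needs a second input file are assumptions made for illustration.

```python
import shlex

# Choices copied from the wrapper's help text.
SEQ_TYPES = {"tcr", "bcr", "seq"}
CHAINS = {"alpha", "beta", "heavy", "light"}
MODES = {"pairwise", "axb"}


def build_wrapper_command(wrapper_path, input_csv, seq="tcr", chains="beta",
                          metric="aadist", species="human", mode="pairwise",
                          input2=None, output=None):
    """Return an argument list for the wrapper (hypothetical helper)."""
    if seq not in SEQ_TYPES:
        raise ValueError(f"seq must be one of {sorted(SEQ_TYPES)}")
    if seq != "seq" and chains not in CHAINS:
        raise ValueError(f"chains must be one of {sorted(CHAINS)}")
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}")
    if mode == "axb" and input2 is None:
        # Assumption: comparing two groups requires the second file.
        raise ValueError("axb mode needs a second input file (-i2)")

    cmd = ["python3", wrapper_path, "-i", input_csv, "-s", seq,
           "-m", metric, "-sp", species, "-mode", mode]
    if seq != "seq":
        # The help text says the chain locus can be omitted for plain sequences.
        cmd += ["-c", chains]
    if input2 is not None:
        cmd += ["-i2", input2]
    if output is not None:
        cmd += ["-o", output]
    return cmd


# Example paths are placeholders; the script name follows the usage line.
cmd = build_wrapper_command("./homolig/homolig_wrapper.py",
                            "./test-data/example.csv")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from inside the virtual environment where Homolig is installed.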
To cluster the results of a Homolig run, you may use the wrapper clusterHomolig.py. For large datasets, consider clusterHomolig.pca.py, which generates a UMAP from a PCA reduction of the output similarity scores:
```bash
python3 $WRAPPERDIR/clusterHomolig.py -h
usage: clusterHomolig.py [-h] [-i INPUT] [-c NUM_CLUSTERS] [-o OUTPUT]

options:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input .h5ad file
  -c NUM_CLUSTERS, --num_clusters NUM_CLUSTERS
                        Expected number of clusters. May be any integer.
  -o OUTPUT, --output OUTPUT
                        Desired output file path/filename. Defaults to input file directory.
```
To generate a UMAP reduction based on the NxN similarity matrix, you may use the wrapper homoligwriteumap.py:
```bash
python3 $WRAPPERDIR/clusterHomolig.py -h
usage: clusterHomolig.py [-h] [-i INPUT] [-c NUM_CLUSTERS] [-o OUTPUT]

options:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input .h5ad file
  -c NUM_CLUSTERS, --num_clusters NUM_CLUSTERS
                        Expected number of clusters. May be any integer.
  -o OUTPUT, --output OUTPUT
                        Desired output file path/filename. Defaults to input file directory.
```
Basic Usage: Descriptive Module
Functions to describe the physiochemical properties of arbitrary sequence groups are written in R. Please see functions in ./homolig/_rcode/score-sequences.r. For a walkthrough of basic usage see ./tests/testcode-r-characterization-module.rmd.
Citation
Upon publication, please cite our manuscript, "Comparative Assessment of Physiochemical Metrics for the Clustering of Adaptive Immune Receptor Repertoires" (currently in preparation). For now, please cite this repository.
Girgis, A., Davis-Marcisak, E., & Palmer, T. Homolig - tools for the physiochemical analysis of immune repertoires [Computer software]
Contact
Please send feedback to Alex Girgis - agirgis3@jhu.edu
License
Distributed under AGPL 3.0. See LICENSE for more information.
Owner
- Name: FertigLab
- Login: FertigLab
- Kind: organization
- Email: ejfertig@jhmi.edu
- Repositories: 68
- Profile: https://github.com/FertigLab
Software projects in computational biology and bioinformatics in Elana Fertig's lab in Oncology Biostatistics and Bioinformatics at JHMI
Citation (CITATION.cff)
cff-version: 1.2.0
title: Homolig - tools for the physiochemical analysis of immune repertoires
message: 'If you use this software, please cite it using the metadata from this file.'
type: software
authors:
  - given-names: 'Alexander '
    family-names: Girgis
    email: agirgis3@jhmi.edu
    affiliation: Johns Hopkins University
    orcid: 'https://orcid.org/0000-0002-4706-5480'
  - given-names: Emily
    family-names: Davis-Marcisak
    affiliation: Johns Hopkins University
    orcid: 'https://orcid.org/0000-0002-8624-1013'
  - given-names: Theron
    family-names: Palmer
    affiliation: Johns Hopkins University
    orcid: 'https://orcid.org/0000-0002-8806-4062'
license: AGPL-3.0
GitHub Events
Total
- Create event: 9
- Issues event: 5
- Release event: 6
- Delete event: 4
- Issue comment event: 2
- Push event: 13
- Public event: 1
- Pull request event: 7
- Fork event: 1
Last Year
- Create event: 9
- Issues event: 5
- Release event: 6
- Delete event: 4
- Issue comment event: 2
- Push event: 13
- Public event: 1
- Pull request event: 7
- Fork event: 1
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 2
- Total pull requests: 5
- Average time to close issues: 4 days
- Average time to close pull requests: 5 days
- Total issue authors: 1
- Total pull request authors: 3
- Average comments per issue: 1.0
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 2
- Pull requests: 5
- Average time to close issues: 4 days
- Average time to close pull requests: 5 days
- Issue authors: 1
- Pull request authors: 3
- Average comments per issue: 1.0
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- GaryWang7 (2)
Pull Request Authors
- dimalvovs (2)
- GaryWang7 (2)
- agirgis3 (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- breathe ==4.34.0
- furo ==2022.6.21
- sphinx ==5.0.2
- sphinx-copybutton ==0.5.0
- sphinxcontrib-moderncmakedomain ==3.21.4
- sphinxcontrib-svg2pdfconverter ==1.2.0
- build ==0.8.0 test
- numpy ==1.21.5 test
- numpy ==1.19.3 test
- numpy ==1.22.2 test
- pytest ==7.0.0 test
- pytest-timeout * test
- scipy ==1.5.4 test
- scipy ==1.8.0 test
- python 3.11 build