https://github.com/astrazeneca/selfpad

The official implementation of "Improving Antibody Humanness Prediction using Patent Data".

Keywords

antibody antibody-design antibody-sequence attention contrastive-learning humanness immunogenicity-prediction patent-data transformer

Repository

Basic Info
  • Host: GitHub
  • Owner: AstraZeneca
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 425 KB
Statistics
  • Stars: 12
  • Watchers: 3
  • Forks: 5
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed over 2 years ago

README.md

SelfPAD:

Author: Talip Ucar (ucabtuc@gmail.com)

The official implementation of Improving Antibody Humanness Prediction using Patent Data

Table of Contents:

  1. Model
  2. Environment
  3. Configuration
  4. Training and Evaluation
  5. Structure of the repo
  6. Results
  7. Experiment tracking
  8. Citing the paper
  9. Citing this repo

Model

[Figure: SelfPAD diagrams for the pre-training (left) and fine-tuning (right) stages.]

Environment

We used Python 3.7 for our experiments. The environment can be set up in three steps:

    pip install pipenv            # Install pipenv if you don't have it already
    pipenv install --skip-lock    # Install the required packages
    pipenv shell                  # Activate the virtual environment

If the second step fails, you can install the packages listed in the Pipfile individually with pip, i.e. "pip install package_name".

Configuration

There are two configuration files:

  1. pad.yaml # Defines parameters and options for pre-training
  2. humanness.yaml # Defines parameters and options for fine-tuning
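As a rough illustration of how the two files map onto the two stages, the sketch below picks a config file by task name. The helper `config_path_for` and the task keys are assumptions made for this example (the keys mirror the folder names used under results); the repository's own scripts may locate their configs differently.

```python
from pathlib import Path

# Hypothetical mapping from task name to config file. The file names come
# from the README; the task keys are an assumption for this sketch.
CONFIG_FILES = {
    "pretraining": "pad.yaml",
    "humanness": "humanness.yaml",
}


def config_path_for(task: str, config_dir: str = "config") -> Path:
    """Return the path of the YAML config file for the given task."""
    if task not in CONFIG_FILES:
        raise ValueError(
            f"Unknown task {task!r}; expected one of {sorted(CONFIG_FILES)}"
        )
    return Path(config_dir) / CONFIG_FILES[task]


if __name__ == "__main__":
    for task in CONFIG_FILES:
        print(task, "->", config_path_for(task))
```

The returned path can then be handed to whichever YAML loader the project uses.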

Training and Evaluation

You can train and evaluate the model by using:

    python selfpad_pretrain.py        # Pre-train the model
    python selfpad_finetune.py        # Fine-tune it for humanness
    python selfpad_eval.py -ev test   # Compute humanness scores for a custom dataset, in this case test.csv

The CSV file should have "VH", "VL" and/or "Label" columns.
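Before running the evaluation script on a custom dataset, it can help to check the CSV header. The sketch below is not part of the repository: `check_columns` is a hypothetical helper, and it assumes that "VH" and "VL" (heavy- and light-chain variable regions) are both expected while "Label" is optional, which is one reading of the column requirement above.

```python
import csv
import io

SEQUENCE_COLUMNS = ("VH", "VL")  # assumed to be the sequence columns
LABEL_COLUMN = "Label"           # assumed optional in this sketch


def check_columns(csv_text: str) -> dict:
    """Report which of the expected columns appear in the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in SEQUENCE_COLUMNS if col not in header]
    return {
        "missing_sequence_columns": missing,
        "has_label": LABEL_COLUMN in header,
        "n_rows": sum(1 for _ in reader),
    }


if __name__ == "__main__":
    # Short illustrative strings only, not real dataset entries.
    example = "VH,VL,Label\nEVQLVESGG,DIQMTQSPS,1\nQVQLQESGP,EIVLTQSPA,0\n"
    print(check_columns(example))
```

To check a file on disk, read its contents first, e.g. `check_columns(open("data/test.csv").read())`.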

Structure of the repo

- selfpad_pretrain.py
- selfpad_finetune.py
- selfpad_eval.py

- src
    |-selfpad.py
    |-selfpad_humanness.py

- config
    |-pad.yaml
    |-humanness.yaml
    
- utils_common
    |-arguments.py
    |-utils.py
    |-tokenizer.py
    ...
    
- utils_pretrain
    |-load_data.py
    |-model_utils.py
    |-loss_functions.py
    ...
    
- utils_finetune
    |-load_data.py
    |-model_utils.py
    |-loss_functions.py
    ...
    
- data
    |-test.csv
    ...
    
- results
    |-pretraining
    |-humanness
    ...
    

Results

Results at the end of training are saved under the ./results directory. The directory structure is as follows:

- results
    |-task e.g. humanness, or pretraining
            |-evaluation
                |-clusters (for plotting t-SNE and PCA plots of embeddings)
            |-training
                |-model
                |-plots
                |-loss

You can save the results of evaluations under the "evaluation" folder.
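The layout above can be reproduced with a few lines of pathlib. This is a minimal sketch for illustration, not the repository's own code: `make_results_dirs` is a hypothetical helper, and the folder names are taken from the tree listed above.

```python
import tempfile
from pathlib import Path

# Sub-folders listed in the results tree above.
SUBDIRS = (
    "evaluation/clusters",
    "training/model",
    "training/plots",
    "training/loss",
)


def make_results_dirs(task: str, root: str = "results") -> list:
    """Create the results tree for one task and return the created paths."""
    created = []
    for sub in SUBDIRS:
        path = Path(root) / task / sub
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created


if __name__ == "__main__":
    # Use a temporary directory so the demo leaves no files behind.
    with tempfile.TemporaryDirectory() as tmp:
        for path in make_results_dirs("humanness", root=tmp):
            print(path.relative_to(tmp))
```

Calling it once per task (for example "pretraining" and "humanness") yields one sub-tree per task under the same root.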

Experiment tracking

You can turn on Weights & Biases (W&B) logging in the config file.

Citing the paper

@article{ucar2024SelfPAD,
  title={Improving Antibody Humanness Prediction using Patent Data},
  author={Ucar, Talip and Ramon, Aubin and Oglic, Dino and Croasdale-Wood, Rebecca and Diethe, Tom and Sormanni, Pietro},
  journal={arXiv preprint arXiv:2110.04361},
  year={2024}
}

Citing this repo

If you use the SelfPAD framework in your own work, please cite it using the following:

@Misc{talip_ucar_2024_SelfPAD,
  author = {Talip Ucar},
  title = {{Improving Antibody Humanness Prediction using Patent Data}},
  howpublished = {\url{https://github.com/AstraZeneca/SelfPAD}},
  month = January,
  year = {since 2024}
}

Owner

  • Name: AstraZeneca
  • Login: AstraZeneca
  • Kind: organization
  • Location: Global

Data and AI: Unlocking new science insights

GitHub Events

Total
  • Watch event: 1
  • Fork event: 1
Last Year
  • Watch event: 1
  • Fork event: 1

Committers

All Time
  • Total Commits: 20
  • Total Committers: 1
  • Avg Commits per committer: 20.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 20
  • Committers: 1
  • Avg Commits per committer: 20.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Talip Uçar t****r@g****m 20

Dependencies

Pipfile pypi
  • absl-py ==1.4.0
  • aiohttp ==3.9.1
  • aiosignal ==1.3.1
  • anndata ==0.8.0
  • arboreto ==0.1.6
  • async-timeout ==4.0.3
  • attrs ==22.1.0
  • autoflake ==2.1.1
  • autopage ==0.5.1
  • bio ==1.5.6
  • biopython ==1.81
  • biothings-client ==0.2.6
  • biotite ==0.37.0
  • black ==22.3.0
  • bokeh ==2.4.3
  • cachetools ==5.3.0
  • causal-learn ==0.1.3.3
  • causalbench ==1.0.0
  • causaldag ==0.1a163
  • cdt ==0.6.0
  • certifi ==2022.6.15
  • cfgv ==3.3.1
  • charset-normalizer ==2.1.0
  • class-resolver ==0.3.10
  • click ==8.0.4
  • click-default-group ==1.2.2
  • cliff ==3.10.1
  • cloudpickle ==2.1.0
  • cmaes ==0.8.2
  • cmake ==3.24.0
  • cmd2 ==2.4.2
  • colorlog ==6.6.0
  • conditional-independence ==0.1a6
  • cycler ==0.11.0
  • cython ==0.29.32
  • dask ==2023.3.1
  • databricks-cli ==0.17.1
  • dataclasses ==0.6
  • dataclasses-json ==0.5.7
  • datatable ==1.0.0
  • decorator ==5.1.1
  • deprecated ==1.2.13
  • distlib ==0.3.6
  • distributed ==2023.3.1
  • docdata ==0.0.3
  • docker ==5.0.3
  • easydict ==1.10
  • einops ==0.4.1
  • et-xmlfile ==1.1.0
  • filelock ==3.8.0
  • flask ==2.2.2
  • fonttools ==4.34.4
  • frozendict ==2.3.6
  • frozenlist ==1.3.3
  • fsspec ==2023.3.0
  • future ==0.18.3
  • gdown ==4.6.6
  • gies ==0.0.1
  • google-auth ==2.18.1
  • google-auth-oauthlib ==1.0.0
  • gprofiler-official ==1.0.0
  • gputil ==1.4.0
  • graphical-model-learning ==0.1a8
  • graphical-models ==0.1a19
  • graphtools ==1.5.3
  • graphviz ==0.20.1
  • grpcio ==1.50.0
  • gunicorn ==20.1.0
  • h5py ==3.8.0
  • heapdict ==1.0.1
  • identify ==2.5.21
  • idna ==3.3
  • igraph ==0.10.4
  • ilock ==1.0.3
  • ipdb ==0.13.13
  • isort ==5.10.1
  • itsdangerous ==2.1.2
  • joblib ==1.1.0
  • jsonschema ==4.17.0
  • kiwisolver ==1.4.4
  • lazypredict ==0.2.12
  • lightgbm ==3.3.3
  • lightning-utilities ==0.10.0
  • littleballoffur ==2.1.12
  • littleutils ==0.2.2
  • llvmlite ==0.39.1
  • lmdb ==1.4.1
  • locket ==1.0.0
  • lz4 ==4.3.2
  • magic-impute ==3.0.0
  • markdown ==3.4.3
  • marshmallow ==3.17.0
  • marshmallow-enum ==1.5.1
  • matplotlib ==3.5.2
  • mlflow ==1.28.0
  • modal ==0.4.1
  • more-click ==0.1.1
  • more-itertools ==8.14.0
  • msgpack ==1.0.4
  • multidict ==6.0.4
  • mygene ==3.2.2
  • mypy-extensions ==0.4.3
  • natsort ==8.3.1
  • networkit ==7.1
  • networkx ==2.8.5
  • nodeenv ==1.7.0
  • numba ==0.56.4
  • numexpr ==2.8.4
  • numpy ==1.23.1
  • ogb ==1.3.6
  • openpyxl ==3.1.2
  • opt-einsum ==3.3.0
  • optuna ==2.10.1
  • outdated ==0.2.2
  • pandas ==1.3.5
  • partd ==1.3.0
  • patsy ==0.5.3
  • pbr ==5.9.0
  • pexpect ==4.8.0
  • pgmpy ==0.1.21
  • pillow ==9.2.0
  • platformdirs ==2.5.4
  • plotly ==5.12.0
  • pooch ==1.7.0
  • portalocker ==2.7.0
  • pre-commit ==2.19.0
  • pre-commit-hooks ==4.2.0
  • prettytable ==3.3.0
  • progressbar2 ==4.2.0
  • prometheus-flask-exporter ==0.20.3
  • protobuf ==3.20.1
  • psutil ==5.9.4
  • ptyprocess ==0.7.0
  • pyarrow ==11.0.0
  • pyasn1 ==0.5.0
  • pyasn1-modules ==0.3.0
  • pydot ==1.4.2
  • pyflakes ==3.0.1
  • pygam ==0.8.0
  • pygments ==2.14.0
  • pygsp ==0.5.1
  • pykeen ==1.9.0
  • pynndescent ==0.5.8
  • pyparsing ==3.0.9
  • pyperclip ==1.8.2
  • pyrsistent ==0.19.2
  • pysocks ==1.7.1
  • pystow ==0.4.6
  • python-igraph ==0.10.4
  • python-louvain ==0.16
  • python-utils ==3.5.2
  • pytorch-lightning ==2.0.0
  • pytz ==2022.1
  • querystring-parser ==1.2.4
  • ray ==2.4.0
  • rdkit-pypi ==2022.3.4
  • requests ==2.28.1
  • requests-oauthlib ==1.3.1
  • rexmex ==0.1.0
  • rsa ==4.9
  • scanpy ==1.9.3
  • scikit-learn ==1.1.2
  • scikit-multilearn ==0.2.0
  • scipy ==1.9.0
  • scprep ==1.2.2
  • seaborn ==0.11.2
  • session-info ==1.0.0
  • six ==1.16.0
  • sklearn ==0.0
  • skorch ==0.12.0
  • skrebate ==0.62
  • slingpy ==0.2.12
  • sortedcontainers ==2.4.0
  • sqlparse ==0.4.2
  • stdlib-list ==0.8.0
  • stevedore ==4.0.0
  • tabulate ==0.8.10
  • tasklogger ==1.2.0
  • tblib ==1.7.0
  • tensorboard ==2.13.0
  • tensorboard-data-server ==0.7.0
  • texttable ==1.6.4
  • threadpoolctl ==3.1.0
  • tokenizers ==0.13.2
  • toml ==0.10.2
  • toolz ==0.12.0
  • torch ==1.11.0+cu113
  • torch-cluster ==1.6.0
  • torch-geometric ==2.0.4
  • torch-max-mem ==0.0.2
  • torch-ppr ==0.0.8
  • torch-scatter ==2.0.9
  • torch-sparse ==0.6.14
  • torchaudio ==0.11.0+cu113
  • torcheval ==0.0.7
  • torchmetrics ==1.3.0
  • torchvision ==0.12.0+cu113
  • tqdm ==4.66.1
  • typing ==3.7.4.3
  • typing-extensions ==4.3.0
  • typing-inspect ==0.7.1
  • umap-learn ==0.5.3
  • urllib3 ==1.26.11
  • virtualenv ==20.16.7
  • wcwidth ==0.2.5
  • werkzeug ==2.2.2
  • wrapt ==1.15.0
  • xgboost ==1.7.2
  • xxhash ==3.2.0
  • yarl ==1.9.4
  • zict ==2.2.0
  • zipp ==3.8.1
requirements.txt pypi
  • Cython ==0.29.32
  • Deprecated ==1.2.13
  • Flask ==2.2.2
  • GPUtil ==1.4.0
  • HeapDict ==1.0.1
  • Markdown ==3.4.3
  • Pillow ==9.2.0
  • PyGSP ==0.5.1
  • PySocks ==1.7.1
  • Pygments ==2.14.0
  • Werkzeug ==2.2.2
  • absl-py ==1.4.0
  • aiohttp ==3.9.1
  • aiosignal ==1.3.1
  • anndata ==0.8.0
  • arboreto ==0.1.6
  • async-timeout ==4.0.3
  • attrs ==22.1.0
  • autoflake ==2.1.1
  • autopage ==0.5.1
  • bio ==1.5.6
  • biopython ==1.81
  • biothings-client ==0.2.6
  • biotite ==0.37.0
  • black ==22.3.0
  • bokeh ==2.4.3
  • cachetools ==5.3.0
  • causal-learn ==0.1.3.3
  • causalbench ==1.0.0
  • causaldag ==0.1a163
  • cdt ==0.6.0
  • certifi ==2022.6.15
  • cfgv ==3.3.1
  • charset-normalizer ==2.1.0
  • class-resolver ==0.3.10
  • click ==8.0.4
  • click-default-group ==1.2.2
  • cliff ==3.10.1
  • cloudpickle ==2.1.0
  • cmaes ==0.8.2
  • cmake ==3.24.0
  • cmd2 ==2.4.2
  • colorlog ==6.6.0
  • conditional-independence ==0.1a6
  • cycler ==0.11.0
  • dask ==2023.3.1
  • databricks-cli ==0.17.1
  • dataclasses ==0.6
  • dataclasses-json ==0.5.7
  • datatable ==1.0.0
  • decorator ==5.1.1
  • distlib ==0.3.6
  • distributed ==2023.3.1
  • docdata ==0.0.3
  • docker ==5.0.3
  • easydict ==1.10
  • einops ==0.4.1
  • et-xmlfile ==1.1.0
  • filelock ==3.8.0
  • fonttools ==4.34.4
  • frozendict ==2.3.6
  • frozenlist ==1.3.3
  • fsspec ==2023.3.0
  • future ==0.18.3
  • gdown ==4.6.6
  • gies ==0.0.1
  • google-auth ==2.18.1
  • google-auth-oauthlib ==1.0.0
  • gprofiler-official ==1.0.0
  • graphical-model-learning ==0.1a8
  • graphical-models ==0.1a19
  • graphtools ==1.5.3
  • graphviz ==0.20.1
  • grpcio ==1.50.0
  • gunicorn ==20.1.0
  • h5py ==3.8.0
  • identify ==2.5.21
  • idna ==3.3
  • igraph ==0.10.4
  • ilock ==1.0.3
  • ipdb ==0.13.13
  • isort ==5.10.1
  • itsdangerous ==2.1.2
  • joblib ==1.1.0
  • jsonschema ==4.17.0
  • kiwisolver ==1.4.4
  • lazypredict ==0.2.12
  • lightgbm ==3.3.3
  • lightning-utilities ==0.10.0
  • littleballoffur ==2.1.12
  • littleutils ==0.2.2
  • llvmlite ==0.39.1
  • lmdb ==1.4.1
  • locket ==1.0.0
  • lz4 ==4.3.2
  • magic-impute ==3.0.0
  • marshmallow ==3.17.0
  • marshmallow-enum ==1.5.1
  • matplotlib ==3.5.2
  • mlflow ==1.28.0
  • modAL ==0.4.1
  • more-click ==0.1.1
  • more-itertools ==8.14.0
  • msgpack ==1.0.4
  • multidict ==6.0.4
  • mygene ==3.2.2
  • mypy-extensions ==0.4.3
  • natsort ==8.3.1
  • networkit ==7.1
  • networkx ==2.8.5
  • nodeenv ==1.7.0
  • numba ==0.56.4
  • numexpr ==2.8.4
  • numpy ==1.23.1
  • ogb ==1.3.6
  • openpyxl ==3.1.2
  • opt-einsum ==3.3.0
  • optuna ==2.10.1
  • outdated ==0.2.2
  • packaging ==21.3
  • pandas ==1.3.5
  • partd ==1.3.0
  • patsy ==0.5.3
  • pbr ==5.9.0
  • pexpect ==4.8.0
  • pgmpy ==0.1.21
  • platformdirs ==2.5.4
  • plotly ==5.12.0
  • pooch ==1.7.0
  • portalocker ==2.7.0
  • pre-commit ==2.19.0
  • pre-commit-hooks ==4.2.0
  • prettytable ==3.3.0
  • progressbar2 ==4.2.0
  • prometheus-flask-exporter ==0.20.3
  • protobuf ==3.20.1
  • psutil ==5.9.4
  • ptyprocess ==0.7.0
  • pyarrow ==11.0.0
  • pyasn1 ==0.5.0
  • pyasn1-modules ==0.3.0
  • pydot ==1.4.2
  • pyflakes ==3.0.1
  • pygam ==0.8.0
  • pykeen ==1.9.0
  • pynndescent ==0.5.8
  • pyparsing ==3.0.9
  • pyperclip ==1.8.2
  • pyrsistent ==0.19.2
  • pystow ==0.4.6
  • python-igraph ==0.10.4
  • python-louvain ==0.16
  • python-utils ==3.5.2
  • pytorch-lightning ==2.0.0
  • pytz ==2022.1
  • querystring-parser ==1.2.4
  • ray ==2.4.0
  • rdkit-pypi ==2022.3.4
  • requests ==2.28.1
  • requests-oauthlib ==1.3.1
  • rexmex ==0.1.0
  • rsa ==4.9
  • scanpy ==1.9.3
  • scikit-learn ==1.1.2
  • scikit-multilearn ==0.2.0
  • scipy ==1.9.0
  • scprep ==1.2.2
  • seaborn ==0.11.2
  • session-info ==1.0.0
  • six ==1.16.0
  • sklearn ==0.0
  • skorch ==0.12.0
  • skrebate ==0.62
  • slingpy ==0.2.12
  • sortedcontainers ==2.4.0
  • sqlparse ==0.4.2
  • stdlib-list ==0.8.0
  • stevedore ==4.0.0
  • tabulate ==0.8.10
  • tasklogger ==1.2.0
  • tblib ==1.7.0
  • tensorboard ==2.13.0
  • tensorboard-data-server ==0.7.0
  • texttable ==1.6.4
  • threadpoolctl ==3.1.0
  • tokenizers ==0.13.2
  • toml ==0.10.2
  • toolz ==0.12.0
  • torch ==1.11.0
  • torch-cluster ==1.6.0
  • torch-geometric ==2.0.4
  • torch-max-mem ==0.0.2
  • torch-ppr ==0.0.8
  • torch-scatter ==2.0.9
  • torch-sparse ==0.6.14
  • torchaudio ==0.11.0
  • torcheval ==0.0.7
  • torchmetrics ==1.3.0
  • torchvision ==0.12.0
  • tqdm ==4.66.1
  • typing ==3.7.4.3
  • typing-inspect ==0.7.1
  • typing_extensions ==4.3.0
  • umap-learn ==0.5.3
  • urllib3 ==1.26.11
  • virtualenv ==20.16.7
  • wcwidth ==0.2.5
  • wrapt ==1.15.0
  • xgboost ==1.7.2
  • xxhash ==3.2.0
  • yarl ==1.9.4
  • zict ==2.2.0
  • zipp ==3.8.1