https://github.com/astrazeneca/selfpad
The official implementation of "Improving Antibody Humanness Prediction using Patent Data".
Statistics
- Stars: 12
- Watchers: 3
- Forks: 5
- Open Issues: 0
- Releases: 0
SelfPAD:
Author: Talip Ucar (ucabtuc@gmail.com)
The official implementation of Improving Antibody Humanness Prediction using Patent Data
Table of Contents:
- Model
- Environment
- Configuration
- Training and Evaluation
- Structure of the repo
- Results
- Experiment tracking
- Citing the paper
- Citing this repo
Model
[Figures: model diagrams for pre-training (left) and fine-tuning (right)]
Environment
We used Python 3.7 for our experiments. The environment can be set up by following three steps:
pip install pipenv # To install pipenv if you don't have it already
pipenv install --skip-lock # To install required packages.
pipenv shell # To activate virtual env
If the second step fails, you can install the packages listed in the Pipfile individually with pip, i.e. "pip install package_name".
Configuration
There are two types of configuration files:
1. pad.yaml # Defines parameters and options for pre-training
2. humanness.yaml # Defines parameters and options for fine-tuning
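As a minimal sketch of how the two files relate to the two stages, the helper below maps a stage name to its configuration file and parses it. The stage labels "pretrain" and "finetune", the function names, and the use of PyYAML are assumptions made for illustration; the repository's own loading code may differ.

```python
from pathlib import Path

# Stage -> config file, following the two files listed above.
# The stage labels "pretrain" and "finetune" are illustrative only.
CONFIG_FILES = {
    "pretrain": "pad.yaml",
    "finetune": "humanness.yaml",
}


def config_path_for(stage, config_dir="config"):
    """Return the path of the YAML file that configures the given stage."""
    if stage not in CONFIG_FILES:
        raise ValueError(
            f"unknown stage {stage!r}; expected one of {sorted(CONFIG_FILES)}"
        )
    return Path(config_dir) / CONFIG_FILES[stage]


def load_config(stage, config_dir="config"):
    """Parse the stage's YAML file into a dict (assumes PyYAML is installed)."""
    import yaml  # imported lazily so config_path_for works without PyYAML

    with open(config_path_for(stage, config_dir)) as handle:
        return yaml.safe_load(handle)


# Show which file each stage would read.
for stage in CONFIG_FILES:
    print(stage, "->", config_path_for(stage))
```

Keeping the mapping in one dictionary makes it easy to see that pre-training and fine-tuning are configured independently of each other.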
Training and Evaluation
You can train and evaluate the model by using:
python selfpad_pretrain.py # For pre-training
python selfpad_finetune.py # For fine-tuning it for humanness
python selfpad_eval.py -ev test # To compute humanness scores for a custom dataset, in this case test.csv. The CSV file should have "VH", "VL" and/or "Label" columns.
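Before running the evaluation on a custom dataset, it can help to check the CSV header. The sketch below is not part of the repository: the function names are hypothetical, and it assumes that the name passed to -ev maps to data/<name>.csv, that "VH" and "VL" are both required, and that "Label" is optional.

```python
import csv
from pathlib import Path

SEQUENCE_COLUMNS = ("VH", "VL")  # assumed to be required
LABEL_COLUMN = "Label"           # assumed to be optional


def eval_csv_path(name, data_dir="data"):
    """Map a dataset name such as "test" to data/test.csv (assumed layout)."""
    return Path(data_dir) / f"{name}.csv"


def check_eval_csv(path):
    """Validate the header and return (has_label, n_rows).

    Raises ValueError if a sequence column is missing.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [col for col in SEQUENCE_COLUMNS if col not in header]
        if missing:
            raise ValueError(f"{path}: missing column(s) {missing}")
        n_rows = sum(1 for _ in reader)  # count data rows after the header
    return LABEL_COLUMN in header, n_rows


# The dataset name used in the command above.
print(eval_csv_path("test"))
```

Calling check_eval_csv(eval_csv_path("test")) from the repository root reports whether labels are present, which indicates whether label-based metrics can be computed in addition to the scores.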
Structure of the repo
- selfpad_pretrain.py
- selfpad_finetune.py
- selfpad_eval.py
- src
|-selfpad.py
|-selfpad_humanness.py
- config
|-pad.yaml
|-humanness.yaml
- utils_common
|-arguments.py
|-utils.py
|-tokenizer.py
...
- utils_pretrain
|-load_data.py
|-model_utils.py
|-loss_functions.py
...
- utils_finetune
|-load_data.py
|-model_utils.py
|-loss_functions.py
...
- data
|-test.csv
...
- results
|-pretraining
|-humanness
...
Results
Results at the end of training are saved under the ./results directory. The results directory is structured as follows:
- results
    |-task e.g. humanness, or pretraining
        |-evaluation
            |-clusters (for plotting t-SNE and PCA plots of embeddings)
        |-training
            |-model
            |-plots
            |-loss
You can save evaluation results under the "evaluation" folder.
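The layout can be sketched with pathlib as follows. This is an illustration only: the helper name is hypothetical, and it assumes that "clusters" sits under "evaluation" and that "model", "plots" and "loss" sit under "training".

```python
import tempfile
from pathlib import Path

# Sub-folders created per task, under the nesting assumed above.
RESULT_SUBDIRS = (
    "evaluation/clusters",
    "training/model",
    "training/plots",
    "training/loss",
)


def make_results_dirs(task, root="results"):
    """Create the results tree for one task and return the leaf directories."""
    base = Path(root) / task
    created = []
    for sub in RESULT_SUBDIRS:
        path = base / sub
        path.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
        created.append(path)
    return created


# Demonstrate in a temporary directory so nothing is written to ./results.
with tempfile.TemporaryDirectory() as tmp:
    for path in make_results_dirs("humanness", root=tmp):
        print(path.relative_to(tmp))
```

Passing "pretraining" instead of "humanness" would produce the same sub-folders under a separate task directory.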
Experiment tracking
You can turn on Weights & Biases (W&B) in the config file for logging.
Citing the paper
@article{ucar2024SelfPAD,
title={Improving Antibody Humanness Prediction using Patent Data},
author={Ucar, Talip and
Ramon, Aubin and
Oglic, Dino and
Croasdale-Wood, Rebecca and
Diethe, Tom and
Sormanni, Pietro},
journal={arXiv preprint arXiv:2110.04361},
year={2024}
}
Citing this repo
If you use the SelfPAD framework in your own studies or work, please cite it using the following:
@Misc{talip_ucar_2024_SelfPAD,
author = {Talip Ucar},
title = {{Improving Antibody Humanness Prediction using Patent Data}},
howpublished = {\url{https://github.com/AstraZeneca/SelfPAD}},
month = January,
year = {since 2024}
}
Owner
- Name: AstraZeneca
- Login: AstraZeneca
- Kind: organization
- Location: Global
- Website: https://www.astrazeneca.com/
- Repositories: 33
- Profile: https://github.com/AstraZeneca
Data and AI: Unlocking new science insights
GitHub Events
Total
- Watch event: 1
- Fork event: 1
Last Year
- Watch event: 1
- Fork event: 1
Committers
Last synced: almost 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Talip Uçar | t****r@g****m | 20 |
Dependencies
- absl-py ==1.4.0
- aiohttp ==3.9.1
- aiosignal ==1.3.1
- anndata ==0.8.0
- arboreto ==0.1.6
- async-timeout ==4.0.3
- attrs ==22.1.0
- autoflake ==2.1.1
- autopage ==0.5.1
- bio ==1.5.6
- biopython ==1.81
- biothings-client ==0.2.6
- biotite ==0.37.0
- black ==22.3.0
- bokeh ==2.4.3
- cachetools ==5.3.0
- causal-learn ==0.1.3.3
- causalbench ==1.0.0
- causaldag ==0.1a163
- cdt ==0.6.0
- certifi ==2022.6.15
- cfgv ==3.3.1
- charset-normalizer ==2.1.0
- class-resolver ==0.3.10
- click ==8.0.4
- click-default-group ==1.2.2
- cliff ==3.10.1
- cloudpickle ==2.1.0
- cmaes ==0.8.2
- cmake ==3.24.0
- cmd2 ==2.4.2
- colorlog ==6.6.0
- conditional-independence ==0.1a6
- cycler ==0.11.0
- cython ==0.29.32
- dask ==2023.3.1
- databricks-cli ==0.17.1
- dataclasses ==0.6
- dataclasses-json ==0.5.7
- datatable ==1.0.0
- decorator ==5.1.1
- deprecated ==1.2.13
- distlib ==0.3.6
- distributed ==2023.3.1
- docdata ==0.0.3
- docker ==5.0.3
- easydict ==1.10
- einops ==0.4.1
- et-xmlfile ==1.1.0
- filelock ==3.8.0
- flask ==2.2.2
- fonttools ==4.34.4
- frozendict ==2.3.6
- frozenlist ==1.3.3
- fsspec ==2023.3.0
- future ==0.18.3
- gdown ==4.6.6
- gies ==0.0.1
- google-auth ==2.18.1
- google-auth-oauthlib ==1.0.0
- gprofiler-official ==1.0.0
- gputil ==1.4.0
- graphical-model-learning ==0.1a8
- graphical-models ==0.1a19
- graphtools ==1.5.3
- graphviz ==0.20.1
- grpcio ==1.50.0
- gunicorn ==20.1.0
- h5py ==3.8.0
- heapdict ==1.0.1
- identify ==2.5.21
- idna ==3.3
- igraph ==0.10.4
- ilock ==1.0.3
- ipdb ==0.13.13
- isort ==5.10.1
- itsdangerous ==2.1.2
- joblib ==1.1.0
- jsonschema ==4.17.0
- kiwisolver ==1.4.4
- lazypredict ==0.2.12
- lightgbm ==3.3.3
- lightning-utilities ==0.10.0
- littleballoffur ==2.1.12
- littleutils ==0.2.2
- llvmlite ==0.39.1
- lmdb ==1.4.1
- locket ==1.0.0
- lz4 ==4.3.2
- magic-impute ==3.0.0
- markdown ==3.4.3
- marshmallow ==3.17.0
- marshmallow-enum ==1.5.1
- matplotlib ==3.5.2
- mlflow ==1.28.0
- modal ==0.4.1
- more-click ==0.1.1
- more-itertools ==8.14.0
- msgpack ==1.0.4
- multidict ==6.0.4
- mygene ==3.2.2
- mypy-extensions ==0.4.3
- natsort ==8.3.1
- networkit ==7.1
- networkx ==2.8.5
- nodeenv ==1.7.0
- numba ==0.56.4
- numexpr ==2.8.4
- numpy ==1.23.1
- ogb ==1.3.6
- openpyxl ==3.1.2
- opt-einsum ==3.3.0
- optuna ==2.10.1
- outdated ==0.2.2
- pandas ==1.3.5
- partd ==1.3.0
- patsy ==0.5.3
- pbr ==5.9.0
- pexpect ==4.8.0
- pgmpy ==0.1.21
- pillow ==9.2.0
- platformdirs ==2.5.4
- plotly ==5.12.0
- pooch ==1.7.0
- portalocker ==2.7.0
- pre-commit ==2.19.0
- pre-commit-hooks ==4.2.0
- prettytable ==3.3.0
- progressbar2 ==4.2.0
- prometheus-flask-exporter ==0.20.3
- protobuf ==3.20.1
- psutil ==5.9.4
- ptyprocess ==0.7.0
- pyarrow ==11.0.0
- pyasn1 ==0.5.0
- pyasn1-modules ==0.3.0
- pydot ==1.4.2
- pyflakes ==3.0.1
- pygam ==0.8.0
- pygments ==2.14.0
- pygsp ==0.5.1
- pykeen ==1.9.0
- pynndescent ==0.5.8
- pyparsing ==3.0.9
- pyperclip ==1.8.2
- pyrsistent ==0.19.2
- pysocks ==1.7.1
- pystow ==0.4.6
- python-igraph ==0.10.4
- python-louvain ==0.16
- python-utils ==3.5.2
- pytorch-lightning ==2.0.0
- pytz ==2022.1
- querystring-parser ==1.2.4
- ray ==2.4.0
- rdkit-pypi ==2022.3.4
- requests ==2.28.1
- requests-oauthlib ==1.3.1
- rexmex ==0.1.0
- rsa ==4.9
- scanpy ==1.9.3
- scikit-learn ==1.1.2
- scikit-multilearn ==0.2.0
- scipy ==1.9.0
- scprep ==1.2.2
- seaborn ==0.11.2
- session-info ==1.0.0
- six ==1.16.0
- sklearn ==0.0
- skorch ==0.12.0
- skrebate ==0.62
- slingpy ==0.2.12
- sortedcontainers ==2.4.0
- sqlparse ==0.4.2
- stdlib-list ==0.8.0
- stevedore ==4.0.0
- tabulate ==0.8.10
- tasklogger ==1.2.0
- tblib ==1.7.0
- tensorboard ==2.13.0
- tensorboard-data-server ==0.7.0
- texttable ==1.6.4
- threadpoolctl ==3.1.0
- tokenizers ==0.13.2
- toml ==0.10.2
- toolz ==0.12.0
- torch ==1.11.0+cu113
- torch-cluster ==1.6.0
- torch-geometric ==2.0.4
- torch-max-mem ==0.0.2
- torch-ppr ==0.0.8
- torch-scatter ==2.0.9
- torch-sparse ==0.6.14
- torchaudio ==0.11.0+cu113
- torcheval ==0.0.7
- torchmetrics ==1.3.0
- torchvision ==0.12.0+cu113
- tqdm ==4.66.1
- typing ==3.7.4.3
- typing-extensions ==4.3.0
- typing-inspect ==0.7.1
- umap-learn ==0.5.3
- urllib3 ==1.26.11
- virtualenv ==20.16.7
- wcwidth ==0.2.5
- werkzeug ==2.2.2
- wrapt ==1.15.0
- xgboost ==1.7.2
- xxhash ==3.2.0
- yarl ==1.9.4
- zict ==2.2.0
- zipp ==3.8.1
- Cython ==0.29.32
- Deprecated ==1.2.13
- Flask ==2.2.2
- GPUtil ==1.4.0
- HeapDict ==1.0.1
- Markdown ==3.4.3
- Pillow ==9.2.0
- PyGSP ==0.5.1
- PySocks ==1.7.1
- Pygments ==2.14.0
- Werkzeug ==2.2.2
- absl-py ==1.4.0
- aiohttp ==3.9.1
- aiosignal ==1.3.1
- anndata ==0.8.0
- arboreto ==0.1.6
- async-timeout ==4.0.3
- attrs ==22.1.0
- autoflake ==2.1.1
- autopage ==0.5.1
- bio ==1.5.6
- biopython ==1.81
- biothings-client ==0.2.6
- biotite ==0.37.0
- black ==22.3.0
- bokeh ==2.4.3
- cachetools ==5.3.0
- causal-learn ==0.1.3.3
- causalbench ==1.0.0
- causaldag ==0.1a163
- cdt ==0.6.0
- certifi ==2022.6.15
- cfgv ==3.3.1
- charset-normalizer ==2.1.0
- class-resolver ==0.3.10
- click ==8.0.4
- click-default-group ==1.2.2
- cliff ==3.10.1
- cloudpickle ==2.1.0
- cmaes ==0.8.2
- cmake ==3.24.0
- cmd2 ==2.4.2
- colorlog ==6.6.0
- conditional-independence ==0.1a6
- cycler ==0.11.0
- dask ==2023.3.1
- databricks-cli ==0.17.1
- dataclasses ==0.6
- dataclasses-json ==0.5.7
- datatable ==1.0.0
- decorator ==5.1.1
- distlib ==0.3.6
- distributed ==2023.3.1
- docdata ==0.0.3
- docker ==5.0.3
- easydict ==1.10
- einops ==0.4.1
- et-xmlfile ==1.1.0
- filelock ==3.8.0
- fonttools ==4.34.4
- frozendict ==2.3.6
- frozenlist ==1.3.3
- fsspec ==2023.3.0
- future ==0.18.3
- gdown ==4.6.6
- gies ==0.0.1
- google-auth ==2.18.1
- google-auth-oauthlib ==1.0.0
- gprofiler-official ==1.0.0
- graphical-model-learning ==0.1a8
- graphical-models ==0.1a19
- graphtools ==1.5.3
- graphviz ==0.20.1
- grpcio ==1.50.0
- gunicorn ==20.1.0
- h5py ==3.8.0
- identify ==2.5.21
- idna ==3.3
- igraph ==0.10.4
- ilock ==1.0.3
- ipdb ==0.13.13
- isort ==5.10.1
- itsdangerous ==2.1.2
- joblib ==1.1.0
- jsonschema ==4.17.0
- kiwisolver ==1.4.4
- lazypredict ==0.2.12
- lightgbm ==3.3.3
- lightning-utilities ==0.10.0
- littleballoffur ==2.1.12
- littleutils ==0.2.2
- llvmlite ==0.39.1
- lmdb ==1.4.1
- locket ==1.0.0
- lz4 ==4.3.2
- magic-impute ==3.0.0
- marshmallow ==3.17.0
- marshmallow-enum ==1.5.1
- matplotlib ==3.5.2
- mlflow ==1.28.0
- modAL ==0.4.1
- more-click ==0.1.1
- more-itertools ==8.14.0
- msgpack ==1.0.4
- multidict ==6.0.4
- mygene ==3.2.2
- mypy-extensions ==0.4.3
- natsort ==8.3.1
- networkit ==7.1
- networkx ==2.8.5
- nodeenv ==1.7.0
- numba ==0.56.4
- numexpr ==2.8.4
- numpy ==1.23.1
- ogb ==1.3.6
- openpyxl ==3.1.2
- opt-einsum ==3.3.0
- optuna ==2.10.1
- outdated ==0.2.2
- packaging ==21.3
- pandas ==1.3.5
- partd ==1.3.0
- patsy ==0.5.3
- pbr ==5.9.0
- pexpect ==4.8.0
- pgmpy ==0.1.21
- platformdirs ==2.5.4
- plotly ==5.12.0
- pooch ==1.7.0
- portalocker ==2.7.0
- pre-commit ==2.19.0
- pre-commit-hooks ==4.2.0
- prettytable ==3.3.0
- progressbar2 ==4.2.0
- prometheus-flask-exporter ==0.20.3
- protobuf ==3.20.1
- psutil ==5.9.4
- ptyprocess ==0.7.0
- pyarrow ==11.0.0
- pyasn1 ==0.5.0
- pyasn1-modules ==0.3.0
- pydot ==1.4.2
- pyflakes ==3.0.1
- pygam ==0.8.0
- pykeen ==1.9.0
- pynndescent ==0.5.8
- pyparsing ==3.0.9
- pyperclip ==1.8.2
- pyrsistent ==0.19.2
- pystow ==0.4.6
- python-igraph ==0.10.4
- python-louvain ==0.16
- python-utils ==3.5.2
- pytorch-lightning ==2.0.0
- pytz ==2022.1
- querystring-parser ==1.2.4
- ray ==2.4.0
- rdkit-pypi ==2022.3.4
- requests ==2.28.1
- requests-oauthlib ==1.3.1
- rexmex ==0.1.0
- rsa ==4.9
- scanpy ==1.9.3
- scikit-learn ==1.1.2
- scikit-multilearn ==0.2.0
- scipy ==1.9.0
- scprep ==1.2.2
- seaborn ==0.11.2
- session-info ==1.0.0
- six ==1.16.0
- sklearn ==0.0
- skorch ==0.12.0
- skrebate ==0.62
- slingpy ==0.2.12
- sortedcontainers ==2.4.0
- sqlparse ==0.4.2
- stdlib-list ==0.8.0
- stevedore ==4.0.0
- tabulate ==0.8.10
- tasklogger ==1.2.0
- tblib ==1.7.0
- tensorboard ==2.13.0
- tensorboard-data-server ==0.7.0
- texttable ==1.6.4
- threadpoolctl ==3.1.0
- tokenizers ==0.13.2
- toml ==0.10.2
- toolz ==0.12.0
- torch ==1.11.0
- torch-cluster ==1.6.0
- torch-geometric ==2.0.4
- torch-max-mem ==0.0.2
- torch-ppr ==0.0.8
- torch-scatter ==2.0.9
- torch-sparse ==0.6.14
- torchaudio ==0.11.0
- torcheval ==0.0.7
- torchmetrics ==1.3.0
- torchvision ==0.12.0
- tqdm ==4.66.1
- typing ==3.7.4.3
- typing-inspect ==0.7.1
- typing_extensions ==4.3.0
- umap-learn ==0.5.3
- urllib3 ==1.26.11
- virtualenv ==20.16.7
- wcwidth ==0.2.5
- wrapt ==1.15.0
- xgboost ==1.7.2
- xxhash ==3.2.0
- yarl ==1.9.4
- zict ==2.2.0
- zipp ==3.8.1