https://github.com/astrazeneca/selfpad

The official implementation of "Improving Antibody Humanness Prediction using Patent Data".

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (10.8%) to scientific vocabulary

Keywords

antibody antibody-design antibody-sequence attention contrastive-learning humanness immunogenicity-prediction patent-data transformer

Last synced: 11 months ago

Repository

Basic Info

Host: GitHub
Owner: AstraZeneca
License: apache-2.0
Language: Python
Default Branch: main
Homepage:
Size: 425 KB

Statistics

Stars: 12
Watchers: 3
Forks: 5
Open Issues: 0
Releases: 0

Created over 2 years ago · Last pushed over 2 years ago

Metadata Files

Readme License

SelfPAD:

Author: Talip Ucar (ucabtuc@gmail.com)

The official implementation of Improving Antibody Humanness Prediction using Patent Data

Model

[Figures: SelfPAD model diagrams for pre-training and fine-tuning]

Environment

We used Python 3.7 for our experiments. The environment can be set up in three steps:

pip install pipenv            # To install pipenv if you don't have it already
pipenv install --skip-lock    # To install required packages
pipenv shell                  # To activate virtual env

If the second step fails, you can install the packages listed in the Pipfile individually with pip, i.e. "pip install package_name".

Configuration

There are two configuration files:
1. pad.yaml # Defines parameters and options for pre-training
2. humanness.yaml # Defines parameters and options for fine-tuning

Training and Evaluation

You can train and evaluate the model by using:

python selfpad_pretrain.py         # For pre-training
python selfpad_finetune.py         # For fine-tuning for humanness
python selfpad_eval.py -ev test    # To compute humanness scores for a custom dataset, in this case test.csv

The CSV file should have "VH", "VL" and/or "Label" columns.
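As an illustration of the CSV layout described above, here is a minimal sketch that checks a file's header before it is passed to the evaluation script. It is not part of the repository: the helper name `check_eval_csv` is hypothetical, and it assumes that at least one of the sequence columns ("VH", "VL") must be present while "Label" is optional. The README does not spell this out, so adjust the check to match your data.

```python
import csv
import io

# Column names come from the README; which ones are mandatory is an assumption.
SEQUENCE_COLUMNS = ("VH", "VL")
LABEL_COLUMN = "Label"


def check_eval_csv(handle):
    """Hypothetical helper: return (rows, has_label) for an evaluation CSV.

    Raises ValueError if neither sequence column is present in the header.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    if not any(col in header for col in SEQUENCE_COLUMNS):
        raise ValueError(
            f"expected at least one of {SEQUENCE_COLUMNS}, got {header}"
        )
    rows = list(reader)
    return rows, LABEL_COLUMN in header


# Toy, truncated sequences used purely for illustration.
sample = io.StringIO(
    "VH,VL,Label\n"
    "EVQLVESGG,DIQMTQSPS,1\n"
    "QVQLQQSGA,EIVLTQSPA,0\n"
)
rows, has_label = check_eval_csv(sample)
print(len(rows), has_label)
```

In practice you would open the real file (for example `data/test.csv`) with `open(path, newline="")` and pass the handle to the helper instead of the in-memory sample.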

Structure of the repo

- selfpad_pretrain.py
- selfpad_finetune.py
- selfpad_eval.py

- src
    |-selfpad.py
    |-selfpad_humanness.py

- config
    |-pad.yaml
    |-humanness.yaml
    
- utils_common
    |-arguments.py
    |-utils.py
    |-tokenizer.py
    ...
    
- utils_pretrain
    |-load_data.py
    |-model_utils.py
    |-loss_functions.py
    ...
    
- utils_finetune
    |-load_data.py
    |-model_utils.py
    |-loss_functions.py
    ...
    
- data
    |-test.csv
    ...
    
- results
    |-pretraining
    |-humanness
    ...

Results

Results are saved under the ./results directory at the end of training. The results directory is structured as follows:

- results
    |-task e.g. humanness, or pretraining
            |-evaluation
                |-clusters (for plotting t-SNE and PCA plots of embeddings)
            |-training
                |-model
                |-plots
                |-loss

You can save the results of evaluations under the "evaluation" folder.
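The layout above can be reproduced with a few lines of `pathlib`, which may be handy when preparing an output location before writing your own evaluation results. This is only a sketch: the function name `make_results_tree` is hypothetical, and the repository's scripts may create these folders themselves.

```python
from pathlib import Path
import tempfile

# Sub-folders taken from the directory listing in the README.
SUBDIRS = (
    Path("evaluation") / "clusters",
    Path("training") / "model",
    Path("training") / "plots",
    Path("training") / "loss",
)


def make_results_tree(root, task):
    """Hypothetical helper: create results/<task>/... under root and return the task folder."""
    task_dir = Path(root) / "results" / task
    for sub in SUBDIRS:
        (task_dir / sub).mkdir(parents=True, exist_ok=True)
    return task_dir


# Build the tree in a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    task_dir = make_results_tree(tmp, "humanness")
    created = sorted(
        p.relative_to(task_dir).as_posix()
        for p in task_dir.rglob("*")
        if p.is_dir()
    )
    print(created)
```

Passing `"pretraining"` instead of `"humanness"` gives the same structure for the pre-training task.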

Experiment tracking

You can turn on Weights & Biases (W&B) logging in the config file.

Citing the paper

@article{ucar2024SelfPAD,
  title={Improving Antibody Humanness Prediction using Patent Data},
  author={Ucar, Talip and Ramon, Aubin and Oglic, Dino and Croasdale-Wood, Rebecca and Diethe, Tom and Sormanni, Pietro},
  journal={arXiv preprint arXiv:2110.04361},
  year={2024}
}

Citing this repo

If you use the SelfPAD framework in your own work, please cite it using the following:

@Misc{talip_ucar_2024_SelfPAD,
  author = {Talip Ucar},
  title = {{Improving Antibody Humanness Prediction using Patent Data}},
  howpublished = {\url{https://github.com/AstraZeneca/SelfPAD}},
  month = January,
  year = {since 2024}
}

Owner

Name: AstraZeneca
Login: AstraZeneca
Kind: organization
Location: Global

Website: https://www.astrazeneca.com/
Repositories: 33
Profile: https://github.com/AstraZeneca

Data and AI: Unlocking new science insights

GitHub Events

Total

Watch event: 1
Fork event: 1

Last Year

Watch event: 1
Fork event: 1

Committers

Last synced: about 2 years ago

All Time

Total Commits: 20
Total Committers: 1
Avg Commits per committer: 20.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 20
Committers: 1
Avg Commits per committer: 20.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Talip Uçar	t**r@g**m	20

Dependencies

Pipfile pypi

absl-py ==1.4.0
aiohttp ==3.9.1
aiosignal ==1.3.1
anndata ==0.8.0
arboreto ==0.1.6
async-timeout ==4.0.3
attrs ==22.1.0
autoflake ==2.1.1
autopage ==0.5.1
bio ==1.5.6
biopython ==1.81
biothings-client ==0.2.6
biotite ==0.37.0
black ==22.3.0
bokeh ==2.4.3
cachetools ==5.3.0
causal-learn ==0.1.3.3
causalbench ==1.0.0
causaldag ==0.1a163
cdt ==0.6.0
certifi ==2022.6.15
cfgv ==3.3.1
charset-normalizer ==2.1.0
class-resolver ==0.3.10
click ==8.0.4
click-default-group ==1.2.2
cliff ==3.10.1
cloudpickle ==2.1.0
cmaes ==0.8.2
cmake ==3.24.0
cmd2 ==2.4.2
colorlog ==6.6.0
conditional-independence ==0.1a6
cycler ==0.11.0
cython ==0.29.32
dask ==2023.3.1
databricks-cli ==0.17.1
dataclasses ==0.6
dataclasses-json ==0.5.7
datatable ==1.0.0
decorator ==5.1.1
deprecated ==1.2.13
distlib ==0.3.6
distributed ==2023.3.1
docdata ==0.0.3
docker ==5.0.3
easydict ==1.10
einops ==0.4.1
et-xmlfile ==1.1.0
filelock ==3.8.0
flask ==2.2.2
fonttools ==4.34.4
frozendict ==2.3.6
frozenlist ==1.3.3
fsspec ==2023.3.0
future ==0.18.3
gdown ==4.6.6
gies ==0.0.1
google-auth ==2.18.1
google-auth-oauthlib ==1.0.0
gprofiler-official ==1.0.0
gputil ==1.4.0
graphical-model-learning ==0.1a8
graphical-models ==0.1a19
graphtools ==1.5.3
graphviz ==0.20.1
grpcio ==1.50.0
gunicorn ==20.1.0
h5py ==3.8.0
heapdict ==1.0.1
identify ==2.5.21
idna ==3.3
igraph ==0.10.4
ilock ==1.0.3
ipdb ==0.13.13
isort ==5.10.1
itsdangerous ==2.1.2
joblib ==1.1.0
jsonschema ==4.17.0
kiwisolver ==1.4.4
lazypredict ==0.2.12
lightgbm ==3.3.3
lightning-utilities ==0.10.0
littleballoffur ==2.1.12
littleutils ==0.2.2
llvmlite ==0.39.1
lmdb ==1.4.1
locket ==1.0.0
lz4 ==4.3.2
magic-impute ==3.0.0
markdown ==3.4.3
marshmallow ==3.17.0
marshmallow-enum ==1.5.1
matplotlib ==3.5.2
mlflow ==1.28.0
modal ==0.4.1
more-click ==0.1.1
more-itertools ==8.14.0
msgpack ==1.0.4
multidict ==6.0.4
mygene ==3.2.2
mypy-extensions ==0.4.3
natsort ==8.3.1
networkit ==7.1
networkx ==2.8.5
nodeenv ==1.7.0
numba ==0.56.4
numexpr ==2.8.4
numpy ==1.23.1
ogb ==1.3.6
openpyxl ==3.1.2
opt-einsum ==3.3.0
optuna ==2.10.1
outdated ==0.2.2
pandas ==1.3.5
partd ==1.3.0
patsy ==0.5.3
pbr ==5.9.0
pexpect ==4.8.0
pgmpy ==0.1.21
pillow ==9.2.0
platformdirs ==2.5.4
plotly ==5.12.0
pooch ==1.7.0
portalocker ==2.7.0
pre-commit ==2.19.0
pre-commit-hooks ==4.2.0
prettytable ==3.3.0
progressbar2 ==4.2.0
prometheus-flask-exporter ==0.20.3
protobuf ==3.20.1
psutil ==5.9.4
ptyprocess ==0.7.0
pyarrow ==11.0.0
pyasn1 ==0.5.0
pyasn1-modules ==0.3.0
pydot ==1.4.2
pyflakes ==3.0.1
pygam ==0.8.0
pygments ==2.14.0
pygsp ==0.5.1
pykeen ==1.9.0
pynndescent ==0.5.8
pyparsing ==3.0.9
pyperclip ==1.8.2
pyrsistent ==0.19.2
pysocks ==1.7.1
pystow ==0.4.6
python-igraph ==0.10.4
python-louvain ==0.16
python-utils ==3.5.2
pytorch-lightning ==2.0.0
pytz ==2022.1
querystring-parser ==1.2.4
ray ==2.4.0
rdkit-pypi ==2022.3.4
requests ==2.28.1
requests-oauthlib ==1.3.1
rexmex ==0.1.0
rsa ==4.9
scanpy ==1.9.3
scikit-learn ==1.1.2
scikit-multilearn ==0.2.0
scipy ==1.9.0
scprep ==1.2.2
seaborn ==0.11.2
session-info ==1.0.0
six ==1.16.0
sklearn ==0.0
skorch ==0.12.0
skrebate ==0.62
slingpy ==0.2.12
sortedcontainers ==2.4.0
sqlparse ==0.4.2
stdlib-list ==0.8.0
stevedore ==4.0.0
tabulate ==0.8.10
tasklogger ==1.2.0
tblib ==1.7.0
tensorboard ==2.13.0
tensorboard-data-server ==0.7.0
texttable ==1.6.4
threadpoolctl ==3.1.0
tokenizers ==0.13.2
toml ==0.10.2
toolz ==0.12.0
torch ==1.11.0+cu113
torch-cluster ==1.6.0
torch-geometric ==2.0.4
torch-max-mem ==0.0.2
torch-ppr ==0.0.8
torch-scatter ==2.0.9
torch-sparse ==0.6.14
torchaudio ==0.11.0+cu113
torcheval ==0.0.7
torchmetrics ==1.3.0
torchvision ==0.12.0+cu113
tqdm ==4.66.1
typing ==3.7.4.3
typing-extensions ==4.3.0
typing-inspect ==0.7.1
umap-learn ==0.5.3
urllib3 ==1.26.11
virtualenv ==20.16.7
wcwidth ==0.2.5
werkzeug ==2.2.2
wrapt ==1.15.0
xgboost ==1.7.2
xxhash ==3.2.0
yarl ==1.9.4
zict ==2.2.0
zipp ==3.8.1

requirements.txt pypi

Cython ==0.29.32
Deprecated ==1.2.13
Flask ==2.2.2
GPUtil ==1.4.0
HeapDict ==1.0.1
Markdown ==3.4.3
Pillow ==9.2.0
PyGSP ==0.5.1
PySocks ==1.7.1
Pygments ==2.14.0
Werkzeug ==2.2.2
absl-py ==1.4.0
aiohttp ==3.9.1
aiosignal ==1.3.1
anndata ==0.8.0
arboreto ==0.1.6
async-timeout ==4.0.3
attrs ==22.1.0
autoflake ==2.1.1
autopage ==0.5.1
bio ==1.5.6
biopython ==1.81
biothings-client ==0.2.6
biotite ==0.37.0
black ==22.3.0
bokeh ==2.4.3
cachetools ==5.3.0
causal-learn ==0.1.3.3
causalbench ==1.0.0
causaldag ==0.1a163
cdt ==0.6.0
certifi ==2022.6.15
cfgv ==3.3.1
charset-normalizer ==2.1.0
class-resolver ==0.3.10
click ==8.0.4
click-default-group ==1.2.2
cliff ==3.10.1
cloudpickle ==2.1.0
cmaes ==0.8.2
cmake ==3.24.0
cmd2 ==2.4.2
colorlog ==6.6.0
conditional-independence ==0.1a6
cycler ==0.11.0
dask ==2023.3.1
databricks-cli ==0.17.1
dataclasses ==0.6
dataclasses-json ==0.5.7
datatable ==1.0.0
decorator ==5.1.1
distlib ==0.3.6
distributed ==2023.3.1
docdata ==0.0.3
docker ==5.0.3
easydict ==1.10
einops ==0.4.1
et-xmlfile ==1.1.0
filelock ==3.8.0
fonttools ==4.34.4
frozendict ==2.3.6
frozenlist ==1.3.3
fsspec ==2023.3.0
future ==0.18.3
gdown ==4.6.6
gies ==0.0.1
google-auth ==2.18.1
google-auth-oauthlib ==1.0.0
gprofiler-official ==1.0.0
graphical-model-learning ==0.1a8
graphical-models ==0.1a19
graphtools ==1.5.3
graphviz ==0.20.1
grpcio ==1.50.0
gunicorn ==20.1.0
h5py ==3.8.0
identify ==2.5.21
idna ==3.3
igraph ==0.10.4
ilock ==1.0.3
ipdb ==0.13.13
isort ==5.10.1
itsdangerous ==2.1.2
joblib ==1.1.0
jsonschema ==4.17.0
kiwisolver ==1.4.4
lazypredict ==0.2.12
lightgbm ==3.3.3
lightning-utilities ==0.10.0
littleballoffur ==2.1.12
littleutils ==0.2.2
llvmlite ==0.39.1
lmdb ==1.4.1
locket ==1.0.0
lz4 ==4.3.2
magic-impute ==3.0.0
marshmallow ==3.17.0
marshmallow-enum ==1.5.1
matplotlib ==3.5.2
mlflow ==1.28.0
modAL ==0.4.1
more-click ==0.1.1
more-itertools ==8.14.0
msgpack ==1.0.4
multidict ==6.0.4
mygene ==3.2.2
mypy-extensions ==0.4.3
natsort ==8.3.1
networkit ==7.1
networkx ==2.8.5
nodeenv ==1.7.0
numba ==0.56.4
numexpr ==2.8.4
numpy ==1.23.1
ogb ==1.3.6
openpyxl ==3.1.2
opt-einsum ==3.3.0
optuna ==2.10.1
outdated ==0.2.2
packaging ==21.3
pandas ==1.3.5
partd ==1.3.0
patsy ==0.5.3
pbr ==5.9.0
pexpect ==4.8.0
pgmpy ==0.1.21
platformdirs ==2.5.4
plotly ==5.12.0
pooch ==1.7.0
portalocker ==2.7.0
pre-commit ==2.19.0
pre-commit-hooks ==4.2.0
prettytable ==3.3.0
progressbar2 ==4.2.0
prometheus-flask-exporter ==0.20.3
protobuf ==3.20.1
psutil ==5.9.4
ptyprocess ==0.7.0
pyarrow ==11.0.0
pyasn1 ==0.5.0
pyasn1-modules ==0.3.0
pydot ==1.4.2
pyflakes ==3.0.1
pygam ==0.8.0
pykeen ==1.9.0
pynndescent ==0.5.8
pyparsing ==3.0.9
pyperclip ==1.8.2
pyrsistent ==0.19.2
pystow ==0.4.6
python-igraph ==0.10.4
python-louvain ==0.16
python-utils ==3.5.2
pytorch-lightning ==2.0.0
pytz ==2022.1
querystring-parser ==1.2.4
ray ==2.4.0
rdkit-pypi ==2022.3.4
requests ==2.28.1
requests-oauthlib ==1.3.1
rexmex ==0.1.0
rsa ==4.9
scanpy ==1.9.3
scikit-learn ==1.1.2
scikit-multilearn ==0.2.0
scipy ==1.9.0
scprep ==1.2.2
seaborn ==0.11.2
session-info ==1.0.0
six ==1.16.0
sklearn ==0.0
skorch ==0.12.0
skrebate ==0.62
slingpy ==0.2.12
sortedcontainers ==2.4.0
sqlparse ==0.4.2
stdlib-list ==0.8.0
stevedore ==4.0.0
tabulate ==0.8.10
tasklogger ==1.2.0
tblib ==1.7.0
tensorboard ==2.13.0
tensorboard-data-server ==0.7.0
texttable ==1.6.4
threadpoolctl ==3.1.0
tokenizers ==0.13.2
toml ==0.10.2
toolz ==0.12.0
torch ==1.11.0
torch-cluster ==1.6.0
torch-geometric ==2.0.4
torch-max-mem ==0.0.2
torch-ppr ==0.0.8
torch-scatter ==2.0.9
torch-sparse ==0.6.14
torchaudio ==0.11.0
torcheval ==0.0.7
torchmetrics ==1.3.0
torchvision ==0.12.0
tqdm ==4.66.1
typing ==3.7.4.3
typing-inspect ==0.7.1
typing_extensions ==4.3.0
umap-learn ==0.5.3
urllib3 ==1.26.11
virtualenv ==20.16.7
wcwidth ==0.2.5
wrapt ==1.15.0
xgboost ==1.7.2
xxhash ==3.2.0
yarl ==1.9.4
zict ==2.2.0
zipp ==3.8.1
