https://github.com/biocomputingup/caid

Critical Assessment of Intrinsic Disorder


Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: wiley.com, nature.com
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.7%) to scientific vocabulary
Last synced: 7 months ago

Repository

Critical Assessment of Intrinsic Disorder

Basic Info
  • Host: GitHub
  • Owner: BioComputingUP
  • Language: Python
  • Default Branch: main
  • Size: 127 MB
Statistics
  • Stars: 14
  • Watchers: 6
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created over 5 years ago · Last pushed 8 months ago
Metadata Files
Readme

README.md

CAID Assessment

This repository contains the code for the CAID challenge assessment. Given a set of predictions and a reference set, you can use this repository to generate the evaluations and metrics. The CAID software package wraps the vectorizedclsmetrics repository (with small modifications), which performs the calculation of the classification metrics used throughout CAID. For details of the evaluations, please see the papers cited below.

If you use this code in your research, please cite the following papers:

  • CAID2 - Conte AD, Mehdiabadi M, Bouhraoua A, Miguel Monzon A, Tosatto SCE, Piovesan D. Critical assessment of protein intrinsic disorder prediction (CAID) - Results of round 2. Proteins. 2023; 91(12): 1925-1934 (2023)

  • CAID1 - Necci, M., Piovesan, D., CAID Predictors. et al. Critical assessment of protein intrinsic disorder prediction. Nat Methods 18, 472–481 (2021)

Installation

To run this package, you need to have Python 3.8+ installed.

```bash
git clone https://github.com/BioComputingUP/CAID.git   # clone the repository
pip install -r requirements.txt                         # install the requirements
```

The repository is structured as shown below (demo-data just contains sample data from CAID3 and the results you get from the assessment).

    CAID                     --> CAID repository
    ├── caid.py              --> the script to run the evaluations
    ├── vectorized_metircs/  --> the assessment library
    └── demo-data/           --> demo data directory, with sample data from the CAID3 challenge
        ├── predictions/     --> directory containing the prediction of each method
        ├── references/      --> directory containing the reference fasta file
        └── results/         --> directory for saving results

Input

Predictions

In order to run the assessment, your predictions must be in the CAID output format (see https://caid.idpcentral.org/challenge), where the columns correspond to position, residue type, disorder/binding score, and a binary state. If the state is not provided, it is calculated automatically using the threshold that maximizes the F1-score.

```
>DP01234
1    M    0.892    1
2    E    0.813    1
...
```

Each file must be stored with the .caid suffix. You can access and download all CAID challenge results from https://caid.idpcentral.org/challenge/results.
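The caid.py script handles this parsing itself; purely as an illustration of the format described above, here is a minimal Python sketch of how such a file could be read (the function name and the demo file path are hypothetical):

```python
# Minimal sketch (not part of the CAID package): read a .caid prediction file.
# Assumes one ">" header line per target followed by whitespace-separated rows of
# position, residue, score and (optionally) a binary state.
from pathlib import Path


def read_caid_prediction(path):
    """Return {target_id: list of (position, residue, score, state-or-None)}."""
    targets = {}
    current = None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:]
            targets[current] = []
        else:
            fields = line.split()
            pos, residue, score = int(fields[0]), fields[1], float(fields[2])
            state = int(fields[3]) if len(fields) > 3 else None  # state column is optional
            targets[current].append((pos, residue, score, state))
    return targets


# Hypothetical usage on one of the demo prediction files:
# preds = read_caid_prediction("demo-data/predictions/some_method.caid")
```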

References

References must be provided as a single fasta file, including the sequence and the labels corresponding to each residue. In the labels, 0 indicates order, 1 indicates disorder/binding/linker, and - denotes that the residue is not included in the assessment. All the CAID challenge references can be downloaded from https://caid.idpcentral.org/challenge/results.

```
>DP01234
MNASDFRRRGKEMVDYMADYLE
000011111000----------
```
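Again for illustration only (not part of the repository), a reference entry like the one above could be loaded into per-residue labels and an assessment mask as follows, assuming each record spans exactly three lines (header, sequence, labels) as in the example:

```python
# Minimal sketch (not part of the CAID package): read the reference fasta-like file.
# Each record is assumed to span three lines: a ">" header, the amino-acid sequence,
# and a label string where 0 = order, 1 = disorder/binding/linker and "-" = excluded.
def read_caid_reference(path):
    """Return {target_id: (sequence, labels, mask)}; labels[i] is 0/1 or None,
    mask[i] is True only for residues that take part in the assessment."""
    references = {}
    with open(path) as handle:
        lines = [line.strip() for line in handle if line.strip()]
    for header, sequence, label_line in zip(lines[0::3], lines[1::3], lines[2::3]):
        target_id = header.lstrip(">")
        labels = [int(c) if c in "01" else None for c in label_line]
        mask = [c != "-" for c in label_line]
        references[target_id] = (sequence, labels, mask)
    return references


# Hypothetical usage on the demo reference:
# refs = read_caid_reference("demo-data/references/disorder_pdb.fasta")
```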

Output

After running the assessment (see Usage), the following files are generated.

```bash
# Score distribution for a given method; rawscore contains all scores, thresholds the unique list of thresholds
.{rawscore,thresholds}.distribution.txt

# dataset   : the metrics for every considered threshold for a given reference and method
# bootstrap : same as dataset, but for every bootstrap sample
# target    : same as dataset, but for every predicted target
.analysis..{bootstrap,dataset,target}.metrics.csv

# Optimal thresholds for every calculated metric for a given reference and method
.analysis..thr.csv

# ci        : confidence intervals for all methods for a given reference and optimization
# bootstrap : metrics for each bootstrap sample of every method for a given reference and optimization
.all.{ci,bootstrap}..metrics.csv

# cmat    : confusion matrix for every method for a given reference and optimization
# metrics : metrics for each method
.all.dataset..{cmat,metrics}.csv

# cmat        : confusion matrices for all methods and all thresholds for a given reference
# pr          : precision-recall data for all methods
# roc         : ROC data for all methods
# predictions : scores and binary predictions for all methods at the residue level
.all.dataset._.{cmat,pr,predictions,roc}.csv

# Metrics for all methods at the target level for a given reference and optimization
.all.target..metrics.csv
```
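The metrics tables are plain CSV files, so they can be inspected directly with pandas (already listed in requirements.txt). A minimal sketch, assuming the demo results were written to demo-data/results and that the dataset-level metrics files match the naming pattern above (the glob below is an assumption, not a documented interface):

```python
# Illustrative only: glob the dataset-level metrics tables from the demo results
# directory and inspect them with pandas (available metrics vary, so we just
# print the header and shape of each file).
from pathlib import Path

import pandas as pd

results_dir = Path("demo-data/results")                              # output directory used in Usage
for csv_path in sorted(results_dir.glob("*dataset*.metrics.csv")):   # pattern is an assumption
    metrics = pd.read_csv(csv_path)
    print(csv_path.name, metrics.shape)
    print(metrics.columns.tolist())                                  # see which metrics are available
```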

Usage

To run the assessment, run the caid.py script with the arguments explained below:

python3 caid.py <path-to-reference-fasta> <directory-containing-predictions> -o <output-directory>

For example, the demo-data/predictions folder contains the predictions of 3 predictors from CAID3, and demo-data/references/disorder_pdb.fasta is the Disorder-PDB reference from CAID3. The script can be run as:

python3 caid.py demo-data/references/disorder_pdb.fasta demo-data/predictions -o demo-data/results
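If you prefer to drive the assessment from a Python pipeline, the same demo invocation can be made through the standard library; a minimal sketch, assuming it is run from the repository root:

```python
# Illustrative only: run the demo assessment by invoking caid.py as a subprocess.
import subprocess
import sys

subprocess.run(
    [
        sys.executable, "caid.py",
        "demo-data/references/disorder_pdb.fasta",  # reference fasta (Disorder-PDB from CAID3)
        "demo-data/predictions",                    # directory with the .caid prediction files
        "-o", "demo-data/results",                  # output directory
    ],
    check=True,  # raise CalledProcessError if caid.py exits with a non-zero status
)
```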

License

CC BY 3.0

Owner

  • Name: BioComputing Group, University of Padova
  • Login: BioComputingUP
  • Kind: organization
  • Email: biocomp@bio.unipd.it
  • Location: Italy

GitHub Events

Total
  • Watch event: 2
  • Push event: 8
Last Year
  • Watch event: 2
  • Push event: 8

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 43
  • Total Committers: 3
  • Avg Commits per committer: 14.333
  • Development Distribution Score (DDS): 0.116
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • marnec (r****o@g****m): 38 commits
  • damiano (d****n@u****t): 3 commits
  • Damiano Piovesan (d****n@g****m): 2 commits

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

requirements.txt pypi
  • numpy ==2.3.2
  • pandas ==2.3.1
  • scipy ==1.16.1
  • tqdm ==4.66.2