https://github.com/biocomputingup/caid
Critical Assessment of Intrinsic Disorder
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in README
- ✓ Academic publication links: wiley.com, nature.com
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.7%) to scientific vocabulary
Repository
Critical Assessment of Intrinsic Disorder
Basic Info
- Host: GitHub
- Owner: BioComputingUP
- Language: Python
- Default Branch: main
- Size: 127 MB
Statistics
- Stars: 14
- Watchers: 6
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
CAID Assessment
This repository contains the code for the CAID challenge assessment. Given a set of predictions and a reference set, you can use it to generate the evaluations and metrics. The CAID software package wraps the vectorizedclsmetrics repository (with small modifications), which computes the classification metrics used throughout CAID. For details of the evaluations, please see the Papers section.
If you use this code in your research, please cite the following papers:
CAID2 - Conte AD, Mehdiabadi M, Bouhraoua A, Miguel Monzon A, Tosatto SCE, Piovesan D. Critical assessment of protein intrinsic disorder prediction (CAID) - Results of round 2. Proteins. 91(12): 1925-1934 (2023)
CAID1 - Necci, M., Piovesan, D., CAID Predictors. et al. Critical assessment of protein intrinsic disorder prediction. Nat Methods 18, 472–481 (2021)
Installation
To run this package, you need to have Python 3.8+ installed.
```
git clone https://github.com/BioComputingUP/CAID.git  # clone the repository
cd CAID
pip install -r requirements.txt                       # install the requirements
```
The repository is structured as below (demo-data contains sample data from CAID3 and the results you get from the assessment).
CAID --> (CAID repository)
├── caid.py --> the script to run the evaluations
├── vectorized_metircs/ --> the assessment library
└── demo-data/ --> demo data directory, with sample data from CAID3 challenge
├── predictions/ --> directory containing the predictions of each method
├── references/ --> directory containing reference fasta file
└── results/ --> directory for saving results
Input
Predictions
To run the assessment, your predictions must be in CAID output format (see https://caid.idpcentral.org/challenge), where the columns correspond to position, residue type, disorder/binding score, and a binary state. If the state is not provided, it is calculated automatically by choosing the threshold that maximizes the F1-score.
```
>DP01234
1 M 0.892 1
2 E 0.813 1
...
```
Each file must be stored with the .caid suffix. You can access and download all CAID challenge results from https://caid.idpcentral.org/challenge/results.
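As a rough illustration of the format described above, the sketch below parses a `.caid` file and, when the binary state column is missing, derives a cutoff by scanning unique score values for the F1-maximizing threshold. The function names and the whitespace-separated layout are assumptions for illustration, not the repository's own API.

```python
import numpy as np

def read_caid(path):
    """Parse a .caid prediction file: a '>' header line followed by
    rows of position, residue, score and (optionally) binary state."""
    scores, states = [], []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith(">"):
                continue
            fields = line.split()
            scores.append(float(fields[2]))
            states.append(int(fields[3]) if len(fields) > 3 else None)
    return np.array(scores), states

def f1_maximizing_threshold(scores, labels):
    """Return the score cutoff that maximizes F1 against binary reference labels."""
    best_thr, best_f1 = 0.5, -1.0
    for thr in np.unique(scores):          # candidate thresholds = unique scores
        pred = scores >= thr
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_f1, best_thr = f1, thr
    return best_thr
```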
References
References must be provided as a single fasta file, including the sequence and the label for each residue. In the labels, 0 indicates order, 1 indicates disorder/binding/linker, and - denotes a residue not included in the assessment. All CAID challenge references can be downloaded from https://caid.idpcentral.org/challenge/results.
```
>DP01234
MNASDFRRRGKEMVDYMADYLE
000011111000----------
```
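A minimal sketch of reading such a reference file, assuming each record spans exactly three lines (header, sequence, labels) as in the example above; `read_reference` is a hypothetical helper, not part of this repository.

```python
def read_reference(path):
    """Parse a CAID reference fasta where each record is three lines:
    a '>' header, the sequence, and a per-residue label string
    (0 = order, 1 = disorder/binding/linker, '-' = excluded)."""
    refs = {}
    with open(path) as fh:
        lines = [ln.strip() for ln in fh if ln.strip()]
    for i in range(0, len(lines), 3):
        acc = lines[i].lstrip(">")
        seq, labels = lines[i + 1], lines[i + 2]
        refs[acc] = (seq, labels)
    return refs
```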
Output
After running the assessment (see Usage), the following files are generated:
- score distribution for a given method (rawscore lists all scores; thresholds is the unique list of thresholds)
- dataset: the metrics for every considered threshold for a given reference and method
- bootstrap: same as dataset, but for every bootstrap sample
- target: same as dataset, but for every predicted target
- optimal thresholds for every calculated metric for a given reference and method
- ci: confidence intervals for all methods for a given reference and optimization
- bootstrap: metrics for each method and each bootstrap sample for a given reference and optimization
- cmat: confusion matrix for every method for a given reference and optimization
- metrics: metrics for each method
- cmat: confusion matrices for all methods and all thresholds for a given reference
- pr: precision-recall data for all methods
- roc: ROC data for all methods
- predictions: scores and binary predictions for all methods at the residue level
- metrics for all methods at the target level for a given reference and optimization
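The confusion-matrix outputs above are computed at the residue level, with residues masked by '-' in the reference excluded. A minimal sketch of that idea (the function name is assumed; this is not the vectorized metrics library's API):

```python
import numpy as np

def confusion_counts(pred_states, ref_labels):
    """Residue-level confusion counts (tn, fp, fn, tp).
    ref_labels is a string of '0', '1', and '-' ('-' = excluded)."""
    pred = np.asarray(pred_states)
    ref = np.array(list(ref_labels))
    keep = ref != "-"                       # drop residues excluded from assessment
    y_true = (ref[keep] == "1").astype(int)
    y_pred = pred[keep].astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    return tn, fp, fn, tp
```

From these four counts, any of the per-threshold classification metrics listed above (F1, balanced accuracy, MCC, etc.) can be derived.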
Usage
To run the assessment, run the caid.py script with the arguments explained below:
python3 caid.py <path-to-reference-fasta> <directory-containing-predictions> -o <output-directory>
For example, the demo-data/predictions folder contains the predictions of three predictors from CAID3, and demo-data/references/disorder_pdb.fasta is the Disorder-PDB reference from CAID3. The script can be run with:
python3 caid.py demo-data/references/disorder_pdb.fasta demo-data/predictions -o demo-data/results
License
Owner
- Name: BioComputing Group, University of Padova
- Login: BioComputingUP
- Kind: organization
- Email: biocomp@bio.unipd.it
- Location: Italy
- Website: https://biocomputingup.it/
- Repositories: 31
- Profile: https://github.com/BioComputingUP
GitHub Events
Total
- Watch event: 2
- Push event: 8
Last Year
- Watch event: 2
- Push event: 8
Committers
Last synced: over 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| marnec | r****o@g****m | 38 |
| damiano | d****n@u****t | 3 |
| Damiano Piovesan | d****n@g****m | 2 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- numpy ==2.3.2
- pandas ==2.3.1
- scipy ==1.16.1
- tqdm ==4.66.2