qc-b3db
A large benchmark dataset, Blood-Brain Barrier Database (B3DB), compiled from 50 published resources.
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 1 DOI reference(s) in README
- ✓ Academic publication links: links to nature.com
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.8%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 64
- Watchers: 7
- Forks: 33
- Open Issues: 4
- Releases: 4
Metadata Files
README.md
About B3DB
In this repo, we present a large benchmark dataset, Blood-Brain Barrier Database (B3DB), compiled from 50 published resources (as summarized at rawdata/rawdata_summary.tsv) and categorized based on the consistency between different experimental references/measurements. This dataset was published in Scientific Data, and this repository is occasionally updated with new experimental data. Scientists who would like to contribute data should contact the database's maintainers (e.g., by creating a new Issue in this repository).
A subset of the
molecules in B3DB has numerical logBB values (1058 compounds), while the whole dataset
has categorical (BBB+ or BBB-) BBB permeability labels (7807 compounds prior to v1.0.0 and 7982 compounds after). Some physicochemical properties
of the molecules are also provided.
Citation
Please use the following citations in any publication using our B3DB dataset:
```md
@article{MengAcurateddiverse2021,
  author = {Meng, Fanwang and Xi, Yang and Huang, Jinfeng and Ayers, Paul W.},
  doi = {10.1038/s41597-021-01069-5},
  journal = {Scientific Data},
  number = {289},
  title = {A curated diverse molecular database of blood-brain barrier permeability with chemical descriptors},
  volume = {8},
  year = {2021},
  url = {https://www.nature.com/articles/s41597-021-01069-5},
  publisher = {Springer Nature}
}

@article{MengB3clf2025,
  author = {Meng, Fanwang and Chen, Jitian and Collins-Ramirez, Juan Samuel and Ayers, Paul W.},
  doi = {xxx},
  journal = {xxx},
  number = {xxx},
  title = {B3clf: A Resampling-Integrated Machine Learning Framework to Predict Blood-Brain Barrier Permeability},
  volume = {x},
  year = {xxx},
  url = {xxx},
  publisher = {xxx}
}
```
Features of B3DB
- The largest dataset with numerical and categorical values for Blood-Brain Barrier small molecules (to the best of our knowledge, as of February 25, 2021).
- Inclusion of stereochemistry information with isomeric SMILES with chiral specifications if available. Otherwise, canonical SMILES are used.
- Characterization of uncertainty of experimental measurements by grouping the collected molecular data records.
- Extended datasets for numerical and categorical data with precomputed physicochemical properties using mordred.
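As a minimal sketch of how the grouping might be used in practice, the snippet below filters a toy table by its `group` column. The rows and the group label `B` are made up for illustration; only the `compound_name` and `group` column names and the value `A` appear in the data preview in the Usage section. The helper `select_group` is hypothetical and not part of the package.

```python
import pandas as pd

# Toy records standing in for B3DB rows (values are illustrative only).
toy = pd.DataFrame(
    {
        "compound_name": ["mol_1", "mol_2", "mol_3", "mol_4"],
        "group": ["A", "A", "B", "A"],
    }
)


def select_group(df: pd.DataFrame, label: str) -> pd.DataFrame:
    """Return only the rows whose `group` column equals `label`."""
    return df[df["group"] == label].reset_index(drop=True)


group_a = select_group(toy, "A")
print(len(group_a))
```

The same filter should apply to the real tables as long as they expose a `group` column, as the preview suggests.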
Usage
Via PyPI
The B3DB dataset is available on PyPI and can be installed with pip:
```bash
pip install qc-B3DB
```
Then load the data (a dictionary of pandas DataFrames) with the following code snippet:
```python
from B3DB import B3DB_DATA_DICT

# access the data via dictionary keys
# 'B3DB_regression'
# 'B3DB_regression_extended'
# 'B3DB_classification'
# 'B3DB_classification_extended'
# 'B3DB_classification_external'
df_b3db_reg = B3DB_DATA_DICT["B3DB_regression"]
df_b3db_reg.head()

#    NO.                                      compound_name  ... group comments
# 0    1                                         moxalactam  ...     A      NaN
# 1    2                                      schembl614298  ...     A      NaN
# 2    3                             morphine-6-glucuronide  ...     A      NaN
# 3    4  2-[4-(5-bromo-3-methylpyridin-2-yl)butylamino]...  ...     A      NaN
# 4    5                                                NaN  ...     A      NaN
#
# [5 rows x 10 columns]
```
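To get a quick overview of what such a dictionary of DataFrames holds, one could map each key to the shape of its table. The sketch below is self-contained and uses a toy dictionary in place of the packaged one; `summarize` and `toy_dict` are hypothetical names introduced here, and the toy columns are not the real schema.

```python
import pandas as pd


def summarize(data_dict):
    """Map each dataset name to its (rows, columns) shape."""
    return {name: df.shape for name, df in data_dict.items()}


# Hypothetical stand-in for the packaged dictionary; with qc-B3DB installed,
# the dictionary imported from the package would be passed instead.
toy_dict = {
    "B3DB_regression": pd.DataFrame({"NO.": [1, 2], "compound_name": ["a", "b"]}),
    "B3DB_classification": pd.DataFrame({"NO.": [1, 2, 3]}),
}

shapes = summarize(toy_dict)
print(shapes)
```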
Manually Download the Data
B3DB contains two types of datasets, regression data and classification data, both of which can be loaded with pandas. For example:
```python
import pandas as pd

# load regression dataset
regression_data = pd.read_csv("B3DB/B3DB_regression.tsv", sep="\t")

# load classification dataset
classification_data = pd.read_csv("B3DB/B3DB_classification.tsv", sep="\t")

# load extended regression dataset
regression_data_extended = pd.read_csv(
    "B3DB/B3DB_regression_extended.tsv.gz", sep="\t", compression="gzip"
)

# load extended classification dataset
classification_data_extended = pd.read_csv(
    "B3DB/B3DB_classification_extended.tsv.gz", sep="\t", compression="gzip"
)
```
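For readers who want to check the gzip-compressed TSV round trip without downloading the data, the following sketch writes a tiny compressed file to a temporary directory and reads it back with the same `sep` and `compression` arguments. The file name and its three columns are toy values chosen for illustration, not the real extended schema.

```python
import gzip
import os
import tempfile

import pandas as pd

# Write a tiny gzip-compressed TSV to a temporary directory (toy data).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "toy_extended.tsv.gz")
with gzip.open(path, "wt", encoding="utf-8") as handle:
    handle.write("NO.\tcompound_name\tgroup\n1\tmol_1\tA\n2\tmol_2\tA\n")

# Read it back with the same arguments used for the extended B3DB files.
toy = pd.read_csv(path, sep="\t", compression="gzip")
print(toy.shape)
```

pandas can usually infer gzip compression from the `.gz` suffix, so passing `compression="gzip"` mainly makes the intent explicit.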
Examples in Jupyter Notebooks
We also provide three example notebooks that show how to use the dataset:
numerical_data_analysis.ipynb,
PCA_projection_fingerprint.ipynb, and
PCA_projection_descriptors.ipynb.
PCA_projection_descriptors.ipynb uses precomputed
chemical descriptors to visualize the chemical space of B3DB and can be run directly
in MyBinder.
Because RDKit is difficult to install in MyBinder, only PCA_projection_descriptors.ipynb
is set up there.
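To illustrate the general idea of projecting a descriptor matrix onto two principal components, here is a minimal NumPy-only sketch. It is not taken from the notebooks, which may preprocess the descriptors differently or use another library; the random matrix stands in for precomputed descriptors, and `pca_project` is a hypothetical helper.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
descriptors = rng.normal(size=(50, 6))  # toy descriptor matrix, not real data
coords = pca_project(descriptors)
print(coords.shape)
```

The two columns of `coords` could then be passed to a scatter plot, coloured by permeability label, to view the chemical space.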
Data Curation
Detailed procedures for data curation can be found in the data curation section of this repository.
The materials and data under this repo are distributed under the CC0 Licence.
ChangeLog
- 2025Aug16, the B3DB dataset is available via PyPI.
- 2025Aug16, we have added a new set of 171 BBB+ and 4 BBB- compounds to the dataset since version 1.1.0.
Owner
- Name: Theochem
- Login: theochem
- Kind: organization
- Website: https://qcdevs.org/
- Repositories: 35
- Profile: https://github.com/theochem
QC-Devs: A community devoted to developing sustainable software for quantum chemistry, physics, and the computational sciences.
Citation (CITATION.cff)
cff-version: "1.2.0"
message: "If you use this software, please cite it using this metadata. Thank you."
preferred-citation:
authors:
- family-names: Meng
given-names: Fanwang
orcid: "https://orcid.org/0000-0003-2886-7012"
- family-names: Xi
given-names: Yang
- family-names: Huang
given-names: Jinfeng
orcid: "https://orcid.org/0000-0002-6342-8536"
- family-names: Ayers
given-names: "Paul W."
journal: "Scientific Data"
title: A curated diverse molecular database of blood-brain barrier permeability with chemical descriptors
type: article
doi: "10.1038/s41597-021-01069-5"
isbn: 2052-4463
year: 2021
volume: 8
issue: 289
date-released: "2021-10-29"
GitHub Events
Total
- Create event: 13
- Release event: 5
- Issues event: 1
- Watch event: 19
- Delete event: 11
- Push event: 15
- Pull request review event: 4
- Pull request event: 15
- Fork event: 6
Last Year
- Create event: 13
- Release event: 5
- Issues event: 1
- Watch event: 19
- Delete event: 11
- Push event: 15
- Pull request review event: 4
- Pull request event: 15
- Fork event: 6
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 16
- Total pull requests: 34
- Average time to close issues: 3 months
- Average time to close pull requests: less than a minute
- Total issue authors: 5
- Total pull request authors: 3
- Average comments per issue: 1.5
- Average comments per pull request: 0.12
- Merged pull requests: 30
- Bot issues: 0
- Bot pull requests: 2
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- FanwangM (5)
- PaulWAyers (2)
- ssiddhantsharma (1)
- HenryJia (1)
- sumone-compbio (1)
Pull Request Authors
- FanwangM (26)
- Shania99 (2)
- dependabot[bot] (1)
- JitianChen (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads: unknown
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 1
- Total maintainers: 1
pypi.org: qc-b3db
Subset selection with maximum diversity.
- Documentation: https://qc-b3db.readthedocs.io/
- License: Creative Commons CC0 1.0 Universal (<http://creativecommons.org/publicdomain/zero/1.0/>)
- Latest release: 0.1.0a1 (published 7 months ago)