scaffoldgenerator
A CDK-based library for generating Scaffold Trees and Scaffold Networks
Scaffold Generator
A CDK-based library for generating Scaffold Trees and Scaffold Networks
DISCLAIMER: This repository contains legacy code! The project has been moved and is now available and maintained as the cdk-scaffold module.
The GraphStream-based visualisation functionalities are available in a separate repository: https://github.com/JonasSchaub/scaffold-graph-vis
Some of the cdk-scaffold functionalities are also implemented in the MORTAR (MOlecule fRagmenTAtion fRamework) rich client Graphical User Interface (GUI) application (GitHub repository | article)
Description
The Scaffold Generator library is designed to make molecular scaffold-related functionalities available in applications
and workflows based on the Chemistry Development Kit (CDK). Building mainly upon the works by
Bemis and Murcko, Schuffenhauer et al.,
and Varin et al., it offers scaffold perception and dissection based on single
molecules and molecule collections. From the latter, Scaffold Trees and Scaffold Networks can be constructed,
represented in data structures, and visualised using the GraphStream library.
Multiple options to fine-tune and adapt the routines are available.
A scientific article describing the library has been published and is available here:
https://doi.org/10.1186/s13321-022-00656-x
Scaffold Generator is also available in the open Java rich client application MORTAR ('MOlecule fRagmenTAtion fRamework'),
where in silico molecule fragmentation can be conducted on a given data set and the results visualised
(MORTAR GitHub repository, MORTAR article preprint).
Contents of this repository
Sources
The ScaffoldGenerator\src\main\java\ folder contains the Java source classes of Scaffold Generator. The class ScaffoldGenerator is the core class of the library making its main functionalities available through convenient, high-level methods. Other classes are used e.g. to represent data structures like Scaffold Trees and Scaffold Networks.
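To illustrate the difference between the two data structures mentioned above, here is a minimal, hypothetical sketch. It is not the library's own implementation: the class name ScaffoldGraphSketch and its methods are invented for illustration, and scaffolds are represented as plain SMILES strings instead of CDK atom containers. It relies only on the conceptual distinction from the literature: in a Scaffold Tree each scaffold has at most one parent scaffold, whereas in a Scaffold Network a scaffold may have several.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; NOT a class of the Scaffold Generator library. */
final class ScaffoldGraphSketch {
    private final boolean isTree;
    // Maps each scaffold (here just a SMILES string) to its smaller parent scaffolds.
    private final Map<String, Set<String>> parentMap = new LinkedHashMap<>();

    ScaffoldGraphSketch(boolean isTree) {
        this.isTree = isTree;
    }

    /** Links a scaffold to a parent scaffold obtained by removing one ring. */
    void addParent(String scaffold, String parent) {
        parentMap.computeIfAbsent(parent, key -> new LinkedHashSet<>());
        Set<String> parents = parentMap.computeIfAbsent(scaffold, key -> new LinkedHashSet<>());
        // Tree mode: a second, different parent violates the single-parent rule.
        if (isTree && !parents.isEmpty() && !parents.contains(parent)) {
            throw new IllegalStateException("Tree node already has a parent: " + scaffold);
        }
        parents.add(parent);
    }

    Set<String> parentsOf(String scaffold) {
        return parentMap.getOrDefault(scaffold, Set.of());
    }

    int nodeCount() {
        return parentMap.size();
    }

    public static void main(String[] args) {
        String quinoline = "c1ccc2ncccc2c1";
        ScaffoldGraphSketch network = new ScaffoldGraphSketch(false);
        network.addParent(quinoline, "c1ccccc1"); // benzene
        network.addParent(quinoline, "c1ccncc1"); // pyridine
        System.out.println("Network parents: " + network.parentsOf(quinoline));
    }
}
```

In tree mode the second addParent call for the same scaffold would be rejected; which single parent is retained in a real Scaffold Tree is decided by the prioritization rules of Schuffenhauer et al., which this sketch does not implement.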
Tests
The test class ScaffoldGeneratorTest illustrates and tests the functionalities of Scaffold Generator: the correct
output of its basic methods like scaffold generation, the more advanced functions to build Scaffold Trees and Scaffold Networks,
the correct application of Schuffenhauer et al.'s prioritization rules (based on the
schemata given in their publication), and the correct workings of the available settings and options. Some examples of
Scaffold Trees and Scaffold Networks are displayed for visual inspection using the GraphStream library.
Examples for the basic functionalities are visualised using example molecules imported from the resource folder
(see below) and saved as image files in an output folder. Two examples for the GraphStream visualisation of Scaffold Trees
and Networks can be found in the GraphStreamFigures folder.
Additionally, performance tests are included that apply
specific routines of Scaffold Generator to the whole COCONUT database.
Test resources
The test resources folder at path src\test\resources\ contains MDL MOL files of 23 test molecules used to
illustrate the basic functionalities of Scaffold Generator. They are imported in multiple test methods and the results
saved as image files in respective molecule-specific output folders.
An SD file of the COCONUT database, which is needed to run the performance tests, is
not included in the repository (see below).
All molecules used in the test methods imported from SMILES codes are also compiled in a separate file named SGTest_SMILES.txt
in the ScaffoldGenerator folder.
Performance Test CMD Application
The folder ScaffoldGenerator\PerformanceTestCMDApp contains the executable Java archive ScaffoldGenerator-jar-with-dependencies.jar. It can be executed from the command line (command: java -jar) to take a performance snapshot of Scaffold Generator's scaling behaviour for a growing number of input molecules. It requires two command-line arguments:
- the file name of an SD file located in the same directory as the JAR (the file itself is not provided)
- an integer number specifying into how many equally sized bins the data set should be split in the analysis.
Example usage: java -jar ScaffoldGenerator-jar-with-dependencies.jar input-file-in-same-dir-name.sdf 10
The CMD application will then import the data set, split it into the given number of equally sized bins, create Scaffold Trees and
Scaffold Networks for an increasing combination of those structure bins, and create detailed output files of the measured
runtimes.
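The binning scheme described above can be sketched as follows. This is a hedged illustration, not the source code of the CMD application: the class BinningSketch and the method cumulativeSubsetSizes are hypothetical names, and the handling of data set sizes that are not evenly divisible by the bin number is an assumption (the README does not specify it). The sketch computes how many molecules each successively larger combination of bins would contain and shows where a runtime measurement could be placed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; NOT the code of the performance test CMD application. */
final class BinningSketch {

    /**
     * Returns the sizes of the growing subsets (bin 1, bins 1-2, ..., all bins).
     * Assumption: bin boundaries are placed at i * moleculeCount / binCount,
     * so the bins differ by at most one molecule and the last subset is the full set.
     */
    static List<Integer> cumulativeSubsetSizes(int moleculeCount, int binCount) {
        if (moleculeCount < 0 || binCount < 1) {
            throw new IllegalArgumentException("Need moleculeCount >= 0 and binCount >= 1");
        }
        List<Integer> sizes = new ArrayList<>(binCount);
        for (int i = 1; i <= binCount; i++) {
            // long arithmetic avoids an int overflow for large data sets
            sizes.add((int) ((long) moleculeCount * i / binCount));
        }
        return sizes;
    }

    public static void main(String[] args) {
        for (int size : cumulativeSubsetSizes(1000, 10)) {
            long start = System.nanoTime();
            // Placeholder: build a Scaffold Tree or Network from the first `size` molecules here.
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            System.out.println(size + " molecules: " + elapsedMillis + " ms");
        }
    }
}
```

For example, under this assumption a data set of 10 molecules split into 3 bins yields the subset sizes 3, 6, and 10.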
The source code of the CMD application can be found in the src folder with the other sources.
Installation
This is a Maven project. In order to use the source code for your own software, download or clone the repository and
open it in a Maven-supporting IDE (e.g. IntelliJ) as a Maven project and execute the pom.xml file. Maven will then take
care of installing all dependencies. A Java Development Kit (JDK) of version 17 or higher must also be pre-installed.
To run the COCONUT-analysing tests, an SD file of the database needs to be placed in the test "resources" folder
at path src\test\resources\COCONUT_DB.sdf.
The respective file can be downloaded at https://coconut.naturalproducts.net/download.
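The two prerequisites named above (a JDK of version 17 or higher and the COCONUT SD file at the expected path) can be checked with a small helper before running the tests. This is a minimal sketch using only the Java standard library; the class SetupCheckSketch and its methods are hypothetical and not part of the repository.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch; NOT part of the Scaffold Generator repository. */
final class SetupCheckSketch {

    /** The README requires a JDK of version 17 or higher. */
    static boolean hasRequiredJdk(int featureVersion) {
        return featureVersion >= 17;
    }

    /** Checks for src/test/resources/COCONUT_DB.sdf below the given project root. */
    static boolean hasCoconutFile(Path projectRoot) {
        Path sdFile = projectRoot.resolve(Path.of("src", "test", "resources", "COCONUT_DB.sdf"));
        return Files.isRegularFile(sdFile);
    }

    public static void main(String[] args) {
        // Runtime.version().feature() returns the major Java version, e.g. 17
        int feature = Runtime.version().feature();
        System.out.println("JDK " + feature + " sufficient: " + hasRequiredJdk(feature));
        // Assumes the program is started from the project root directory
        System.out.println("COCONUT SD file present: " + hasCoconutFile(Path.of("")));
    }
}
```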
Dependencies
Needs to be pre-installed:
- Java Development Kit (JDK) version 17
  - Adoptium OpenJDK (as one possible source of the JDK)
- Apache Maven version 4
  - Apache Maven

Managed by Maven:
- Chemistry Development Kit (CDK) version 2.8
  - Chemistry Development Kit on GitHub
  - License: GNU Lesser General Public License 2.1
- GraphStream version 2.0
  - GraphStream project
  - License: CeCILL-C and GNU Lesser General Public License 3
- JUnit version 4.13.2
  - JUnit 4
  - License: Eclipse Public License 1.0
References and useful links
Conceptual Scaffold, Scaffold Tree, and Scaffold Network articles
- G. W. Bemis and M. A. Murcko, “The Properties of Known Drugs. 1. Molecular Frameworks,” J. Med. Chem., vol. 39, no. 15, pp. 2887–2893, Jan. 1996, doi: 10.1021/jm9602928.
- S. J. Wilkens, J. Janes, and A. I. Su, “HierS: Hierarchical Scaffold Clustering Using Topological Chemical Graphs,” J. Med. Chem., vol. 48, no. 9, pp. 3182–3193, May 2005, doi: 10.1021/jm049032d.
- M. A. Koch et al., “Charting biologically relevant chemical space: A structural classification of natural products (SCONP),” Proceedings of the National Academy of Sciences, vol. 102, no. 48, pp. 17272–17277, Nov. 2005, doi: 10.1073/pnas.0503647102.
- A. Schuffenhauer, P. Ertl, S. Roggo, S. Wetzel, M. A. Koch, and H. Waldmann, “The Scaffold Tree − Visualization of the Scaffold Universe by Hierarchical Scaffold Classification,” J. Chem. Inf. Model., vol. 47, no. 1, pp. 47–58, Jan. 2007, doi: 10.1021/ci600338x.
- T. Varin et al., “Compound Set Enrichment: A Novel Approach to Analysis of Primary HTS Data,” J. Chem. Inf. Model., vol. 50, no. 12, pp. 2067–2078, Dec. 2010, doi: 10.1021/ci100203e.
- T. Varin, A. Schuffenhauer, P. Ertl, and S. Renner, “Mining for Bioactive Scaffolds with Scaffold Networks: Improved Compound Set Enrichment from Primary Screening Data,” J. Chem. Inf. Model., vol. 51, no. 7, pp. 1528–1538, Jul. 2011, doi: 10.1021/ci2000924.
- C. Manelfi et al., “‘Molecular Anatomy’: a new multi-dimensional hierarchical scaffold analysis tool,” J Cheminform, vol. 13, no. 1, p. 54, Dec. 2021, doi: 10.1186/s13321-021-00526-y.

Chemistry Development Kit (CDK)
- Chemistry Development Kit on GitHub
- Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen EL. The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics. J Chem Inform Comput Sci. 2003;43(2):493-500.
- Steinbeck C, Hoppe C, Kuhn S, Floris M, Guha R, Willighagen EL. Recent Developments of the Chemistry Development Kit (CDK) - An Open-Source Java Library for Chemo- and Bioinformatics. Curr Pharm Des. 2006; 12(17):2111-2120.
- May JW and Steinbeck C. Efficient ring perception for the Chemistry Development Kit. J. Cheminform. 2014; 6:3.
- Willighagen EL, Mayfield JW, Alvarsson J, Berg A, Carlsson L, Jeliazkova N, Kuhn S, Pluska T, Rojas-Chertó M, Spjuth O, Torrance G, Evelo CT, Guha R, Steinbeck C, The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching. J Cheminform. 2017; 9:33.
- Groovy Cheminformatics with the Chemistry Development Kit

GraphStream
- GraphStream project
- Antoine Dutot, Frédéric Guinand, Damien Olivier, Yoann Pigné. GraphStream: A Tool for bridging the gap between Complex Systems and Dynamic Graphs. Emergent Properties in Natural and Artificial Complex Systems. Satellite Conference within the 4th European Conference on Complex Systems (ECCS’2007), Oct 2007, Dresden, Germany. hal-00264043

COlleCtion of Open NatUral producTs (COCONUT)
- COCONUT Online home page
- Sorokina, M., Merseburger, P., Rajan, K. et al. COCONUT online: Collection of Open Natural Products database. J Cheminform 13, 2 (2021). https://doi.org/10.1186/s13321-020-00478-9
- Sorokina, M., Steinbeck, C. Review on natural products databases: where to find data in 2020. J Cheminform 12, 20 (2020).
Owner
- Login: Julian-W98
- Kind: user
- Repositories: 1
- Profile: https://github.com/Julian-W98
Citation (CITATION.cff)
cff-version: 1.2.0
title: Scaffold Generator
version: 1.0.3
message: "If you use this software, please cite it as below and also cite the accompanying scientific publication referenced below."
type: software
authors:
  - family-names: "Zander"
    given-names: "Julian"
    orcid: "https://orcid.org/0000-0001-8197-076X"
  - family-names: "Schaub"
    given-names: "Jonas"
    orcid: "https://orcid.org/0000-0003-1554-6666"
  - family-names: "Zielesny"
    given-names: "Achim"
    orcid: "https://orcid.org/0000-0003-0722-4229"
  - family-names: "Steinbeck"
    given-names: "Christoph"
    orcid: "https://orcid.org/0000-0001-6966-0814"
doi: "10.5281/zenodo.7088601"
date-released: 2022-10-24
url: "https://github.com/Julian-Z98/ScaffoldGenerator"
license: LGPL-2.1
references:
  - authors:
      - family-names: "Schaub"
        given-names: "Jonas"
        orcid: "https://orcid.org/0000-0003-1554-6666"
      - family-names: "Zander"
        given-names: "Julian"
        orcid: "https://orcid.org/0000-0001-8197-076X"
      - family-names: "Zielesny"
        given-names: "Achim"
        orcid: "https://orcid.org/0000-0003-0722-4229"
      - family-names: "Steinbeck"
        given-names: "Christoph"
        orcid: "https://orcid.org/0000-0001-6966-0814"
    doi: "10.1186/s13321-022-00656-x"
    journal: "J Cheminform"
    scope: "Cite this article if you want to reference the general concepts of the software."
    title: "Scaffold Generator: a Java library implementing molecular scaffold functionalities in the Chemistry Development Kit (CDK)"
    type: article
    year: 2022
Dependencies
- junit:junit 4.13.2
- org.graphstream:gs-algo 2.0
- org.graphstream:gs-core 2.0
- org.graphstream:gs-ui-swing 2.0
- org.openscience.cdk:cdk-bundle 2.7.1