scaffoldgenerator

A CDK-based library for generating Scaffold Trees and Scaffold Networks

https://github.com/julian-w98/scaffoldgenerator

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 55 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.1%) to scientific vocabulary

Repository


Basic Info
  • Host: GitHub
  • Owner: Julian-W98
  • License: lgpl-2.1
  • Language: Java
  • Default Branch: main
  • Homepage:
  • Size: 12.5 MB
Statistics
  • Stars: 11
  • Watchers: 3
  • Forks: 6
  • Open Issues: 0
  • Releases: 4
Created almost 5 years ago · Last pushed over 2 years ago

README.md


Scaffold Generator

A CDK-based library for generating Scaffold Trees and Scaffold Networks
DISCLAIMER: This repository contains legacy code! The project has been moved and is now available and maintained as the cdk-scaffold module.
The GraphStream-based visualisation functionalities are available in a separate repository: https://github.com/JonasSchaub/scaffold-graph-vis
Some of the cdk-scaffold functionalities are also implemented in the MORTAR (MOlecule fRagmenTAtion fRamework) rich client Graphical User Interface (GUI) application (GitHub repository | article)

Description

The Scaffold Generator library is designed to make molecular scaffold-related functionalities available in applications and workflows based on the Chemistry Development Kit (CDK). Building mainly upon the works by Bemis and Murcko, Schuffenhauer et al., and Varin et al., it offers scaffold perception and dissection based on single molecules and molecule collections. From the latter, Scaffold Trees and Scaffold Networks can be constructed, represented in data structures, and visualised using the GraphStream library. Multiple options to fine-tune and adapt the routines are available.
A scientific article describing the library has been published and is available here: https://doi.org/10.1186/s13321-022-00656-x
Scaffold Generator is also available in the open Java rich client application MORTAR ('MOlecule fRagmenTAtion fRamework'), where in silico molecule fragmentation can easily be conducted on a given data set and the results visualised (MORTAR GitHub repository, MORTAR article preprint).

Contents of this repository

Sources

The ScaffoldGenerator\src\main\java\ folder contains the Java source classes of Scaffold Generator. The class ScaffoldGenerator is the core class of the library making its main functionalities available through convenient, high-level methods. Other classes are used e.g. to represent data structures like Scaffold Trees and Scaffold Networks.
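To make the difference between the two data structures concrete, the following is a minimal, hypothetical sketch in plain Java. It is not one of the library's classes and does not use the CDK; the class name `ScaffoldGraphSketch`, its methods, and the use of SMILES strings as node labels are assumptions made purely for illustration. It only models the conceptual distinction described in the literature cited below: in a Scaffold Tree every scaffold has at most one parent scaffold, whereas in a Scaffold Network a scaffold may have several.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not one of the library's classes. */
public class ScaffoldGraphSketch {

    // Maps each scaffold label (here: a SMILES string) to the labels of its parent scaffolds.
    private final Map<String, List<String>> parentsByScaffold = new LinkedHashMap<>();
    // true = tree semantics (at most one parent), false = network semantics (any number of parents).
    private final boolean treeMode;

    public ScaffoldGraphSketch(boolean treeMode) {
        this.treeMode = treeMode;
    }

    /** Registers a child-parent relation; in tree mode a second, different parent is rejected. */
    public void addParent(String child, String parent) {
        List<String> parents = parentsByScaffold.computeIfAbsent(child, key -> new ArrayList<>());
        if (parents.contains(parent)) {
            return; // relation already known
        }
        if (treeMode && !parents.isEmpty()) {
            throw new IllegalStateException("A tree node can only have one parent: " + child);
        }
        parentsByScaffold.computeIfAbsent(parent, key -> new ArrayList<>());
        parents.add(parent);
    }

    /** Returns an unmodifiable copy of the parents registered for the given scaffold. */
    public List<String> getParents(String scaffold) {
        return List.copyOf(parentsByScaffold.getOrDefault(scaffold, List.of()));
    }

    /** Returns all scaffolds without a parent, i.e. the smallest fragments. */
    public List<String> getRoots() {
        List<String> roots = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : parentsByScaffold.entrySet()) {
            if (entry.getValue().isEmpty()) {
                roots.add(entry.getKey());
            }
        }
        return roots;
    }

    public static void main(String[] args) {
        // Quinoline can lose either of its two rings, giving two possible parent scaffolds.
        String quinoline = "c1ccc2ncccc2c1";
        ScaffoldGraphSketch network = new ScaffoldGraphSketch(false);
        network.addParent(quinoline, "c1ccccc1"); // benzene
        network.addParent(quinoline, "c1ccncc1"); // pyridine
        System.out.println("Network parents: " + network.getParents(quinoline));
        System.out.println("Network roots: " + network.getRoots());
    }
}
```

In tree mode, the second `addParent` call for the same child would throw, which mirrors the idea that a prioritisation scheme has to select exactly one parent per scaffold. How the library actually selects and stores parents is not shown here.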

Tests

The test class ScaffoldGeneratorTest illustrates and tests the functionalities of Scaffold Generator: the correct output of its basic methods such as scaffold generation, the more advanced functions for building Scaffold Trees and Scaffold Networks, the correct application of Schuffenhauer et al.'s prioritization rules (based on the schemata given in their publication), and the correct workings of the available settings and options. Some examples of Scaffold Trees and Scaffold Networks are displayed for visual inspection using the GraphStream library. Examples for the basic functionalities are visualised using example molecules that are imported from the resource folder (see below), with the results saved as image files in an output folder. Two examples of the GraphStream visualisation of Scaffold Trees and Networks can be found in the GraphStreamFigures folder.
Additionally, performance tests are included that apply specific routines of Scaffold Generator to the whole COCONUT database.

Test resources

The test resources folder at path src\test\resources\ contains MDL MOL files of 23 test molecules used to illustrate the basic functionalities of Scaffold Generator. They are imported in multiple test methods and the results saved as image files in respective molecule-specific output folders.
An SD file of the COCONUT database, which is needed to run the performance tests, is not included in the repository (see below).
All molecules that the test methods import from SMILES codes are also compiled in a separate file named SGTest_SMILES.txt in the ScaffoldGenerator folder.

Performance Test CMD Application

The folder ScaffoldGenerator\PerformanceTestCMDApp contains the executable Java archive ScaffoldGenerator-jar-with-dependencies.jar. It can be executed from the command line (command: java -jar) to take a performance snapshot of Scaffold Generator's scaling behaviour for a growing number of input molecules. It requires two command-line arguments:

  • the file name of an SD file located in the same directory as the JAR (the SD file itself is not provided)
  • an integer specifying the number of equally sized bins into which the data set is split for the analysis

Example usage: java -jar ScaffoldGenerator-jar-with-dependencies.jar input-file-in-same-dir-name.sdf 10
The CMD application will then import the data set, split it into the given number of equally sized bins, create Scaffold Trees and Scaffold Networks for an increasing combination of those structure bins, and create detailed output files of the measured runtimes.
The source code of the CMD application can be found in the src folder with the other sources.
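The binning and timing scheme described above can be sketched in plain Java as follows. This is a simplified illustration under stated assumptions, not the CMD application's actual code: the class name `BinScalingSketch` and its methods are hypothetical, integers stand in for imported molecules, a trivial summation stands in for Scaffold Tree or Scaffold Network generation, "an increasing combination of bins" is interpreted as cumulative prefixes (bin 1, bins 1 to 2, and so on), and leftover items are distributed over the first bins when the data set size is not divisible by the bin count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of a bin-wise scaling measurement; not the application's actual code. */
public class BinScalingSketch {

    /** Splits the items into the given number of bins of (nearly) equal size, preserving order. */
    public static <T> List<List<T>> splitIntoBins(List<T> items, int binCount) {
        if (binCount < 1) {
            throw new IllegalArgumentException("binCount must be at least 1");
        }
        List<List<T>> bins = new ArrayList<>();
        int baseSize = items.size() / binCount;
        int remainder = items.size() % binCount;
        int start = 0;
        for (int i = 0; i < binCount; i++) {
            // Assumption: the first 'remainder' bins receive one extra item.
            int size = baseSize + (i < remainder ? 1 : 0);
            bins.add(new ArrayList<>(items.subList(start, start + size)));
            start += size;
        }
        return bins;
    }

    /** Runs the routine on bin 1, then bins 1-2, and so on; returns elapsed nanoseconds per step. */
    public static <T> long[] measureCumulative(List<List<T>> bins, Consumer<List<T>> routine) {
        long[] runtimes = new long[bins.size()];
        List<T> cumulative = new ArrayList<>();
        for (int i = 0; i < bins.size(); i++) {
            cumulative.addAll(bins.get(i));
            long startTime = System.nanoTime();
            routine.accept(cumulative);
            runtimes[i] = System.nanoTime() - startTime;
        }
        return runtimes;
    }

    public static void main(String[] args) {
        // Placeholder data set; the real application imports structures from an SD file.
        List<Integer> dataSet = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            dataSet.add(i);
        }
        List<List<Integer>> bins = splitIntoBins(dataSet, 10);
        // Placeholder workload standing in for tree or network generation.
        long[] runtimes = measureCumulative(bins, subset -> {
            long sum = 0;
            for (int value : subset) {
                sum += value;
            }
        });
        for (int i = 0; i < runtimes.length; i++) {
            System.out.println("Bins 1.." + (i + 1) + ": " + runtimes[i] + " ns");
        }
    }
}
```

Single-shot `System.nanoTime()` measurements like these are noisy, so they only give a rough picture of scaling behaviour.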

Installation

This is a Maven project. To use the source code in your own software, download or clone the repository, open it as a Maven project in a Maven-supporting IDE (e.g. IntelliJ), and run the build defined in the pom.xml file. Maven will then take care of installing all dependencies. A Java Development Kit (JDK) of version 17 or higher must also be pre-installed.
To run the COCONUT-analysing tests, an SD file of the database needs to be placed in the test "resources" folder at path src\test\resources\COCONUT_DB.sdf. The respective file can be downloaded at https://coconut.naturalproducts.net/download.

Dependencies

Needs to be pre-installed:
  • Java Development Kit (JDK) version 17
    • Adoptium OpenJDK (as one possible source of the JDK)
  • Apache Maven version 4
    • Apache Maven

Managed by Maven:
  • Chemistry Development Kit (CDK) version 2.8
    • Chemistry Development Kit on GitHub
    • License: GNU Lesser General Public License 2.1
  • GraphStream version 2.0
    • GraphStream project
    • License: CeCILL-C and GNU Lesser General Public License 3
  • JUnit version 4.13.2
    • JUnit 4
    • License: Eclipse Public License 1.0

References and useful links

Conceptual Scaffold, Scaffold Tree, and Scaffold Network articles
  • G. W. Bemis and M. A. Murcko, “The Properties of Known Drugs. 1. Molecular Frameworks,” J. Med. Chem., vol. 39, no. 15, pp. 2887–2893, Jan. 1996, doi: 10.1021/jm9602928.
  • S. J. Wilkens, J. Janes, and A. I. Su, “HierS: Hierarchical Scaffold Clustering Using Topological Chemical Graphs,” J. Med. Chem., vol. 48, no. 9, pp. 3182–3193, May 2005, doi: 10.1021/jm049032d.
  • M. A. Koch et al., “Charting biologically relevant chemical space: A structural classification of natural products (SCONP),” Proceedings of the National Academy of Sciences, vol. 102, no. 48, pp. 17272–17277, Nov. 2005, doi: 10.1073/pnas.0503647102.
  • A. Schuffenhauer, P. Ertl, S. Roggo, S. Wetzel, M. A. Koch, and H. Waldmann, “The Scaffold Tree − Visualization of the Scaffold Universe by Hierarchical Scaffold Classification,” J. Chem. Inf. Model., vol. 47, no. 1, pp. 47–58, Jan. 2007, doi: 10.1021/ci600338x.
  • T. Varin et al., “Compound Set Enrichment: A Novel Approach to Analysis of Primary HTS Data,” J. Chem. Inf. Model., vol. 50, no. 12, pp. 2067–2078, Dec. 2010, doi: 10.1021/ci100203e.
  • T. Varin, A. Schuffenhauer, P. Ertl, and S. Renner, “Mining for Bioactive Scaffolds with Scaffold Networks: Improved Compound Set Enrichment from Primary Screening Data,” J. Chem. Inf. Model., vol. 51, no. 7, pp. 1528–1538, Jul. 2011, doi: 10.1021/ci2000924.
  • C. Manelfi et al., “‘Molecular Anatomy’: a new multi-dimensional hierarchical scaffold analysis tool,” J Cheminform, vol. 13, no. 1, p. 54, Dec. 2021, doi: 10.1186/s13321-021-00526-y.

Chemistry Development Kit (CDK)
  • Chemistry Development Kit on GitHub
  • Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen EL. The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics. J Chem Inform Comput Sci. 2003;43(2):493-500.
  • Steinbeck C, Hoppe C, Kuhn S, Floris M, Guha R, Willighagen EL. Recent Developments of the Chemistry Development Kit (CDK) - An Open-Source Java Library for Chemo- and Bioinformatics. Curr Pharm Des. 2006; 12(17):2111-2120.
  • May JW and Steinbeck C. Efficient ring perception for the Chemistry Development Kit. J. Cheminform. 2014; 6:3.
  • Willighagen EL, Mayfield JW, Alvarsson J, Berg A, Carlsson L, Jeliazkova N, Kuhn S, Pluska T, Rojas-Chertó M, Spjuth O, Torrance G, Evelo CT, Guha R, Steinbeck C, The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching. J Cheminform. 2017; 9:33.
  • Groovy Cheminformatics with the Chemistry Development Kit

GraphStream
  • GraphStream project
  • Antoine Dutot, Frédéric Guinand, Damien Olivier, Yoann Pigné. GraphStream: A Tool for bridging the gap between Complex Systems and Dynamic Graphs. Emergent Properties in Natural and Artificial Complex Systems. Satellite Conference within the 4th European Conference on Complex Systems (ECCS’2007), Oct 2007, Dresden, Germany. hal-00264043

COlleCtion of Open NatUral producTs (COCONUT)
  • COCONUT Online home page
  • Sorokina, M., Merseburger, P., Rajan, K. et al. COCONUT online: Collection of Open Natural Products database. J Cheminform 13, 2 (2021). https://doi.org/10.1186/s13321-020-00478-9
  • Sorokina, M., Steinbeck, C. Review on natural products databases: where to find data in 2020. J Cheminform 12, 20 (2020).

Owner

  • Login: Julian-W98
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
title: Scaffold Generator
version: 1.0.3
message: "If you use this software, please cite it as below and also cite the accompanying scientific publication referenced below."
type: software
authors:
  - family-names: "Zander"
    given-names: "Julian"
    orcid: "https://orcid.org/0000-0001-8197-076X"
  - family-names: "Schaub"
    given-names: "Jonas"
    orcid: "https://orcid.org/0000-0003-1554-6666"
  - family-names: "Zielesny"
    given-names: "Achim"    
    orcid: "https://orcid.org/0000-0003-0722-4229"
  - family-names: "Steinbeck"
    given-names: "Christoph"
    orcid: "https://orcid.org/0000-0001-6966-0814"
doi: "10.5281/zenodo.7088601"
date-released: 2022-10-24
url: "https://github.com/Julian-Z98/ScaffoldGenerator"
license: LGPL-2.1
references:
  - authors:
      - family-names: "Schaub"
        given-names: "Jonas"
        orcid: "https://orcid.org/0000-0003-1554-6666"
      - family-names: "Zander"
        given-names: "Julian"
        orcid: "https://orcid.org/0000-0001-8197-076X"
      - family-names: "Zielesny"
        given-names: "Achim"
        orcid: "https://orcid.org/0000-0003-0722-4229"
      - family-names: "Steinbeck"
        given-names: "Christoph"
        orcid: "https://orcid.org/0000-0001-6966-0814"
    doi: "10.1186/s13321-022-00656-x"
    journal: "J Cheminform"
    scope: "Cite this article if you want to reference the general concepts of the software."
    title: "Scaffold Generator: a Java library implementing molecular scaffold functionalities in the Chemistry Development Kit (CDK)"
    type: article
    year: 2022

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Dependencies

ScaffoldGenerator/pom.xml maven
  • junit:junit 4.13.2
  • org.graphstream:gs-algo 2.0
  • org.graphstream:gs-core 2.0
  • org.graphstream:gs-ui-swing 2.0
  • org.openscience.cdk:cdk-bundle 2.7.1