msb1015_assignment2

Repository to keep track of the progress of MSc Systems Biology - MSB1015 2019 Assignment 2

https://github.com/setenhage/msb1015_assignment2

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.4%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: setenhage
  • License: MIT
  • Language: HTML
  • Default Branch: master
  • Homepage:
  • Size: 435 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created over 6 years ago · Last pushed over 6 years ago
Metadata Files
Readme Contributing License Code of conduct Citation

README.md

MSB1015 Assignment 2

Welcome to the repository for MSB1015 Assignment 2! Here I track my progress on Assignment 2 of the 2019 MSB1015 course at Maastricht University. The result of the assignment is published as an HTML notebook (index.html in this repository) via GitHub Pages.

Project Description

Chemical properties, such as the boiling point, can be predicted from the structure of a chemical compound. As early as 1947, Harry Wiener built a correlation model that links structural features of paraffins to their boiling points (ref1). The idea of using mathematical models to predict chemical properties from compound structures has been expanded since then.
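To make "structural features" concrete: one descriptor from this line of work is what is now called the Wiener index, the sum of the shortest-path distances (counted in bonds) between all pairs of carbon atoms. The sketch below is a minimal Python illustration and is not part of this project's R code; the adjacency lists are written by hand and the function name `wiener_index` is an assumption made for this example.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path lengths (in bonds) over all pairs of carbon atoms."""
    total = 0
    for source in adjacency:
        # Breadth-first search gives bond-count distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            atom = queue.popleft()
            for neighbour in adjacency[atom]:
                if neighbour not in dist:
                    dist[neighbour] = dist[atom] + 1
                    queue.append(neighbour)
        total += sum(dist.values())
    # Each pair was counted twice, once from each end.
    return total // 2


# Hydrogen-suppressed carbon skeletons, written by hand for illustration.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # SMILES: CCCC
isobutane = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}  # SMILES: CC(C)C

print(wiener_index(n_butane))   # 10
print(wiener_index(isobutane))  # 9
```

The branched isomer gets the smaller index, which is in line with the general observation that branching lowers the boiling point of an alkane.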

In this project I use a SPARQL query to obtain the SMILES and boiling points of simple alkanes from Wikidata (ref2). I use the SMILES to compute descriptors with the Chemistry Development Kit (CDK) (ref3-6). These descriptors contain information on the structural properties of the alkanes (see section 2 of the notebook for more details). Finally, I train a Partial Least Squares (PLS) model to predict the boiling points from these structural descriptors and plot the results.
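The data-retrieval step can be pictured with the following Python sketch. It is not the query used in the notebook, which is written in R; it is an assumed reconstruction of what such a query could look like. The Wikidata identifiers (Q41581 for alkane, P233 for canonical SMILES, P2102 for boiling point), the helper names and the User-Agent string are all assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed identifiers: Q41581 = alkane, P233 = canonical SMILES,
# P2102 = boiling point. The unit is fetched too, because boiling points
# on Wikidata are not guaranteed to share one unit.
ALKANE_QUERY = """
SELECT DISTINCT ?comp ?compLabel ?smiles ?bp ?bpUnitLabel WHERE {
  ?comp wdt:P31/wdt:P279* wd:Q41581 ;
        wdt:P233 ?smiles ;
        p:P2102 [ ps:P2102 ?bp ; psv:P2102/wikibase:quantityUnit ?bpUnit ] .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def build_request_url(query, endpoint=WIKIDATA_ENDPOINT):
    """Encode the SPARQL query as a GET URL that asks for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return endpoint + "?" + params


def fetch_rows(query):
    """Run the query (needs network access) and return one dict per result row."""
    request = urllib.request.Request(
        build_request_url(query),
        headers={"User-Agent": "msb1015-readme-example/0.1"},  # hypothetical agent name
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in payload["results"]["bindings"]
    ]
```

Calling `fetch_rows(ALKANE_QUERY)` would return rows with the keys `comp`, `compLabel`, `smiles`, `bp` and `bpUnitLabel`, provided the assumed identifiers are correct. The SMILES strings are what the descriptor calculation takes as input.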

Files

MSB1015Assignment2SuzannetenHage.rmd <- R Markdown code containing the models that predict the boiling point of alkanes from their structural properties. The code is written to explain the creation of the model and its results step by step.
index.html <- HTML notebook rendered from MSB1015Assignment2SuzannetenHage.rmd, used to publish the website with GitHub Pages.
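
The modelling idea in the R Markdown file can be pictured with a small sketch. The notebook itself relies on the R pls package; the Python code below is only a hand-written, single-component PLS regression with no scaling or cross-validation, and it is not the author's implementation. The function names and the toy numbers are made up for illustration and are not real descriptors or boiling points.

```python
from math import sqrt


def fit_pls1(X, y):
    """Fit a one-component PLS regression; returns (x_means, y_mean, weights, q)."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [value - y_mean for value in y]

    # Weight vector: the direction in descriptor space that has
    # maximal covariance with the (centred) response.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]

    # Scores: projection of each compound onto that direction.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress the centred response on the scores.
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    return x_means, y_mean, w, q


def predict_pls1(model, x):
    """Predict the response for one descriptor vector."""
    x_means, y_mean, w, q = model
    score = sum((x[j] - x_means[j]) * w[j] for j in range(len(x)))
    return y_mean + q * score


# Synthetic toy data: two perfectly collinear "descriptors" and a
# response that is linear in them.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
y = [10.0, 20.0, 30.0, 40.0]

model = fit_pls1(X, y)
print(round(predict_pls1(model, [5.0, 10.0]), 6))  # 50.0
```

The toy descriptors are perfectly collinear on purpose: ordinary least squares has no unique solution in that case, while PLS projects the descriptors onto a single latent direction first. This is one reason PLS is a common choice when many correlated structural descriptors are available.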

Installation

Java
The rJava package requires Java to be installed. The code was developed with Java version 1.8.0_191. A tutorial on how to install this Java version on Windows can be found here.

Required R packages:
The project requires several packages. The code checks automatically for missing packages and installs them. The required packages are:
* WikidataQuery
* rJava
* rcdk
* stringi
* caTools
* pls
* Metrics

Authors

Suzanne ten Hage

Additional Information

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

References

  1. Wiener H. Structural Determination of Paraffin Boiling Points. Journal of the American Chemical Society. 1947 Jan;69(1):17–20.
  2. https://www.wikidata.org/wiki/Wikidata:Main_Page (accessed 12-10-2019)
  3. Willighagen et al. The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching. J. Cheminform. 2017; 9(3), doi:10.1186/s13321-017-0220-4
  4. May and Steinbeck. Efficient ring perception for the Chemistry Development Kit. J. Cheminform. 2014, doi:10.1186/1758-2946-6-3
  5. Steinbeck et al. Recent Developments of the Chemistry Development Kit (CDK) - An Open-Source Java Library for Chemo- and Bioinformatics. Curr. Pharm. Des. 2006; 12(17):2111-2120, doi:10.2174/138161206777585274
  6. Steinbeck et al. The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics. J. Chem. Inf. Comput. Sci. 2003 Mar-Apr; 43(2):493-500, doi:10.1021/ci025584y

Owner

  • Login: setenhage
  • Kind: user
  • Location: Utrecht

Citation (CITATION.cff)

cff-version: 1
message: If you use this software, please cite it as below.
authors:
  - family-names: ten Hage
    given-names: Suzanne Eva
title: MSB1015_Assignment_2
version: 1
date-released: 2019-09-25
