msb1015_assignment3

Repository to keep track of the progress of MSc Systems Biology - MSB1015 2019 Assignment 3

https://github.com/setenhage/msb1015_assignment3

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.9%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: setenhage
  • License: mit
  • Language: HTML
  • Default Branch: master
  • Homepage:
  • Size: 4.72 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created over 6 years ago · Last pushed over 6 years ago
Metadata Files
  • Readme
  • Contributing
  • License
  • Code of conduct
  • Citation

README.md

MSB1015_Assignment3

Welcome to the repository of MSB1015 Assignment 3! Here I track my progress on MSB1015 2019 Assignment 3 at Maastricht University.

Project description

The aim of this project is to show that parallel computing reduces the time needed to calculate logP values compared with sequential computing. We use Nextflow to control the number of central processing units (CPUs) used during the calculation. A more elaborate description of the project and its results can be found on the project's GitHub page (Index.html).
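A common way to express "improves the calculation time" is the speedup, the sequential duration divided by the parallel duration, and the efficiency, the speedup divided by the number of CPUs. The sketch below only illustrates that arithmetic. The function names and the durations are made up for the example and are not measurements from this project.

```python
def speedup(sequential_duration: float, parallel_duration: float) -> float:
    """Return how many times faster the parallel run was than the sequential run."""
    if sequential_duration <= 0 or parallel_duration <= 0:
        raise ValueError("durations must be positive")
    return sequential_duration / parallel_duration


def efficiency(sequential_duration: float, parallel_duration: float, cpus: int) -> float:
    """Return the speedup per CPU; 1.0 would mean perfect scaling."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return speedup(sequential_duration, parallel_duration) / cpus


# Hypothetical durations for illustration only, NOT results from this project.
example_sequential = 120.0  # duration on 1 CPU (made up)
example_parallel = 40.0     # duration on 4 CPUs (made up)

print(speedup(example_sequential, example_parallel))        # 3.0
print(efficiency(example_sequential, example_parallel, 4))  # 0.75
```

An efficiency below 1.0, as in this made-up case, would mean that adding CPUs helps but does not scale perfectly.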

Pseudocode

  1. Use the getSMILES file provided by Egon Willighagen to obtain all compounds and their corresponding canonical and isomeric SMILES.
  2. Create Nextflow code that takes the query results and calculates the logP value of each compound.
  3. Adapt the Nextflow code so that it can be run with different numbers of CPUs, and record the calculation times.
  4. Visualize the results using an R Markdown file.
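The idea behind steps 2 and 3 is that the query results are divided into pieces that can be processed independently. The Python sketch below is not the Nextflow code from this repository and does not reproduce how Nextflow schedules work; it only shows one way to read a tab-delimited SMILES table and split it into near-equal chunks. The column names (`compound`, `smiles`, `isoSmiles`), the compound identifiers, and both function names are assumptions made for this example.

```python
import csv
import io


def read_smiles_rows(tsv_text: str) -> list:
    """Parse tab-delimited text with a header line into a list of dict rows."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return list(reader)


def split_into_chunks(rows: list, n_chunks: int) -> list:
    """Split rows into n_chunks lists whose sizes differ by at most one."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    base, extra = divmod(len(rows), n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional row.
        size = base + (1 if i < extra else 0)
        chunks.append(rows[start:start + size])
        start += size
    return chunks


# Hypothetical five-row table; identifiers and column names are placeholders.
sample = (
    "compound\tsmiles\tisoSmiles\n"
    "compound_1\tCCO\tCCO\n"
    "compound_2\tO\tO\n"
    "compound_3\tC\tC\n"
    "compound_4\tCC(=O)O\tCC(=O)O\n"
    "compound_5\tc1ccccc1\tc1ccccc1\n"
)

rows = read_smiles_rows(sample)
chunks = split_into_chunks(rows, 2)
print([len(chunk) for chunk in chunks])  # [3, 2]
```

Each chunk could then be handed to a separate worker, which is the part that Nextflow handles in the actual project.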

Files

CPU_time_logP.nf <- contains the code to calculate the logP values for all compounds. This code also times the calculation and stores the results in a file.
CPU_duration.tsv <- tab-delimited text file that stores each number of CPUs with the corresponding calculation time.
getSMILES <- contains the query used to obtain the compounds.
all_canonical_isomeric_smiles.tsv <- results of the query (downloaded from query.wikidata.org, using getSMILES).
short.tsv <- file with only 5 SMILES, used to test the code before running it on all 158,800 SMILES.
Example_files <- folder with example files provided by Egon Willighagen.
Index.rmd <- R Markdown file that contains the code to create Index.html.
Index.html <- GitHub page containing the introduction, methods, results and discussion of the project.
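Because CPU_duration.tsv pairs a number of CPUs with a calculation time, repeated runs can be summarized per CPU count. The sketch below is written in Python rather than R and is not the code in Index.rmd. It assumes two columns per line, the CPU count followed by the duration, which is a guess at the file layout; the function name and the sample values are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_duration_per_cpu(tsv_text: str) -> dict:
    """Average the recorded durations for each CPU count.

    Assumed layout (not confirmed by the repository): column 1 is the
    number of CPUs, column 2 is the duration. Lines that do not parse,
    such as a header, are skipped.
    """
    durations = defaultdict(list)
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 2:
            continue
        try:
            cpus = int(row[0])
            duration = float(row[1])
        except ValueError:
            continue  # header line or malformed entry
        durations[cpus].append(duration)
    return {cpus: mean(values) for cpus, values in sorted(durations.items())}


# Made-up sample, NOT the contents of CPU_duration.tsv.
sample = "1\t100.0\n2\t60.0\n1\t110.0\n"
print(mean_duration_per_cpu(sample))  # {1: 105.0, 2: 60.0}
```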

Requirements

The following software was used on Windows 10:
RStudio
The following software was used in a virtual Debian Linux environment on Windows 10:
Nextflow
Groovy
Java

How to do this experiment yourself

  1. Download all files and remove the CPU_duration.tsv file. If you don't remove it, your own calculation times will be appended to the ones that are already stored. Make sure all other files are saved in the same folder.
  2. Open the CPU_time_logP.nf file in the Linux environment and adapt it to match the number of CPUs your computer has (the file indicates where to do this).
  3. Run the CPU_time_logP.nf file with Nextflow.
  4. Run the index.rmd file in RStudio.
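Step 2 requires knowing how many CPUs are available. As a small sketch, and assuming Python is installed in the same Linux environment (it is not listed in the requirements above), the standard-library call `os.cpu_count()` reports the number of logical CPUs visible there. The helper `cpu_counts_to_try` is a hypothetical name introduced for this example.

```python
import os


def cpu_counts_to_try(available=None) -> list:
    """Return the CPU counts 1..available that a timing run could cover.

    If `available` is not given, it is taken from os.cpu_count(), falling
    back to 1 when the count cannot be determined.
    """
    if available is None:
        available = os.cpu_count() or 1
    if available < 1:
        raise ValueError("available must be at least 1")
    return list(range(1, available + 1))


# Prints the list for the machine this runs on, so the output varies.
print(cpu_counts_to_try())
```

For example, `cpu_counts_to_try(4)` returns `[1, 2, 3, 4]`.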

Authors

Suzanne ten Hage
Egon Willighagen (getSMILES.rq)

References:

Chemistry Development Kit
Wikidata

Licenses

Software created by others is used in this repository. Please respect the licenses of these other creators. Information on the licenses of these external resources can be found here.

Owner

  • Login: setenhage
  • Kind: user
  • Location: Utrecht

Citation (CITATION.cff)

cff-version: 1
message: If you use this software, please cite it as below.
authors:
  - family-names: ten Hage
    given-names: Suzanne Eva
title: MSB1015_Assignment_3
version: 1
date-released: 2019-10-09
