genomics-dsa4262

NUS DSA4262 Genomics Project

https://github.com/sanrajmitra97/genomics-dsa4262

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓
CITATION.cff file
Found CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
✓
.zenodo.json file
Found .zenodo.json file
○
DOI references
○
Academic publication links
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (13.8%) to scientific vocabulary

Last synced: 11 months ago · JSON representation ·

Repository

NUS DSA4262 Genomics Project

Basic Info

Host: GitHub
Owner: sanrajmitra97
License: mit
Language: Jupyter Notebook
Default Branch: main
Size: 2.56 MB

Statistics

Stars: 0
Watchers: 1
Forks: 1
Open Issues: 0
Releases: 0

Created over 3 years ago · Last pushed over 3 years ago

Metadata Files

Readme Citation

DSA4262-proj-genericteam

NUS DSA4262 Genomics Project

Purpose of software

An example of post-transcriptional modification is m6A modification, where a methyl group is added to an adenosine molecule which results in a change of chemical structure. Such modification plays an important rule in cellular fate governance and cellular processes; and it can also be linked to life-threatening diseases like cancer. The nanopore direct RNA sequencing technology captures information such as the direct RNA current for each RNA module, the standard deviation and the dwelling time of the molecule as it passes through a nanopore. This script includes the model training process to develop a supervised machine learning model to predict the m6A modification sites. A sample train and test dataset is provided to run the model and evaluate its performance.

Installation and Execution

Step 1: Launch AWS and navigate to your home directory using the command: $ cd ~
Step 2: Create a virtual envionment by running the following bash commands:
- Install pip $ sudo apt-get install python3-pip
- Intall virtualenv $ sudo pip3 install virtualenv
- Check that you have downloaded virtualenv using the command: $ virtualenv --version
- Replace 'venv name' with a name for your virtual environment.$ virtualenv 'venv name'
- Activate your virtual envionment $ source 'venv name'/bin/activate
Step 3: git clone this repository using the command: $ git clone https://github.com/sanrajmitra97/genomics-DSA4262.git
- If for any reason the above command returns a "fatal error" when cloning, try this command instead: $ git clone --depth 1 https://github.com/sanrajmitra97/genomics-DSA4262.git. Some machines may face the error when cloning, but not all.
Step 4: Locate the requirements.txt file and run the following command: $ pip install -r /path/to/requirements.txt
Step 5: Once done, cd to the src folder: $ cd src
To run the test script, proceed to step 7. If you would like to run the training script, proceed to step 6.
Step 6: Run the following command: $ python train_model.py
- You will be prompted to add a name for the model. Feel free to add any name for the model. For example, "model1".
- The results of the model's training performance will also be printed.
- This model is also saved in a "models" folder in the parent directory of src.
Step 7: Run the following command while still in the src folder: $ python make_pred.py
- If you ran step 6, the model trained in step 6 will be used for making predictions on a small test dataset.
- If you skipped to step 7, the model used to make predictions is already located in the models folder.
Step 8: The predictions of the test dataset will be located in a folder called 'Results' in the parent directory of src.

### Interpretation To view the results, we can use the head command to view the prediction results. Navigate to the results folder and type the following: $ head results_0.csv Output
The first column shows the transcriptid, the second column shows the transcriptposition and the third column shows the predicted score of m6A modification. A higher score denotes a higher probability of the site containing an m6A modification.

### Results To learn about our methods and findings of this project, feel free to view our report under the reports folder.

Owner

Name: Sanraj
Login: sanrajmitra97
Kind: user
Location: Singapore

Repositories: 2
Profile: https://github.com/sanrajmitra97

Data Science and Analytics at the National University of Singapore(NUS)

Citation (citation.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Mitra"
  given-names: "Sanraj"
- family-names: "Ho"
  given-names: "Carine"
- family-names: "Cho"
  given-names: "Andrea"
- family-names: "Sriram"
  given-names: "Shreya"
- family-names: "Suhaina"
  given-names: "Noorus"
title: "DSA4262 proj-genericteam"
version: 1.0.0
date-released: 2022-11-06
url: "https://github.com/sanrajmitra97/genomics-DSA4262"

GitHub Events

Total

Last Year

Dependencies

requirements.txt pypi

pandas ==1.4.3
rich ==12.5.1
scikit-learn ==1.0.2
tqdm ==4.64.0
xgboost ==1.6.2

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science

genomics-dsa4262

Science Score: 44.0%

Repository

Basic Info

Statistics

Metadata Files

README.md

DSA4262-proj-genericteam

Purpose of software

Installation and Execution

Owner

Citation (citation.cff)

GitHub Events

Total

Last Year

Dependencies