m6a-detection-project

Detection of m6A from direct RNA-Seq data

https://github.com/louisetxz/m6a-detection-project

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.0%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: louisetxz
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 7.07 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

m6a-detection-project

Overview

This project develops a convolutional neural network (CNN) classifier that identifies m6A modifications in direct RNA-Seq data, focusing on the cell lines in the SG-NEx Project (2021). It aims to address the challenges of building a robust machine-learning classifier for this task. By improving m6A detection, we hope to contribute to a better understanding of m6A and its potential as a target for therapeutic intervention.

Repository Structure

This section provides an overview of the main folders and files in this repository.

```bash
m6a-detection-project/
├── data/                          # Folder containing RNA-Seq data files
│   └── test_data.json             # Test data used for running model tests
├── output/                        # Folder where model prediction files are saved
│   └── model_output_datetime.csv  # Example output file with prediction results
│
├── test.py                        # Script for evaluating the model on test data
├── utils.py                       # Script containing all relevant functions used in the testing script
├── train.ipynb                    # Notebook for training and tuning the model
├── final_task2.ipynb              # Notebook for testing model to predict m6A sites in all SG-NEx direct RNA-Seq data sets
├── cnn_selected.h5                # Final trained model file
│
├── CITATION.cff                   # Citation for this repository
├── LICENSE                        # License for this repository
├── requirements.txt               # List of required packages and dependencies
├── README.md                      # Project overview and setup instructions (this file)
└── .gitignore                     # Specifies files and folders to ignore in version control
```

Quick Start Guide

Ubuntu setup

  1. Start an Ubuntu instance from ResearchGateway. A t3.medium instance is recommended and sufficient.
  2. Access your Ubuntu instance.

Cloning the repository

  1. Clone the repository. To clone our repository using HTTPS, run:
     ```bash
     git clone https://github.com/louisetxz/m6a-detection-project.git
     ```
  2. Change directory into our project:
     ```bash
     cd m6a-detection-project
     ```

Installing dependencies

To update the package list, run:

```bash
sudo apt-get update
```

To install the pip package manager, run:

```bash
sudo apt install python3-pip
```

To install dependencies, run:

```bash
pip install -r requirements.txt
```

Note: These dependencies are required only for the testing script, test.py.

Usage

There are two ways you can generate predictions using our model:

  • To generate predictions on the preloaded test data available in the repository, run:
    ```bash
    python3 test.py
    ```

  • To generate predictions with your own data or your own command-line arguments:

    1. Place your data under data/. The data should follow the format of the direct RNA-Seq data as in the SG-NEx project.
    2. Run:
       ```bash
       python3 test.py --model_path /path/to/model --data_path /path/to/data --n 5 --output_filename your-filename.csv
       ```

The testing script accepts the following command-line arguments:
* --model_path (str): Path to the trained model file. Default is cnn_selected.h5.

* --data_path (str): Path to the test dataset file which must be in json format. Default is /data/test_data.json.

* --n (int): Number of rows of the predictions to print to the console. Default is 10.

* --output_filename (str): File name for the output predictions. This will save the results to the specified file in the /output folder. Default is model_output_`datetime`.csv, where `datetime` is captured from the server where you are running the code. In an Ubuntu instance, it will follow the UTC timezone.
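
As a rough illustration of the arguments listed above, the following is a minimal sketch of how such a parser could be written with `argparse`. It is not the code in test.py: the function names `build_parser` and `default_output_filename`, the help strings, and the timestamp format are assumptions made for this example, and the data path default is assumed to be relative to the repository root.

```python
import argparse
from datetime import datetime, timezone


def default_output_filename(now=None):
    """Build a name like model_output_<datetime>.csv.

    The timestamp format is an assumption; the README only says that
    `datetime` comes from the server clock (UTC on an Ubuntu instance).
    """
    now = now or datetime.now(timezone.utc)
    return f"model_output_{now.strftime('%Y%m%d_%H%M%S')}.csv"


def build_parser():
    """Hypothetical parser mirroring the four documented arguments."""
    parser = argparse.ArgumentParser(description="Predict m6A sites with a trained model.")
    parser.add_argument("--model_path", type=str, default="cnn_selected.h5",
                        help="Path to the trained model file.")
    # The README lists /data/test_data.json; a repo-relative path is assumed here.
    parser.add_argument("--data_path", type=str, default="data/test_data.json",
                        help="Path to the test dataset (JSON).")
    parser.add_argument("--n", type=int, default=10,
                        help="Number of prediction rows to print.")
    parser.add_argument("--output_filename", type=str, default=None,
                        help="Output file name, saved under the output folder.")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--n", "5"])
output_name = args.output_filename or default_output_filename()
print(args.model_path, args.data_path, args.n, output_name)
```

Leaving `--output_filename` unset until parse time lets the timestamp reflect when the script actually runs rather than when the parser was defined.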

Interpretation of outputs

The predictions will be saved as a CSV file (default name: model_output_datetime.csv), which can be found under the /output folder. It contains the predicted modification score at each position of each transcript. To see the predictions, follow these steps:

  1. Navigate to the output directory to locate the file with the prediction results:
     ```bash
     cd output
     ```
  2. List all files in the output directory to identify the latest predictions file:
     ```bash
     ls
     ```
  3. Display the predictions in the terminal by specifying the appropriate filename (e.g., model_output_datetime.csv):
     ```bash
     cat model_output_datetime.csv  # Replace with the correct output file name
     ```

The output file will contain the following three columns:

  • transcript_id: The transcript id of the predicted position
  • transcript_position: The transcript position of the predicted position
  • score: The probability that a given site is modified
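
For readers who prefer to inspect the predictions programmatically, the sketch below reads a file with the three columns described above and keeps the sites whose score passes a cutoff. It assumes the CSV has a header row with exactly those column names. The helper `top_sites`, the 0.5 threshold, and the sample rows are made up for illustration and do not come from the repository.

```python
import csv
import io


def top_sites(csv_file, threshold=0.5):
    """Return rows with score >= threshold, highest score first.

    `csv_file` is any open text file object with a header row containing
    transcript_id, transcript_position and score (assumed layout).
    """
    reader = csv.DictReader(csv_file)
    rows = [
        {
            "transcript_id": r["transcript_id"],
            "transcript_position": int(r["transcript_position"]),
            "score": float(r["score"]),
        }
        for r in reader
    ]
    hits = [r for r in rows if r["score"] >= threshold]
    return sorted(hits, key=lambda r: r["score"], reverse=True)


# Made-up rows standing in for a real output file.
example = io.StringIO(
    "transcript_id,transcript_position,score\n"
    "ENST0001,244,0.91\n"
    "ENST0001,261,0.12\n"
    "ENST0002,1030,0.67\n"
)
for row in top_sites(example):
    print(row)
```

To use it on a real predictions file, pass an open file handle instead, for example `with open("output/your-filename.csv", newline="") as f: top_sites(f)`. The threshold is a free choice and should be tuned to the application.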

Citing

If you use this model in your research, please cite this repository:

```bibtex
@misc{Louise_Tan_Xuan_Zhi_and_Khine_Ezali_and_Lim_Shih_Ler_Sean_and_Sitoh_Ying_Ting_Rachel_m6a-detection-project_2024,
  author = {Louise Tan Xuan Zhi and Khine Ezali and Lim Shih Ler Sean and Sitoh Ying Ting Rachel},
  license = {MIT},
  month = nov,
  title = {{m6a-detection-project}},
  url = {https://github.com/louisetxz/m6a-detection-project},
  version = {1},
  year = {2024}
}
```

Or cite:

Louise Tan Xuan Zhi, Khine Ezali, Lim Shih Ler Sean, & Sitoh Ying Ting Rachel. (2024). m6a-detection-project (Version 1) [CNN Model]. https://github.com/louisetxz/m6a-detection-project

License

m6a-detection-project is licensed under the terms of the MIT license.

Owner

  • Login: louisetxz
  • Kind: user

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.

cff-version: 1.2.0
title: m6a-detection-project
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Louise Tan Xuan Zhi
  - given-names: Khine Ezali
  - given-names: Lim Shih Ler Sean
  - given-names: Sitoh Ying Ting Rachel
repository-code: >-
  https://github.com/louisetxz/m6a-detection-project
abstract: >
  This project aims to address challenges that exist in 
  developing a robust machine-learning classifier by 
  developing a CNN model for identifying m6A modifications 
  in RNA-Seq data, specifically focusing on cell lines found 
  in the SG-NEx Project (2021). By enhancing our ability to 
  detect m6A modifications, we hope to contribute to a deeper
  understanding of identification of m6A and its potential 
  as a target for therapeutic intervention.
license: MIT
version: '1'
date-released: '2024-11-06'

GitHub Events

Total
  • Public event: 1
  • Push event: 22
  • Pull request event: 6
  • Pull request review event: 4
Last Year
  • Public event: 1
  • Push event: 22
  • Pull request event: 6
  • Pull request review event: 4

Dependencies

requirements.txt pypi
  • argparse ==1.4.0
  • imbalanced-learn ==0.10.1
  • keras ==2.13.1
  • matplotlib ==3.6.3
  • numpy ==1.23.5
  • pandas ==1.5.3
  • scikit-learn ==1.1.2
  • scipy ==1.10.1
  • seaborn ==0.12.2
  • tensorflow ==2.13.1
  • xgboost ==1.7.4