m6a-detection-project
Detection of m6A from direct RNA-Seq data
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
○DOI references
-
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (13.0%) to scientific vocabulary
Repository
Detection of m6A from direct RNA-Seq data
Basic Info
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
m6a-detection-project
Overview
This project develops a convolutional neural network (CNN) classifier that identifies m6A modifications in direct RNA-Seq data, focusing on the cell lines in the SG-NEx Project (2021). It aims to address the challenges of building a robust machine-learning classifier for this task. By improving m6A detection, we hope to contribute to a deeper understanding of m6A and its potential as a target for therapeutic intervention.
Repository Structure
This section provides an overview of the main folders and files in this repository.
```bash
m6a-detection-project/
├── data/                          # Folder containing RNA-Seq data files
│   └── test_data.json             # Test data used for running model tests
├── output/                        # Folder where model prediction files are saved
│   └── model_output_datetime.csv  # Example output file with prediction results
│
├── test.py                        # Script for evaluating the model on test data
├── utils.py                       # Script containing all relevant functions used in the testing script
├── train.ipynb                    # Notebook for training and tuning the model
├── final_task2.ipynb              # Notebook for testing model to predict m6A sites in all SG-NEx direct RNA-Seq data sets
├── cnn_selected.h5                # Final trained model file
│
├── CITATION.cff                   # Citation for this repository
├── LICENSE                        # License for this repository
├── requirements.txt               # List of required packages and dependencies
├── README.md                      # Project overview and setup instructions (this file)
└── .gitignore                     # Specifies files and folders to ignore in version control
```
Quick Start Guide
Ubuntu setup
- Start an Ubuntu instance from ResearchGateway. A recommended and sufficient Ubuntu instance is: t3.medium.
- Access your Ubuntu instance.
Cloning the repository
- Clone the repository. To clone our repository using HTTPS, run
```bash
git clone https://github.com/louisetxz/m6a-detection-project.git
```
- Change directory into our project.
```bash
cd m6a-detection-project
```
Installing dependencies
To update the package list, run
```bash
sudo apt-get update
```
To install the package manager pip, run
```bash
sudo apt install python3-pip
```
To install dependencies, run
```bash
pip install -r requirements.txt
```
Note: These dependencies are required only for the testing file, test.py.
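After installation, it can be useful to confirm that the installed versions match the pins. The following is a minimal sketch, not part of this repository: the helper names `parse_pins` and `check_pins` are hypothetical, and it assumes that requirements.txt consists of simple `name==version` lines.

```python
from importlib import metadata


def parse_pins(lines):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Return {name: (wanted, installed_or_None)} for every mismatch."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            problems[name] = (wanted, installed)
    return problems


# Illustrative input only; in practice pass the lines of requirements.txt.
example = ["numpy==1.23.5", "pandas==1.5.3", "", "# a comment"]
print(parse_pins(example))
```

To run the check against the real file, something like `check_pins(parse_pins(open("requirements.txt")))` would list each package whose installed version differs from the pinned one.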
Usage
There are two ways you can generate predictions using our model:
1. To generate predictions on the preloaded test data available in the repository, run:
```bash
python3 test.py
```
2. To generate predictions with your own data or your own command-line arguments:
- Place your data under data/. The data should follow the format of the direct RNA-Seq data as in the SG-NEx project.
- Run:
```bash
python3 test.py --model_path /path/to/model --data_path /path/to/data --n 5 --output_filename your-filename.csv
```
The testing script contains the following command-line arguments:
* --model_path (str): Path to the trained model file. Default is cnn_selected.h5.
* --data_path (str): Path to the test dataset file, which must be in JSON format. Default is /data/test_data.json.
* --n (int): Number of rows of the predictions to print to the console. Default is 10.
* --output_filename (str): File name for the output predictions. This will save the results to the specified file in the /output folder. Default is model_output_`datetime`.csv, where `datetime` is captured from the server where you are running the code. In an Ubuntu instance, it will follow the UTC timezone.
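The argument list above could be wired up with `argparse` roughly as follows. This is a minimal sketch, not the actual contents of test.py: the function names `build_parser` and `default_output_filename` and the timestamp format are assumptions for illustration. Only the flag names, types, and defaults come from the list above.

```python
import argparse
from datetime import datetime, timezone


def default_output_filename(now=None):
    """Build a timestamped name such as model_output_20241106_093000.csv.

    The timestamp format is an assumption; the README only states that the
    default name contains a datetime taken from the machine running the code.
    """
    now = now or datetime.now(timezone.utc)
    return f"model_output_{now.strftime('%Y%m%d_%H%M%S')}.csv"


def build_parser():
    parser = argparse.ArgumentParser(description="Predict m6A sites with a trained model.")
    parser.add_argument("--model_path", type=str, default="cnn_selected.h5",
                        help="Path to the trained model file.")
    # The README writes this default as /data/test_data.json; here it is
    # assumed to be relative to the repository root.
    parser.add_argument("--data_path", type=str, default="data/test_data.json",
                        help="Path to the test dataset file (JSON).")
    parser.add_argument("--n", type=int, default=10,
                        help="Number of prediction rows to print.")
    parser.add_argument("--output_filename", type=str, default=None,
                        help="File name for the predictions, saved under output/.")
    return parser


# Parse an explicit argument list so the sketch runs without a shell.
args = build_parser().parse_args(["--n", "5"])
output_filename = args.output_filename or default_output_filename()
print(args.model_path, args.data_path, args.n, output_filename)
```

Leaving `--output_filename` as `None` and filling in the timestamp afterwards means the datetime is computed when the script runs rather than when the parser is defined.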
Interpretation of outputs
The predictions are saved as a CSV file (default name: model_output_datetime.csv) under the /output folder. The file contains the predicted modification result at each individual position of each transcript. To see the predictions, follow these steps:
- Navigate to the output directory to locate the file with the prediction results:
```bash
cd output
```
- List all files in the output directory to identify the latest predictions file:
```bash
ls
```
- Display the predictions in the terminal by specifying the appropriate filename (e.g., model_output_datetime.csv):
```bash
cat model_output_datetime.csv # Replace with the correct output file name
```
The output file will contain the following three columns:
- transcript_id: The transcript id of the predicted position
- transcript_position: The transcript position of the predicted position
- score: The probability that a given site is modified
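As an illustration of working with these three columns, the sketch below keeps the sites whose score passes a cutoff and sorts them from highest to lowest. It is not part of the repository: the helper name `top_sites`, the 0.5 threshold, and the sample rows are made up, and it assumes the file is comma-separated with a header row using the column names listed above.

```python
import csv
import io


def top_sites(csv_file, threshold=0.5):
    """Return (transcript_id, transcript_position, score) tuples with
    score >= threshold, sorted by score in descending order."""
    reader = csv.DictReader(csv_file)
    rows = [
        (row["transcript_id"], int(row["transcript_position"]), float(row["score"]))
        for row in reader
    ]
    hits = [r for r in rows if r[2] >= threshold]
    return sorted(hits, key=lambda r: r[2], reverse=True)


# Made-up rows standing in for a real output file.
sample = io.StringIO(
    "transcript_id,transcript_position,score\n"
    "ENST_EXAMPLE_1,244,0.91\n"
    "ENST_EXAMPLE_1,261,0.12\n"
    "ENST_EXAMPLE_2,1003,0.67\n"
)
for transcript_id, position, score in top_sites(sample):
    print(transcript_id, position, score)
```

With a real predictions file, the same function can be called on an open file handle, for example `with open("output/your-filename.csv", newline="") as f: top_sites(f)`.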
Citing
If you use this model in your research, please cite this repository:
@misc{Louise_Tan_Xuan_Zhi_and_Khine_Ezali_and_Lim_Shih_Ler_Sean_and_Sitoh_Ying_Ting_Rachel_m6a-detection-project_2024,
author = {Louise Tan Xuan Zhi and Khine Ezali and Lim Shih Ler Sean and Sitoh Ying Ting Rachel},
license = {MIT},
month = nov,
title = {{m6a-detection-project}},
url = {https://github.com/louisetxz/m6a-detection-project},
version = {1},
year = {2024}
}
Or cite Louise Tan Xuan Zhi, Khine Ezali, Lim Shih Ler Sean, & Sitoh Ying Ting Rachel. (2024). m6a-detection-project (Version 1) [CNN Model]. https://github.com/louisetxz/m6a-detection-project
License
This project is licensed under the terms of the MIT license.
Owner
- Login: louisetxz
- Kind: user
- Repositories: 1
- Profile: https://github.com/louisetxz
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
cff-version: 1.2.0
title: m6a-detection-project
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Louise Tan Xuan Zhi
  - given-names: Khine Ezali
  - given-names: Lim Shih Ler Sean
  - given-names: Sitoh Ying Ting Rachel
repository-code: >-
  https://github.com/louisetxz/m6a-detection-project
abstract: >
  This project aims to address challenges that exist in
  developing a robust machine-learning classifier by
  developing a CNN model for identifying m6A modifications
  in RNA-Seq data, specifically focusing on cell lines found
  in the SG-NEx Project (2021). By enhancing our ability to
  detect m6A modifications, we hope to contribute to a deeper
  understanding of identification of m6A and its potential as
  a target for therapeutic intervention.
license: MIT
version: '1'
date-released: '2024-11-06'
GitHub Events
Total
- Public event: 1
- Push event: 22
- Pull request event: 6
- Pull request review event: 4
Last Year
- Public event: 1
- Push event: 22
- Pull request event: 6
- Pull request review event: 4
Dependencies
- argparse ==1.4.0
- imbalanced-learn ==0.10.1
- keras ==2.13.1
- matplotlib ==3.6.3
- numpy ==1.23.5
- pandas ==1.5.3
- scikit-learn ==1.1.2
- scipy ==1.10.1
- seaborn ==0.12.2
- tensorflow ==2.13.1
- xgboost ==1.7.4