genhunt
GenHunt: an open-source MATLAB-based application for Huntington's disease screening
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.7%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
GenHunt: A MATLAB Application for Efficient Huntington's Disease Screening
Introduction
Huntington's disease (HD) is a neurodegenerative disorder marked by dysfunction of the basal ganglia in the human brain. It is caused by an expanded CAG repeat in the HTT gene, which encodes the huntingtin protein. People who have 40 or more consecutive CAG repeats in the HTT gene typically develop HD during their lives, while those who have 35 or fewer repeats do not [1]. Individuals who have 36 to 39 repeats may or may not develop HD [1].
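The repeat-count ranges above can be expressed as a small decision rule. The following is a minimal Python sketch, not the application's MATLAB code; the function name and the category labels are hypothetical, and the boundaries are assumed to be 40 or more, 36 to 39, and 35 or fewer.

```python
def classify_cag_repeats(repeat_count: int) -> str:
    """Map a consecutive CAG repeat count to a screening category.

    The labels below are illustrative; the application may use other wording.
    """
    if repeat_count < 0:
        raise ValueError("repeat count cannot be negative")
    if repeat_count >= 40:
        return "Positive"    # typically develops HD
    if repeat_count >= 36:
        return "Uncertain"   # may or may not develop HD
    return "Negative"        # 35 or fewer repeats


if __name__ == "__main__":
    for count in (19, 37, 42):
        print(count, classify_cag_repeats(count))
```

Keeping the thresholds in one function makes it easy to adjust them if the screening criteria change.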
To aid early detection of the disease through genetic analysis, we provide a MATLAB application (mainscript) that screens for HD and offers three functions:
1. Screen Multiple Sequences & Create Database
2. Append New Profile
3. Visualize Individual Data
We also developed two scripts. The randomCAG script randomly generates large genetic datasets for HD, because no accessible data source was available; it is based on the Reference gene obtained from Ensembl (https://asia.ensembl.org/Homo_sapiens/Transcript/Summary?g=ENSG00000197386;r=4:3074681-3243957;t=ENST00000355072). The timeanalysis_ script measures the time the program takes for large-scale data screening with the Screen Multiple Sequences & Create Database function. By automating analysis of the HTT gene, the program aims to reduce the cost of HD screening services.
mainscript
mainscript provides three functions, described in the sections that follow.
Screen Multiple Sequences & Create Database
To screen a large set of genetic sequences for HD and record the results in a database, choose the Screen Multiple Sequences & Create Database function (Fig a). After you run mainscript in MATLAB and select this function, the program asks you to select a folder of genetic sequences saved as text files (Fig b) and then to name the database. It then analyzes each sequence and saves the results in an Excel database, as shown in Fig c. Note that the genetic sequences used in this project were generated by the randomCAG script described above.
The Excel database has five columns, one for each output of a genetic sequence:
1. ID
2. Diagnosis
3. CAG repeats
4. Cut sequence
5. Raw data
The program can detect non-gene files and save the ID as 'error' and Diagnosis as 'File error'.
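The screening workflow described above can be sketched as follows. This is an illustrative Python outline, not the application's MATLAB code, and it rests on several assumptions: the ID is taken from the file name, a file counts as a non-gene file if it contains anything other than A, C, G, or T, the "Cut sequence" is taken to be the longest CAG run, the diagnosis labels are hypothetical, and the results go to a CSV file rather than an Excel workbook to keep the example free of third-party packages.

```python
import csv
import re
from pathlib import Path

HEADER = ["ID", "Diagnosis", "CAG repeats", "Cut sequence", "Raw data"]


def diagnose(repeats):
    """Hypothetical labels for the three repeat-count ranges."""
    if repeats >= 40:
        return "Positive"
    if repeats >= 36:
        return "Uncertain"
    return "Negative"


def screen_file(path):
    """Return one database row for a single text file."""
    raw = path.read_text(errors="replace").strip()
    sequence = "".join(raw.split()).upper()
    # Assumed validity check: only the four DNA bases are allowed.
    if not sequence or set(sequence) - set("ACGT"):
        return ["error", "File error", "", "", raw]
    runs = re.findall(r"(?:CAG)+", sequence)
    longest = max(runs, key=len, default="")
    repeats = len(longest) // 3
    return [path.stem, diagnose(repeats), repeats, longest, raw]


def screen_folder(folder, database_path):
    """Screen every .txt file in a folder and write the rows to a CSV file."""
    rows = [screen_file(p) for p in sorted(Path(folder).glob("*.txt"))]
    with open(database_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(HEADER)
        writer.writerows(rows)
    return rows
```

Calling `screen_folder("sequences", "database.csv")` (both names are placeholders) would produce one row per text file, with unreadable inputs recorded as 'error' / 'File error' instead of stopping the run.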
In addition, each genetic sequence will be visualized in a comprehensive figure that will be saved in the same folder.
The Reference gene was used as an example for visualization. It was found to have a maximum of 19 consecutive CAG repeats (bar graph, top left), located at the beginning of the HTT gene (sequence heatmap, bottom). Note that the sequence contains around 150 CAG codons in total, but at most 19 of them occur consecutively (codon frequency heatmap, top right).
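The difference between the total number of CAG occurrences and the longest consecutive run can be illustrated with a short Python sketch. It is not the application's MATLAB code; the function names are hypothetical, and for simplicity it matches CAG anywhere in the string without tracking the reading frame, which the application may handle differently.

```python
import re


def longest_cag_run(sequence: str) -> int:
    """Return the largest number of back-to-back CAG triplets."""
    runs = re.findall(r"(?:CAG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)


def total_cag_count(sequence: str) -> int:
    """Count every non-overlapping CAG occurrence, consecutive or not."""
    return sequence.upper().count("CAG")


if __name__ == "__main__":
    # Toy sequence: one run of 3 repeats and one run of 5 repeats.
    toy = "ATG" + "CAG" * 3 + "TTT" + "CAG" * 5 + "GGA"
    print(longest_cag_run(toy), total_cag_count(toy))
```

Only the longest run is relevant to the diagnosis; the total count is useful for the codon frequency view.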
Other sequences will be visualized in the same graphs, but the number of codons and their positions will vary.
NOTE:
1. Genetic sequences must be saved as text files.
2. Because of the large volume of data, figures are not displayed when running the Screen Multiple Sequences & Create Database function.
Append New Profile
Next, you can use the Append New Profile function to append a single new profile to an existing database. After you select the function, the program asks you to enter the new ID. You can then either:
* Paste a sequence
* Import a text file
A new visualization figure is then created, displayed, and saved.
Visualize Individual Data
Finally, the program offers the Visualize Individual Data function to analyze a specific profile within an existing database. After choosing this function, select a database and enter the ID of the desired sequence; the visualization graphs are then displayed.
randomCAG
Because no public genetic database for HD was available, we created the randomCAG script to generate 10,000 sequences for efficiency assessment, using the raw Reference gene as the template (obtained from Ensembl: https://asia.ensembl.org/Homo_sapiens/Transcript/Summary?g=ENSG00000197386;r=4:3074681-3243957;t=ENST00000355072). The position and length of the CAG repeats were chosen randomly within appropriate ranges.
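One way to generate such test sequences is sketched below in Python. This is not the randomCAG script itself: the function name, the repeat range of 10 to 60, and the choice to insert a new CAG tract at a random position in the template are all assumptions made for illustration, since the actual ranges are not stated here.

```python
import random


def random_cag_sequence(template, min_repeats=10, max_repeats=60, rng=None):
    """Insert a CAG tract of random length at a random position.

    Returns the new sequence and the number of repeats inserted.
    The default range is hypothetical.
    """
    rng = rng or random.Random()
    repeats = rng.randint(min_repeats, max_repeats)      # inclusive bounds
    position = rng.randrange(0, len(template) + 1)       # any insertion point
    sequence = template[:position] + "CAG" * repeats + template[position:]
    return sequence, repeats


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducible datasets
    template = "ATGTTTGGA" * 5  # stand-in for the Reference gene
    dataset = [random_cag_sequence(template, rng=rng) for _ in range(3)]
    for sequence, repeats in dataset:
        print(repeats, len(sequence))
```

Passing a seeded `random.Random` instance keeps a generated dataset reproducible across runs. If the template already contains CAG next to the insertion point, the longest run in the result can exceed the inserted length.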
timeanalysis_
To assess the efficiency of large-scale data screening, we created timeanalysis_ to screen folders of 1,000 to 10,000 genetic sequences, in increments of 1,000 files. The script consists only of the code from the Screen Multiple Sequences & Create Database function plus additional code to measure processing time.
The average processing time was 2.2 seconds/file. The modest increase in processing time suggests that the program performs consistently on large volumes without substantial slowdowns.
The runs and measurements were performed on an x64-based PC (Model: SYS-5039A-I) with an Intel Xeon W-2255 CPU @ 3.70GHz.
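The timing procedure can be outlined as a loop over growing batch sizes. The Python sketch below is illustrative only and is not the timeanalysis_ script; `time_batches` and its parameters are hypothetical names, and `screen` stands for whatever screening routine is being measured.

```python
import time


def time_batches(screen, items, step):
    """Time screen(batch) for batches of step, 2*step, ... items.

    Returns a list of (batch size, total seconds, seconds per item).
    """
    results = []
    for size in range(step, len(items) + 1, step):
        batch = items[:size]
        start = time.perf_counter()
        screen(batch)
        elapsed = time.perf_counter() - start
        results.append((size, elapsed, elapsed / size))
    return results


if __name__ == "__main__":
    # Placeholder workload: count CAG occurrences in each sequence.
    sequences = ["ATG" + "CAG" * 20 + "TTT"] * 3000
    for size, total, per_item in time_batches(
        lambda batch: [s.count("CAG") for s in batch], sequences, 1000
    ):
        print(f"{size} sequences: {total:.4f} s total, {per_item:.6f} s each")
```

Comparing the per-item time across batch sizes shows whether throughput stays steady as the dataset grows.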
References
[1] Huntington disease. MedlinePlus Genetics. https://medlineplus.gov/genetics/condition/huntington-disease/, last accessed 2023/07/08.
Owner
- Name: Thy Ta
- Login: trangthyy
- Kind: user
- Repositories: 1
- Profile: https://github.com/trangthyy
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you want to use this application, please cite as below."
authors:
  - family-names: "Ta"
    given-names: "Thy Hoang Trang"
  - family-names: "Dao"
    given-names: "Trang Thu"
title: "A MATLAB application for efficient Huntington's disease screening"
version: 1.0
date-released: 2023-08-18
url: "https://github.com/trangthyy/A-MATLAB-application-for-Efficient-Huntingtons-Disease-Screening"