https://github.com/acg-team/mutation-tme-crc

Machine learning analysis of genetic mutations (STRs, SNPs, indels) and tumor microenvironment (TME) features in colorectal cancer using TCGA data.

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (found codemeta.json file)
○ .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (10.7%) to scientific vocabulary

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: acg-team
Default Branch: main
Size: 1.82 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 1 year ago · Last pushed over 1 year ago

Metadata Files

Readme

Machine learning of genotype-phenotype associations in colorectal cancer tumors from mutation

Introduction

This project explores how genetic mutations relate to the tumor microenvironment (TME) in colorectal cancer. We focus on three types of mutations:

- Short tandem repeats (STRs): repeated sequences in DNA.
- Single nucleotide polymorphisms (SNPs): single base-pair changes in DNA.
- Insertions and deletions (indels): small additions or removals of DNA bases.

By studying these mutation types, we aim to understand their connection to mucin production and immune cell presence in tumors.

To apply machine learning, we need structured ways to represent mutations. Here are two approaches:

1. Mutation counting: for each sample, count the number of mutations of each type (for example, number of SNPs and number of indels) and use these counts as features.
2. Dimensionality reduction: since mutation data is high-dimensional, apply methods such as principal component analysis (PCA) to reduce the number of features. This will be done separately for each mutation type before combining them in the ML model.
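The two representations above can be sketched as follows. This is a minimal illustration, not the project's actual code: the `calls` list, the function names `count_features` and `reduce_per_type`, and the random per-type matrices are all hypothetical stand-ins for mutation calls that would really come from TCGA.

```python
from collections import Counter

import numpy as np
from sklearn.decomposition import PCA

# Hypothetical toy input: one (sample_id, mutation_type) pair per called mutation.
calls = [
    ("S1", "SNP"), ("S1", "SNP"), ("S1", "indel"),
    ("S2", "SNP"), ("S2", "STR"), ("S2", "STR"),
    ("S3", "indel"),
]
MUTATION_TYPES = ["STR", "SNP", "indel"]


def count_features(calls, mutation_types=MUTATION_TYPES):
    """Return sorted sample IDs and a (samples x types) matrix of mutation counts."""
    counts = Counter(calls)
    samples = sorted({sample for sample, _ in calls})
    matrix = np.array(
        [[counts[(s, t)] for t in mutation_types] for s in samples], dtype=float
    )
    return samples, matrix


def reduce_per_type(matrices, n_components=2, seed=0):
    """Run PCA separately on each mutation type's matrix, then concatenate."""
    reduced = []
    for name in sorted(matrices):
        X = matrices[name]
        # PCA cannot return more components than samples or features.
        k = min(n_components, X.shape[0], X.shape[1])
        reduced.append(PCA(n_components=k, random_state=seed).fit_transform(X))
    return np.hstack(reduced)


# Approach 1: one count per mutation type and sample.
samples, count_matrix = count_features(calls)

# Approach 2: PCA per mutation type on (samples x loci) presence/absence matrices.
# The matrices here are random placeholders, not real data.
rng = np.random.default_rng(0)
per_type = {
    "SNP": rng.integers(0, 2, size=(20, 10)).astype(float),
    "indel": rng.integers(0, 2, size=(20, 6)).astype(float),
}
combined = reduce_per_type(per_type)
print(count_matrix)
print(combined.shape)
```

Running PCA per mutation type keeps one type with many loci from dominating the components of the others. The number of components (2 here) is an arbitrary choice for the sketch.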

Steps

1. Literature overview. Review existing studies on genetic mutations and their effect on the tumor microenvironment, focusing on mucin production and immune cell composition in colorectal cancer.
2. Data preparation. Extract mutation data from TCGA and preprocess it for machine learning. Determine the best way to represent different mutation types for predictive modeling.
3. Machine learning. Train models to predict mucin levels and immune cell presence based on mutation data. The goal is to find which mutation types are most important for understanding the tumor microenvironment.

3.1 Mutation representation

- Use mutation counting and dimensionality reduction as described above.
- Normalize and preprocess features.

3.2 Model selection

Since the dataset has 300-500 samples, we will use models that work well with small datasets:

- Random forest
- Support vector machines (SVM), also useful for small datasets
- Gradient boosting

3.3 Cross-validation and model evaluation

- Use k-fold cross-validation (e.g., 5-fold) for reliable evaluation.
- Apply metrics such as ROC-AUC, accuracy, and F1-score for classification, and RMSE or R² for regression.
- Analyze feature importance to interpret model predictions.
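A minimal sketch of how the three model families could be compared with 5-fold cross-validation, assuming scikit-learn and a binary target (for example, high versus low mucin). The feature matrix and labels are synthetic placeholders, and the hyperparameters are library defaults, not tuned values from the project.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real feature matrix (300 samples, as in the text).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))
# Hypothetical binary label driven by the first two features plus noise.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Scaling lives inside the pipeline so it is refit on each training fold.
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Stratified folds keep the class balance similar across the 5 splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = ["roc_auc", "accuracy", "f1"]

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    # Average each metric over the 5 held-out folds.
    results[name] = {m: float(np.mean(scores[f"test_{m}"])) for m in scoring}

for name, metrics in results.items():
    print(name, {m: round(v, 3) for m, v in metrics.items()})
```

For a continuous target, the classifiers would be swapped for their regressor counterparts and the scoring list for regression scorers such as `"neg_root_mean_squared_error"` and `"r2"`.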

4. Interpretation. Examine model results to find key genetic factors affecting mucin and immune cell levels. Compare predictions with existing research for validation.
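One way to approach the interpretation step is a model-agnostic importance measure. The sketch below uses permutation importance from scikit-learn, which is an assumption on our part and not a method named in the README. The feature names and the synthetic label (constructed to depend only on `n_SNP`) are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical feature names: three mutation counts and two SNP principal components.
feature_names = ["n_STR", "n_SNP", "n_indel", "snp_pc1", "snp_pc2"]

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
# By construction, the synthetic label depends only on the n_SNP column.
y = (X[:, 1] + rng.normal(scale=0.3, size=300) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Shuffle each feature on held-out data and measure the drop in ROC-AUC.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0, scoring="roc_auc"
)
ranking = sorted(
    zip(feature_names, result.importances_mean),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

Computing importance on held-out data reduces the risk of ranking features the model merely memorized. Correlated features can share or mask importance, so a ranking like this is a starting point for comparison with the literature, not a causal claim.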

Potential analysis

Understanding correlations between gene mutations and mucin expression may help reveal how mutations influence mucin production.
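As a hedged illustration of such a correlation analysis, the sketch below computes a Spearman rank correlation between a per-sample mutation count and the expression of a single mucin gene. Both vectors are simulated; the variable names and the effect size are assumptions chosen only to make the example run.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_samples = 300

# Hypothetical per-sample mutation count and expression value for one mucin gene.
mutation_count = rng.poisson(lam=20, size=n_samples)
mucin_expression = 0.1 * mutation_count + rng.normal(scale=1.0, size=n_samples)

# Spearman works on ranks, so it does not assume a linear relationship
# and is less sensitive to outliers than Pearson correlation.
rho, p_value = spearmanr(mutation_count, mucin_expression)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.2g}")
```

When many genes or mutation types are tested this way, the p-values would need a multiple-testing correction before any of them is interpreted.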

Owner

Name: Applied Computational Genomics Team
Login: acg-team
Kind: organization
Location: Wädenswil, Switzerland

Website: https://www.zhaw.ch/de/lsfm/institute-zentren/ias/forschung/computational-genomics/
Repositories: 29
Profile: https://github.com/acg-team

Computational Genomics tools from Maria Anisimova and collaborators

GitHub Events

Total

Push event: 4

Last Year

Push event: 4
