machine_learning_scent

Machine_learning for dissertation

https://github.com/nickvusko/machine_learning_scent

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.2%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: nickvusko
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 60.5 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 3 years ago · Last pushed almost 3 years ago

README.md

Machine_learning_scent

To run the analysis, run main.py.

Input format

The script takes tab-separated .txt files. The first column should contain the sample names (one sample per row). The second column should be named 'Class' and contain the classification tags. The remaining columns should hold the variables.
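
The layout described above can be illustrated with a small sketch. The sample names, class tags, and variable names below are made up for illustration; only the tab separator, the 'Class' column, and the read_csv arguments come from this README.

```python
import io

import pandas as pd

# Hypothetical input file content: sample names, 'Class' tags, then variables.
raw = (
    "Sample\tClass\tvar1\tvar2\tvar3\n"
    "s1\tvol\t0.12\t1.50\t3.1\n"
    "s2\tvol\t0.10\t1.42\t2.9\n"
    "s3\tvol2\t0.55\t0.80\t4.0\n"
    "s4\tvol2\t0.61\t0.75\t4.2\n"
)

# Same arguments as the read_csv call quoted in the main.py section;
# io.StringIO stands in for the path to a real .txt file.
df = pd.read_csv(io.StringIO(raw), sep="\t", header=0, index_col=0)

Y = df["Class"]               # classification tags
X = df.drop(columns="Class")  # remaining columns are the variables

print(X.shape)
print(Y.tolist())
```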

main.py

Three functions have been implemented so far: Nearest Neighbor (NN) in K- and Radius variations, Random Forest (RF), and Principal Component Analysis (PCA). By default, all three are active. To switch a component off, edit main.py:
  • change NN = True to NN = False to switch off the NN algorithm
  • change RN = True to RN = False to switch off the RN algorithm
  • change PCA = True to PCA = False to switch off the PCA algorithm

To select an input for analysis, enter the name of the .txt file on line 40: df = pd.read_csv("NAMEOFTXT", sep="\t", header=0, index_col=0)

For NN and RF, the script first trains a model and then applies it to the data. A confusion matrix is displayed as the outcome of the classification. For PCA, the script prints the attributes of the model and shows a score plot of the first two principal components.

The default train/test split is 70% training data and 30% test data: X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=42)
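
As a minimal sketch of what the 70/30 split does, the snippet below applies the same train_test_split call to synthetic data. The data and class tags are placeholders, not the scent data the repository was written for.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 20 samples, 4 variables, two made-up class tags.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
Y = np.array(["vol", "vol2"] * 10)

# Same arguments as the default split: 30% of the rows are held out for
# testing, and random_state fixes the shuffle so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, random_state=42
)

# With 20 samples this gives 14 training rows and 6 test rows.
print(X_train.shape, X_test.shape)
```

Changing test_size adjusts the proportion, and changing random_state yields a different but still reproducible partition.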

plot_matrix(y, y_pred)

To display the confusion matrix properly, fill in the names of the labels in the index and columns arguments on line 14 (in most cases they should be identical). Example: df_cm = pd.DataFrame(confusion_matrix(y, y_pred), index=["vol", "vol2", "vol3"], columns=["vol", "vol2", "vol3"]), or df_cm = pd.DataFrame(confusion_matrix(y, y_pred), index=["vol", "vol2", "vol3"], columns=index)
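
The labelling step can be sketched as follows, using made-up true and predicted tags. Keeping the label list in one variable (here called labels, a name chosen for this sketch) and passing it to confusion_matrix as well ensures that the row and column order of the matrix matches the names written on the DataFrame.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted class tags.
y = ["vol", "vol", "vol2", "vol2", "vol3", "vol3"]
y_pred = ["vol", "vol2", "vol2", "vol2", "vol3", "vol"]

# One list of labels reused for the matrix order, the index, and the columns.
labels = ["vol", "vol2", "vol3"]

# Rows are true classes, columns are predicted classes.
df_cm = pd.DataFrame(
    confusion_matrix(y, y_pred, labels=labels),
    index=labels,
    columns=labels,
)
print(df_cm)
```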

show_matrix_plot(x, y)

This is a helper function for quick exploratory analysis. Note that increasing the number of variables increases both the size of the matrix plot and the computational cost. To skip this function, comment out line 48 by adding # at the beginning of the line.
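
To illustrate why the plot grows with the number of variables, here is a sketch that uses pandas' scatter_matrix as a stand-in. This README does not say which plotting call the helper actually uses, so treat the choice of function as an assumption; the class tags in y are also ignored here.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Synthetic stand-in data with three made-up variables.
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.normal(size=(15, 3)), columns=["var1", "var2", "var3"])

# One panel is drawn per pair of variables, so n variables give an
# n x n grid of axes; the panel count grows quadratically with n.
axes = scatter_matrix(x, figsize=(6, 6), diagonal="hist")
print(axes.shape)
```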

nearest_neighbors_scent.py

The script contains two classes for each NN variation: GridSearch, which finds the optimal model parameters, and Classify, which applies the trained model to the classification analysis. The input data are normalized.
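
The search-then-classify pattern can be sketched with scikit-learn directly. This is not the repository's GridSearch or Classify class: the parameter grid, the synthetic data, and the use of StandardScaler for normalization are all assumptions made for this example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data.
X, Y = make_classification(
    n_samples=120, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, random_state=42
)

# Scaling inside the pipeline means it is re-fitted on each CV training
# fold, so no information from held-out rows leaks into the scaler.
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())

# Hypothetical grid; the values searched by the repository are not stated.
param_grid = {
    "kneighborsclassifier__n_neighbors": [1, 3, 5, 7],
    "kneighborsclassifier__weights": ["uniform", "distance"],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

# The refitted best model is then applied to the held-out data.
y_pred = search.predict(X_test)
print(search.best_params_)
```

For the Radius variation, RadiusNeighborsClassifier could be swapped in with a grid over its radius parameter.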

random_forest_scent.py

The script contains two classes for RF: RFGridSearch, which finds the optimal model parameters, and RFClassify, which applies the trained model to the classification analysis. The input data are normalized.
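
A comparable sketch for Random Forest is shown below. It does not reproduce RFGridSearch or RFClassify: the grid values and data are placeholders, and the normalization step that the script applies is skipped here for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data with two classes.
X, Y = make_classification(
    n_samples=90, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, random_state=42
)

# Hypothetical grid, kept small so the search finishes quickly.
param_grid = {"n_estimators": [25, 50], "max_depth": [None, 3]}

search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=3)
search.fit(X_train, y_train)

# Classify the held-out samples with the best model found by the search.
y_pred = search.best_estimator_.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(search.best_params_)
print(cm)
```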

pca_scent.py

The script performs PCA. To edit the legend, fill in the class tags on line 36: ax.legend(handles, ["add", "class", "tags", "here"], title="LEGEND_TITLE") To edit the figure title, edit line 39 (or lines 40 and 41 to change the axis labels).
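
A score plot with an editable legend might look like the sketch below. The data, class tags, title, and the use of StandardScaler are assumptions; only the ax.legend call pattern is taken from this README.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 30 samples, 5 variables, three classes coded 0-2.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
codes = np.repeat([0, 1, 2], 10)

# Project the scaled data onto the first two principal components.
pca = PCA(n_components=2)
scores = pca.fit_transform(StandardScaler().fit_transform(X))
print(pca.explained_variance_ratio_)  # one of the fitted model attributes

fig, ax = plt.subplots()
scatter = ax.scatter(scores[:, 0], scores[:, 1], c=codes)

# legend_elements() returns one handle per distinct class code; the list of
# names passed to ax.legend must follow the same order as the codes.
handles, _ = scatter.legend_elements()
ax.legend(handles, ["vol", "vol2", "vol3"], title="LEGEND_TITLE")
ax.set_title("PCA score plot")  # placeholder title
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
```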

Owner

  • Name: Niky_LA
  • Login: nickvusko
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Ladislavová"
  given-names: "Nikola"
  orcid: "https://orcid.org/0000-0001-8733-4780"
title: "ML model generator for human scent data"
version: 1.0
doi: 10.1371/journal.pone.0283259
date-released: 2023-03-22
url: "https://github.com/nickvusko/Machine_learning_scent"

Dependencies

requirements.txt pypi
  • StatsModels ==0.13.5
  • joblib ==1.2.0
  • matplotlib ==3.6.2
  • pandas ==1.3.4
  • scikit-learn ==1.1.3
  • seaborn ==0.12.1