machine_learning_scent

Machine_learning for dissertation

https://github.com/nickvusko/machine_learning_scent

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (9.2%) to scientific vocabulary

Last synced: 11 months ago

Repository

Basic Info

Host: GitHub
Owner: nickvusko
License: MIT
Language: Python
Default Branch: main
Size: 60.5 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 3 years ago · Last pushed over 3 years ago

Machine_learning_scent

To run the analysis, run main.py.

Input format

The script reads .txt files that use a tab as the separator. The first column should contain the sample names (one row per sample). The second column should be named 'Class' and contain the classification tags. The remaining columns should hold the variables.
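
As a minimal sketch of this layout, the snippet below builds a small tab-separated table in memory and reads it with the same read_csv arguments that main.py uses. The sample names, class tags, variable names, and values are hypothetical, and io.StringIO only stands in for a real file path.

```python
import io

import pandas as pd

# Hypothetical three-sample table in the layout described above:
# sample names, then 'Class', then one column per variable.
raw = (
    "Sample\tClass\tvar1\tvar2\n"
    "s1\tvol1\t0.12\t3.4\n"
    "s2\tvol2\t0.30\t2.9\n"
    "s3\tvol1\t0.15\t3.1\n"
)

# index_col=0 turns the first column (sample names) into the row index.
df = pd.read_csv(io.StringIO(raw), sep="\t", header=0, index_col=0)

y = df["Class"]               # classification tags
X = df.drop(columns="Class")  # every remaining column is a variable

print(X.shape)
print(sorted(y.unique()))
```

Splitting off the 'Class' column this way is an assumption about how the tags and variables are separated; the source only fixes the file layout.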

main.py

Three analyses have been implemented so far: Nearest Neighbors (NN) in K- and Radius variants, Random Forest (RF), and Principal Component Analysis (PCA). By default, all three are active. To switch a component off, open main.py and:

change NN = True to NN = False to switch off the NN algorithm
change RN = True to RN = False to switch off the RN algorithm
change PCA = True to PCA = False to switch off the PCA algorithm

To select an input for analysis, put the name of the .txt file on line 40: df = pd.read_csv("NAME_OF_TXT", sep="\t", header=0, index_col=0)

For NN and RF, the script first trains a model and then applies it to the data. The classification result is displayed as a confusion matrix. For PCA, the script prints the attributes of the model and shows a score plot of the first two principal components.

The default train/test split is 70% training data and 30% test data: X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=42)
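
The effect of this call can be checked on a toy data set. The sketch below uses 20 hypothetical samples; only the train_test_split arguments are taken from the line above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 samples, 2 variables, 2 class tags.
X = np.arange(40).reshape(20, 2)
Y = np.array(["vol1", "vol2"] * 10)

# Same arguments as the default split described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, random_state=42
)

# A fixed random_state makes the split reproducible between runs.
X_train_again, X_test_again, _, _ = train_test_split(
    X, Y, test_size=0.3, random_state=42
)

print(len(X_train), len(X_test))
print(np.array_equal(X_test, X_test_again))
```

Changing test_size changes the proportions, and changing or removing random_state gives a different shuffle.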

plot_matrix(y, y_pred)

To display the confusion matrix properly, fill in the label names in the index and columns arguments on line 14 (in most cases the two should be identical). For example: df_cm = pd.DataFrame(confusion_matrix(y, y_pred), index=["vol", "vol2", "vol3"], columns=["vol", "vol2", "vol3"]) or, with the names stored in a list first: labels = ["vol", "vol2", "vol3"] followed by df_cm = pd.DataFrame(confusion_matrix(y, y_pred), index=labels, columns=labels)
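
A small worked example may make the labelling clearer. The true and predicted tags below are hypothetical. Unlike the line quoted above, this sketch also passes labels= to confusion_matrix, because scikit-learn otherwise orders rows and columns by the sorted label values, which may not match the order of the names given to the DataFrame.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

labels = ["vol", "vol2", "vol3"]

# Hypothetical true and predicted class tags for six samples.
y = ["vol", "vol", "vol2", "vol2", "vol3", "vol3"]
y_pred = ["vol", "vol2", "vol2", "vol2", "vol3", "vol"]

# Rows are true classes, columns are predicted classes.
df_cm = pd.DataFrame(
    confusion_matrix(y, y_pred, labels=labels),
    index=labels,
    columns=labels,
)

print(df_cm)
```

Reading one row: of the two true "vol" samples, one was predicted as "vol" and one as "vol2".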

show_matrix_plot(x, y)

This is a helper function for quick exploratory analysis. Note that a larger number of variables increases both the size of the matrix plot and the computation time. To skip this function, comment out line 48 by adding # at the beginning of the line.
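
The source does not say which plotting call the helper uses. As an illustration of why the plot grows with the number of variables, the sketch below uses pandas.plotting.scatter_matrix (an assumption, not necessarily the call in main.py) on hypothetical random data: k variables produce a k by k grid of panels.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is needed

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical data: 30 samples, 3 variables.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(30, 3)), columns=["var1", "var2", "var3"])

# One panel per pair of variables, histograms on the diagonal.
axes = scatter_matrix(X, figsize=(6, 6))

print(axes.shape)  # grid of axes, one row and one column per variable
```

Because the number of panels grows with the square of the number of variables, restricting the plot to a subset of columns is one way to keep it readable.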

nearest_neighbors_scent.py

The script contains two classes for each NN variant: GridSearch, which finds the optimal model parameters, and Classify, which applies the trained model to classify the data. The input data are normalized.
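
The general pattern (normalize, search for parameters, then classify) can be sketched with scikit-learn directly. This is not the repository's GridSearch or Classify class. The parameter grid, the use of StandardScaler for normalization, and the bundled iris data are all assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data set bundled with scikit-learn.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Scaling sits inside the pipeline so it is fitted on training folds only.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Hypothetical grid; the values searched in the repository are not stated.
param_grid = {
    "knn__n_neighbors": [1, 3, 5, 7],
    "knn__weights": ["uniform", "distance"],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)  # refitted best model on held-out data

print(best_params)
print(round(test_accuracy, 3))
```

For the Radius variant, scikit-learn provides RadiusNeighborsClassifier, which would take a radius grid in place of n_neighbors.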

random_forest_scent.py

The script contains two classes for RF: RFGridSearch, which finds the optimal model parameters, and RFClassify, which applies the trained model to classify the data. The input data are normalized.
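
The two-step structure (search first, then classify with the parameters found) might look roughly like the sketch below. It uses plain scikit-learn calls rather than the repository's RFGridSearch and RFClassify classes. The grid values, the StandardScaler step, and the iris data are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in data set bundled with scikit-learn.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Normalize using statistics from the training data only.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Step 1: search a small, hypothetical parameter grid.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 3]}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=3)
search.fit(X_train_s, y_train)

# Step 2: train a model with the best parameters and classify the test data.
model = RandomForestClassifier(random_state=42, **search.best_params_)
model.fit(X_train_s, y_train)
y_pred = model.predict(X_test_s)

cm = confusion_matrix(y_test, y_pred)
print(search.best_params_)
print(cm)
```

The resulting matrix is the same kind of object that the confusion matrix plot in main.py displays.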

pca_scent.py

The script performs PCA. To edit the legend, fill in the class names on line 36: ax.legend(handles, ["add", "class", "tags", "here"], title="LEGEND_TITLE") To edit the figure title, edit line 39; to change the axis labels, edit lines 40 and 41.
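
For orientation, a score plot of the first two principal components with a hand-written legend could be produced as in the sketch below. Only the ax.legend(handles, [...], title=...) pattern comes from the line quoted above. The iris data, the scaling step, the plot title, and the axis labels are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is needed

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in data set bundled with scikit-learn.
data = load_iris()
X = StandardScaler().fit_transform(data.data)

pca = PCA(n_components=2)
scores = pca.fit_transform(X)  # sample coordinates on PC1 and PC2

# One of the model attributes a script might print.
print(pca.explained_variance_ratio_)

fig, ax = plt.subplots()
scatter = ax.scatter(scores[:, 0], scores[:, 1], c=data.target)

# One legend handle per class; the names are supplied by hand, as in the script.
handles, _ = scatter.legend_elements()
ax.legend(handles, list(data.target_names), title="LEGEND_TITLE")

ax.set_title("PCA score plot")  # hypothetical title
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
fig.savefig("pca_scores.png")
```

The list passed to ax.legend must contain one name per class, in the same order as the handles.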

Owner

Name: Niky_LA
Login: nickvusko
Kind: user

Website: www.linkedin.com/in/nikola-ladislavova-729281223
Repositories: 2
Profile: https://github.com/nickvusko

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Ladislavová"
  given-names: "Nikola"
  orcid: "https://orcid.org/0000-0001-8733-4780"
title: "ML model generator for human scent data"
version: 1.0
doi: 10.1371/journal.pone.0283259
date-released: 2023-03-22
url: "https://github.com/nickvusko/Machine_learning_scent"

Dependencies

requirements.txt pypi

StatsModels ==0.13.5
joblib ==1.2.0
matplotlib ==3.6.2
pandas ==1.3.4
scikit-learn ==1.1.3
seaborn ==0.12.1
