machine_learning_scent
Machine_learning for dissertation
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.2%) to scientific vocabulary
Repository
Machine_learning for dissertation
Basic Info
- Host: GitHub
- Owner: nickvusko
- License: mit
- Language: Python
- Default Branch: main
- Size: 60.5 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Machine_learning_scent
To run the analysis, run main.py.
Input format
The script takes tab-separated .txt files as input. Each row represents one sample. The first column should contain the sample names, the second column should be named 'Class' and contain the classification tags, and the remaining columns should contain the variables.
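As an illustration of the layout described above, here is a minimal sketch that builds a tiny tab-separated table in memory and reads it the same way main.py reads its input. The sample names, class tags, variable names, and values are invented for the example, and separating the 'Class' column from the variables is an assumption about how the data are used downstream.

```python
import io

import pandas as pd

# Hypothetical input: first column = sample names, second = 'Class', rest = variables.
raw = (
    "Sample\tClass\tvar1\tvar2\n"
    "s1\tvol\t0.12\t3.4\n"
    "s2\tvol\t0.15\t3.1\n"
    "s3\tvol2\t0.80\t1.2\n"
)

# Same reader arguments as in main.py; io.StringIO stands in for the .txt file.
df = pd.read_csv(io.StringIO(raw), sep="\t", header=0, index_col=0)

Y = df["Class"]               # classification tags
X = df.drop(columns="Class")  # variables only

print(X)
print(Y)
```

With index_col=0 the sample names become the row index, so they are kept out of the variables passed to the models.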
main.py
Three methods have been implemented so far: Nearest Neighbor (NN) in K- and Radius variations, Random Forest (RF), and Principal Component Analysis (PCA). By default, all three are active. To switch a component off, edit main.py:
- change NN = True to NN = False to switch off the NN algorithm
- change RN = True to RN = False to switch off the Random Forest algorithm
- change PCA = True to PCA = False to switch off the PCA algorithm
To select an input for analysis, fill in the name of the .txt file on line 40: df = pd.read_csv("NAME_OF_TXT", sep="\t", header=0, index_col=0)
For NN and RF, the script first trains a model and then applies it to the data. The outcome of the classification is displayed as a confusion matrix. For PCA, the script prints the attributes of the model and shows a score plot of the first two principal components.
By default, 70% of the data is used for training and 30% for testing: X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=42)
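The split can be tried in isolation with the sketch below. The data are random placeholders (20 samples, 3 variables, 2 class tags), not scent data, and only the train_test_split arguments come from the script.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder data: 20 samples, 3 variables, 2 class tags (all invented).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(20, 3)), columns=["var1", "var2", "var3"])
Y = pd.Series(["vol"] * 10 + ["vol2"] * 10, name="Class")

# 70% training data, 30% test data; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, random_state=42
)

print(len(X_train), len(X_test))
```

Because random_state is fixed, repeated runs on the same input give the same split, which keeps the resulting confusion matrices comparable between runs.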
plot_matrix(y, y_pred)
To display the confusion matrix properly, pass the label names to the index and columns arguments on line 14 (in most cases the two should be identical). Example: df_cm = pd.DataFrame(confusion_matrix(y, y_pred), index=["vol", "vol2", "vol3"], columns=["vol", "vol2", "vol3"]), or df_cm = pd.DataFrame(confusion_matrix(y, y_pred), index=["vol", "vol2", "vol3"], columns=index)
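A minimal sketch of the labelled confusion matrix follows. The true and predicted tags are invented, and passing labels= to confusion_matrix is an addition not shown in the original line; it is used here so that the row and column order is guaranteed to match the names given to the DataFrame.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# One list of label names, reused for rows (true class) and columns (predicted class).
labels = ["vol", "vol2", "vol3"]

# Invented true and predicted tags for six samples.
y = ["vol", "vol", "vol2", "vol2", "vol3", "vol3"]
y_pred = ["vol", "vol2", "vol2", "vol2", "vol3", "vol"]

df_cm = pd.DataFrame(
    confusion_matrix(y, y_pred, labels=labels),
    index=labels,
    columns=labels,
)
print(df_cm)
```

Rows are the true classes and columns are the predicted classes, so the diagonal holds the correctly classified samples.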
show_matrix_plot(x, y)
This is a helper function for quick exploratory analysis. Note that the size of the matrix plot and the computational cost both grow with the number of variables. To skip this function, comment out line 48 by adding # at the beginning of the line.
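The growth of the matrix plot with the number of variables can be seen in the sketch below. It does not reproduce the repository's helper, whose implementation is not shown here; it uses pandas' scatter_matrix as a stand-in, on random placeholder data.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Placeholder data: 30 samples, 3 variables (invented).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(30, 3)), columns=["var1", "var2", "var3"])

# One panel per pair of variables: n variables give an n x n grid of axes.
axes = scatter_matrix(X, figsize=(6, 6), diagonal="hist")
print(axes.shape)
```

With 3 variables the grid has 9 panels; with 50 variables it would have 2500, which is why the helper becomes slow on wide tables.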
nearest_neighbors_scent.py
For each NN variation, the script contains two classes: GridSearch, which finds the optimal model parameters, and Classify, which applies the trained model to classification. The input data are normalized.
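The search-then-classify workflow can be sketched with scikit-learn as follows. This is not the repository's GridSearch or Classify class: the synthetic data, the choice of StandardScaler for normalization, and the parameter grid are all assumptions made for the example, and only the K variation is shown.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic three-class data standing in for a scent table.
X, y = make_classification(
    n_samples=120, n_features=6, n_informative=4, n_classes=3, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Normalization and classifier in one pipeline, so scaling is fitted on training folds only.
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())

# Hypothetical parameter grid; the repository's grid may differ.
param_grid = {
    "kneighborsclassifier__n_neighbors": [1, 3, 5, 7],
    "kneighborsclassifier__weights": ["uniform", "distance"],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

# Apply the best model to the held-out data and summarize the result.
y_pred = search.predict(X_test)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(search.best_params_)
print(cm)
```

The Radius variation could be sketched the same way by swapping in RadiusNeighborsClassifier and searching over its radius parameter instead of n_neighbors.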
random_forest_scent.py
The script contains two classes for RF: RFGridSearch, which finds the optimal model parameters, and RFClassify, which applies the trained model to classification. The input data are normalized.
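A comparable sketch for Random Forest is given below. It is not the repository's RFGridSearch or RFClassify class; the synthetic data, the StandardScaler step, and the small parameter grid are assumptions chosen to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic three-class data standing in for a scent table.
X, y = make_classification(
    n_samples=120, n_features=6, n_informative=4, n_classes=3, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

pipe = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=42))

# Hypothetical grid over forest size and tree depth.
param_grid = {
    "randomforestclassifier__n_estimators": [50, 100],
    "randomforestclassifier__max_depth": [None, 5],
}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X_train, y_train)

# The refitted best model classifies the held-out samples.
y_pred = search.predict(X_test)
print(search.best_params_)
print(search.score(X_test, y_test))
```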
pca_scent.py
The script performs PCA. To edit the legend, fill in the class tags on line 36: ax.legend(handles, ["add", "class", "tags", "here"], title="LEGEND_TITLE")
To edit the plot title, edit line 39; to change the axis labels, edit lines 40 and 41.
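The score plot and its legend can be sketched as follows. The iris dataset is a stand-in for scent data, the StandardScaler step and the title and axis label strings are assumptions, and the line numbers mentioned above refer to the repository's script, not to this sketch.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in data: iris measurements with three classes.
data = load_iris()
X = StandardScaler().fit_transform(data.data)

# Project onto the first two principal components.
pca = PCA(n_components=2)
scores = pca.fit_transform(X)
print(pca.explained_variance_ratio_)

# Score plot coloured by class, with one legend entry per class tag.
fig, ax = plt.subplots()
scatter = ax.scatter(scores[:, 0], scores[:, 1], c=data.target)
handles, _ = scatter.legend_elements()
ax.legend(handles, list(data.target_names), title="LEGEND_TITLE")
ax.set_title("PCA score plot")
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
# Call fig.savefig("scores.png") or plt.show() to view the figure.
```

The list passed to ax.legend must have one entry per class, in the same order as the handles, otherwise the tags end up attached to the wrong colours.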
Owner
- Name: Niky_LA
- Login: nickvusko
- Kind: user
- Website: www.linkedin.com/in/nikola-ladislavova-729281223
- Repositories: 2
- Profile: https://github.com/nickvusko
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Ladislavová"
  given-names: "Nikola"
  orcid: "https://orcid.org/0000-0001-8733-4780"
title: "ML model generator for human scent data"
version: 1.0
doi: 10.1371/journal.pone.0283259
date-released: 2023-03-22
url: "https://github.com/nickvusko/Machine_learning_scent"
Dependencies
- StatsModels ==0.13.5
- joblib ==1.2.0
- matplotlib ==3.6.2
- pandas ==1.3.4
- scikit-learn ==1.1.3
- seaborn ==0.12.1