cluster-analysis-in-biocolour-project
We apply common unsupervised learning methods to discover hidden clusters among bio-dyed textile samples and show the potential of clustering techniques in this application domain.
https://github.com/4dajkong/cluster-analysis-in-biocolour-project
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 1 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: 4daJKong
- Language: Python
- Default Branch: main
- Size: 432 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Cluster analysis in BioColour project
Introduction
Natural compounds such as biological colorants (biocolorants) have long been used for dyeing textiles and are a crucial ingredient in the mass-market textile industry. However, industry-wide standards for the commercialization of biocolorants are still lacking, so it is beneficial to establish a database of compositionally diverse biocolorants. Such a database could also serve as a tool to identify and authenticate biocolorants in textile end products and to ensure their quality and safety, thereby supporting the growth of the biocolorant industry. Managing these databases and analyzing data on bio-dyed products efficiently typically requires experts to organize and refine the data collected from bio-based dye and pigment production. This work requires researchers to understand botanical taxonomy and to have knowledge of biology and chemistry.
As part of the BioColour consortium project, this research uses unsupervised learning for cluster analysis to discover possible clusters of bio-dyed textiles in the absence of ground-truth labels or other expert domain knowledge. We apply several unsupervised approaches: agglomerative clustering, Fuzzy C-means, OPTICS, and a well-known artificial neural network (ANN), the self-organizing map (SOM). The resulting investigation combines data visualization with cluster analysis. In summary, we apply AI techniques to discover hidden clusters among products colored with biocolorants, here specifically bio-dyed textile samples, and show the potential of clustering techniques in this application domain. (2020.1-2021.8)
Thesis Link: https://erepo.uef.fi/handle/123456789/26048
Requirements:
| Software | Version |
| ------------- | ------------- |
| Python | 3.7.0 |
| NumPy | 1.17.2 |
| pandas | 1.0.4 |
| matplotlib | 3.2.1 |
| scikit-learn | 0.21.3 |
| colormath | 3.0.0 |
Citation:
In particular, we use the scikit-learn implementations of
- the evaluation measures:
  - https://scikit-learn.org/stable/modules/generated/sklearn.metrics.davies_bouldin_score.html
  - https://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html
- the agglomerative hierarchical clustering algorithm: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html
- the OPTICS algorithm: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.OPTICS.html
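As a rough illustration of how the scikit-learn pieces listed above fit together, the following sketch clusters synthetic data and scores the result. The data, the number of clusters, `min_samples=10`, and the helper name `evaluate` are assumptions made for this example only; they are not taken from the project's code or data.

```python
import numpy as np
from sklearn.cluster import OPTICS, AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score

# Synthetic stand-in data (assumption): three well-separated blobs in 3-D.
X, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0, 0], [10, 10, 10], [-10, 10, 0]],
    cluster_std=1.0,
    random_state=0,
)

# Agglomerative (Ward) clustering needs the number of clusters up front.
agg_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

# OPTICS needs no cluster count; the label -1 marks points it treats as noise.
optics_labels = OPTICS(min_samples=10).fit_predict(X)


def evaluate(X, labels):
    """Return (silhouette, Davies-Bouldin) on non-noise points, or None."""
    mask = labels != -1
    if len(set(labels[mask])) < 2:
        return None  # both scores need at least two clusters
    return (
        silhouette_score(X[mask], labels[mask]),
        davies_bouldin_score(X[mask], labels[mask]),
    )


agg_scores = evaluate(X, agg_labels)
optics_scores = evaluate(X, optics_labels)
print("agglomerative:", agg_scores)
print("OPTICS:", optics_scores)
```

A higher silhouette score and a lower Davies-Bouldin score both indicate more compact, better separated clusters, so the two measures can be read side by side when comparing methods.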
For the other algorithms, we use separate existing implementations:
- Fuzzy C-means: Madson Luiz Dantas Dias. (2019). fuzzy-c-means: An implementation of Fuzzy C-means clustering algorithm. Zenodo. doi:10.5281/zenodo.3066222. https://github.com/omadson/fuzzy-c-means
- Two-dimensional Self-Organizing Maps: Giuseppe Vettigli. (2018). MiniSom: minimalistic and NumPy-based implementation of the Self Organizing Map. https://github.com/JustGlowing/minisom
- Growing Hierarchical Self-Organizing Map: Civitelli E., Teotini F. (2018). An implementation of Growing Hierarchical SOM algorithm. https://github.com/enry12/growing_hierarchical_som
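To make the idea behind Fuzzy C-means concrete, here is a minimal NumPy sketch of the standard algorithm. It is written from scratch for illustration and is not the API of the fuzzy-c-means package cited above; the function name `fuzzy_c_means`, its parameters, and the toy data are assumptions.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Plain Fuzzy C-means; returns (centers, memberships).

    m > 1 is the fuzzifier: values close to 1 give nearly hard assignments.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Centers are means of the samples weighted by membership^m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every sample to every center (floored to avoid 1/0).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)
        # Membership update: closer centers receive larger membership.
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


# Toy data (assumption): two compact groups in 2-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(5.0, 0.5, (50, 2))])
centers, U = fuzzy_c_means(X, n_clusters=2)
hard_labels = U.argmax(axis=1)  # optional hard assignment from soft memberships
```

Unlike agglomerative clustering or OPTICS, each sample receives a degree of membership in every cluster, which can be useful when samples sit between groups.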
Some results
Distribution of the bio-dyed samples in 2D space after PCA, with the corresponding clustering results from the different unsupervised learning methods:
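The PCA projection used for this kind of 2D view can be sketched with NumPy alone. The function name `pca_2d` and the random feature matrix are assumptions for illustration; the project's actual features and preprocessing are not shown here.

```python
import numpy as np


def pca_2d(X):
    """Project the rows of X onto their first two principal components.

    Returns the 2-D coordinates and the share of variance each axis explains.
    """
    Xc = X - X.mean(axis=0)  # PCA operates on mean-centered data
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    return Xc @ Vt[:2].T, explained[:2]


# Hypothetical feature matrix: 60 samples with 5 features of different scale.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
coords, explained = pca_2d(X)
print(coords.shape, explained)
```

The two columns of `coords` can then be drawn as a scatter plot, with one color per cluster label, to compare the results of different clustering methods on the same axes.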

An example of cluster analysis with a 2D SOM of 140 neurons:
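For readers unfamiliar with SOMs, the sketch below trains a small rectangular map with NumPy. It is a from-scratch illustration, not the MiniSom API, and the 14 x 10 grid is only one assumed way to arrange 140 neurons; the function names, decay schedule, and toy data are likewise assumptions.

```python
import numpy as np


def best_matching_unit(weights, x):
    """Grid index (row, col) of the neuron whose weights are closest to x."""
    d = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(d), d.shape)


def train_som(X, rows, cols, n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular SOM online; returns weights of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # Initialize every neuron from a randomly chosen sample.
    idx = rng.integers(0, len(X), rows * cols)
    weights = X[idx].reshape(rows, cols, -1).astype(float)
    # Grid coordinates of each neuron, used by the neighbourhood function.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
        x = X[rng.integers(0, len(X))]
        bmu = best_matching_unit(weights, x)
        grid_dist2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        # Pull the winner and its grid neighbours toward the sample.
        weights += lr * h[..., None] * (x - weights)
    return weights


# Toy data (assumption): two compact groups in 3-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (100, 3)), rng.normal(5.0, 0.3, (100, 3))])
weights = train_som(X, rows=14, cols=10)
bmus = [best_matching_unit(weights, x) for x in X]
# Mean distance between each sample and its winning neuron.
q_error = np.mean([np.linalg.norm(x - weights[b]) for x, b in zip(X, bmus)])
print("quantization error:", q_error)
```

After training, samples that map to nearby neurons on the grid are similar in feature space, which is what makes the map usable for both visualization and clustering.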

An example of cluster analysis with a GHSOM of 20 neurons:

Owner
- Name: ZY.Li
- Login: 4daJKong
- Kind: user
- Repositories: 1
- Profile: https://github.com/4daJKong
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "LI"
    given-names: "ZONGYUE"
title: "Cluster-analysis-in-BioColour-project"
version: 1.0.0
date-released: 2021-08-15
url: "https://github.com/4daJKong/Cluster-analysis-in-BioColour-project"