parallel-clustering-hybrid

Hybrid Parallel MPI and OpenMP implementations of clustering algorithms. This code was developed for my university thesis.

https://github.com/lefti97/parallel-clustering-hybrid

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: researchgate.net
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.5%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: Lefti97
  • License: MIT
  • Language: C
  • Default Branch: main
  • Homepage:
  • Size: 112 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed 11 months ago
Metadata Files
Readme License Citation

README.md

Hybrid Parallel Clustering Algorithms

Title: Development and Evaluation of Parallel Clustering Algorithms in Hybrid Environment using OpenMP and MPI

Abstract: The object of this thesis is the design, development and evaluation of efficient algorithms for the data clustering problem in parallel environments: shared memory, distributed memory, and hybrid (massively parallel programming in a combined distributed and shared memory environment). The selected algorithms will be developed in C/C++ and evaluated in a suitable real environment. Individual implementations in OpenMP and/or MPI will be developed, as well as combined implementations, e.g. using MPI+OpenMP and/or MPI+MPI Shared Memory, and corresponding comparative measurements and conclusions will be drawn.

  • Thesis (in Greek): https://polynoe.lib.uniwa.gr/xmlui/handle/11400/8820

For this thesis, parallel implementations of the K-means and CURE clustering algorithms were developed. Implementations of further parallel clustering algorithms may be added to this repo.
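To make the parallel structure of K-means concrete, here is a minimal sketch in C, not the repository's actual code. The `Point` type and the functions `assign_points`, `update_centroids` and `kmeans` are hypothetical names introduced for illustration. The sketch also assumes that the distance threshold acts as a stopping criterion: iteration ends once no centroid moves farther than the threshold. Only the assignment loop carries an OpenMP pragma, since its iterations are independent.

```c
#include <float.h>

/* Hypothetical 2D point type; the repository's own data layout may differ. */
typedef struct { double x, y; } Point;

/* Squared Euclidean distance (avoids a sqrt call). */
static double sq_dist(Point a, Point b) {
    double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

/* Assignment step: label each point with the index of its nearest centroid.
   Iterations are independent, so OpenMP can share the loop among threads. */
void assign_points(const Point *pts, int n, const Point *cent, int k, int *label) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        double best = DBL_MAX;
        for (int c = 0; c < k; c++) {
            double d = sq_dist(pts[i], cent[c]);
            if (d < best) { best = d; label[i] = c; }
        }
    }
}

/* Update step: move each centroid to the mean of its assigned points.
   Returns the largest squared distance that any centroid moved. */
double update_centroids(const Point *pts, int n, const int *label, Point *cent, int k) {
    double max_shift2 = 0.0;
    for (int c = 0; c < k; c++) {
        double sx = 0.0, sy = 0.0;
        int count = 0;
        for (int i = 0; i < n; i++) {
            if (label[i] == c) { sx += pts[i].x; sy += pts[i].y; count++; }
        }
        if (count == 0) continue; /* leave an empty cluster's centroid in place */
        Point moved = { sx / count, sy / count };
        double shift2 = sq_dist(cent[c], moved);
        if (shift2 > max_shift2) max_shift2 = shift2;
        cent[c] = moved;
    }
    return max_shift2;
}

/* Assumed stopping rule: stop when no centroid moved more than `threshold`.
   Returns the number of iterations performed. */
int kmeans(const Point *pts, int n, Point *cent, int k, int *label,
           double threshold, int max_iter) {
    int iter = 0;
    while (iter < max_iter) {
        assign_points(pts, n, cent, k, label);
        double shift2 = update_centroids(pts, n, label, cent, k);
        iter++;
        if (shift2 <= threshold * threshold) break;
    }
    return iter;
}
```

In a hybrid version, each MPI process would typically run these steps on its own slice of the points and combine the per-cluster sums and counts across processes (for example with `MPI_Allreduce`) before dividing. Whether the repository does it exactly this way is not stated in this README.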

Instructions

  • Compile the code in the src folder with the provided Makefile by running `make`.

  • How to execute:

    ```
    ./KmeansSerial <distancethreshold>
    ./KmeansOpenMP <distancethreshold>
    mpirun -n ./KmeansMPI <distancethreshold>
    mpirun -n ./KmeansHybrid <distancethreshold>

    ./Cure_Serial <filename> <clusters> <representatives> <shrink fraction>
    ./Cure_OpenMP <filename> <clusters> <representatives> <shrink fraction> <OpenMP threads>

    mpirun -n ./CureMPI
    mpirun -n ./CureHybrid
    ```

  • Example runs:

    ```
    ./KmeansSerial.out ../inputs/test1K.txt 3 1
    ./KmeansOpenMP.out ../inputs/test1K.txt 3 1 4
    mpirun -n 4 ./KmeansMPI.out ../inputs/test1K.txt 3 1
    mpirun -n 4 ./KmeansHybrid.out ../inputs/test1K.txt 3 1 4

    ./Cure_Serial.out ../inputs/test1K.txt 3 5 0.4
    ./Cure_OpenMP.out ../inputs/test1K.txt 3 5 0.4 4

    mpirun -n 4 ./CureMPI.out ../inputs/test1K.txt 3 5 0.4
    mpirun -n 4 ./CureHybrid.out ../inputs/test1K.txt 3 5 0.4 4
    ```

  • If gnuplot is available, a scatter plot is saved in the src/output folder.

  • A txt file with the terminal output is saved in the src/output folder.
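The `<representatives>` and `<shrink fraction>` arguments of the CURE programs correspond to a step of the CURE algorithm in which each cluster keeps a few representative points, each moved toward the cluster centroid by a fixed fraction. The sketch below illustrates that step under stated assumptions: the `Point` type and the functions `centroid_of` and `shrink_representatives` are hypothetical names, and the repository's own implementation may be organized differently.

```c
/* Hypothetical 2D point type; the repository's own data layout may differ. */
typedef struct { double x, y; } Point;

/* Mean of n points (n is assumed to be greater than zero). */
Point centroid_of(const Point *pts, int n) {
    Point c = { 0.0, 0.0 };
    for (int i = 0; i < n; i++) { c.x += pts[i].x; c.y += pts[i].y; }
    c.x /= n;
    c.y /= n;
    return c;
}

/* Move every representative toward the centroid by the fraction alpha:
   alpha = 0 leaves the points where they are, alpha = 1 collapses them
   onto the centroid. */
void shrink_representatives(Point *reps, int count, Point centroid, double alpha) {
    for (int i = 0; i < count; i++) {
        reps[i].x += alpha * (centroid.x - reps[i].x);
        reps[i].y += alpha * (centroid.y - reps[i].y);
    }
}
```

With the shrink fraction 0.4 used in the example runs, a representative lying 5 units from the centroid would end up 3 units from it. Shrinking reduces the influence of outliers when distances between clusters are measured through their representatives.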

Resources

  • Hadjidoukas, Panagiotis & Amsaleg, Laurent. (2006). Parallelization of a Hierarchical Data Clustering Algorithm Using OpenMP. 4315. 289-299. doi:10.1007/978-3-540-68555-5_24.
  • Zhang, Jing & Wu, Gongqing & Xuegang, Hu & Li, Shiying & Hao, Shuilong. (2011). A Parallel K-Means Clustering Algorithm with MPI. doi:10.1109/PAAP.2011.17.

Owner

  • Name: Lefti
  • Login: Lefti97
  • Kind: user
  • Location: Athens, Greece
  • Company: University of West Attica

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Vangelis"
  given-names: "Lefteris"
title: "parallel-clustering-hybrid"
version: 1.0.0
date-released: 2025-03-08
url: "https://github.com/Lefti97/parallel-clustering-hybrid"

GitHub Events

Total
  • Watch event: 1
  • Push event: 2
  • Public event: 1
Last Year
  • Watch event: 1
  • Push event: 2
  • Public event: 1