gempba

Generic Massive Parallelisation of Branching Algorithms

https://github.com/rapastranac/gempba

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 10 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, sciencedirect.com
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.7%) to scientific vocabulary
Last synced: 6 months ago

Repository

Generic Massive Parallelisation of Branching Algorithms

Basic Info
  • Host: GitHub
  • Owner: rapastranac
  • License: mit
  • Language: C++
  • Default Branch: main
  • Size: 33.5 MB
Statistics
  • Stars: 2
  • Watchers: 2
  • Forks: 1
  • Open Issues: 4
  • Releases: 5
Created over 4 years ago · Last pushed 7 months ago
Metadata Files
Readme License Citation

README.md

Generic Massive Parallelisation of Branching Algorithms

C/C++ CI (Ubuntu 24.04) C/C++ CI (Windows 2022) GitHub License GitHub Release

Requirements

  • C++20 compatible compiler
  • CMake 3.28
  • OpenMPI 4.0
  • Boost libraries (optional, only for examples and tests)
  • GoogleTest if you want to run and compile the tests (Not fetched by default)

Testing

  • GoogleTest framework for unit testing

Dependency Management

This project uses CPM.cmake to manage external dependencies. CPM is a CMake script-based dependency manager; no installation is required. Some of the dependencies are used only for testing and examples, so they are not included in the main build.

List of dependencies:

  • spdlog
  • argparse

Platforms

  • Linux

Installation:

How to include the GemPBA library in your project

To include the GemPBA library in your project, you can either clone the repository or download it as a zip file. After that, you can include the necessary headers in your source files.

However, it is recommended to use CPM to manage the GemPBA library in your project. This way, you can easily keep track of updates and dependencies. If you don't have an ${WORKSPACE}/external folder in your project, you can create one and add the following external/CMakeLists.txt file to it:

```cmake
cmake_minimum_required(VERSION 3.28)
include(FetchContent)

project(external)

# CPM
set(CPM_DOWNLOAD_LOCATION "${CMAKE_BINARY_DIR}/cmake/CPM.cmake")
if (NOT (EXISTS ${CPM_DOWNLOAD_LOCATION}))
    message(STATUS "Downloading CPM.cmake")
    file(DOWNLOAD https://github.com/cpm-cmake/CPM.cmake/releases/latest/download/CPM.cmake ${CPM_DOWNLOAD_LOCATION})
endif ()
include(${CPM_DOWNLOAD_LOCATION})

CPMAddPackage(
        NAME gempba
        GITHUB_REPOSITORY rapastranac/gempba
        GIT_TAG main
)

add_library(external INTERFACE)

target_link_libraries(external INTERFACE gempba)
```

Then, in your main CMakeLists.txt file, you can include the external folder like this:

```cmake
cmake_minimum_required(VERSION 3.28)

project(your-project VERSION 1.0 LANGUAGES CXX)
set(CMAKE_CXX_STANDARD 23)

# GemPBA flags and includes
set(GEMPBA_MULTIPROCESSING ON CACHE BOOL "" FORCE) # (Needed if you want to enable multiprocessing)
set(GEMPBA_DEBUG_COMMENTS ON CACHE BOOL "" FORCE)  # (Optional) Enable debug comments in the code
set(GEMPBA_BUILD_EXAMPLES ON CACHE BOOL "" FORCE)  # (Optional) Enable building examples
set(GEMPBA_BUILD_TESTS ON CACHE BOOL "" FORCE)     # (Optional) Enable building tests
add_subdirectory(external)

# Other configurations of your project
# ...

# Main executable
target_compile_definitions(main PRIVATE)

# link GemPBA with your main executable
target_link_libraries(main PUBLIC gempba::gempba)
```

For further testing and minimal reproducible examples, please refer to GemPBA tester repository.

Description

This tool helps you parallelize almost any branching algorithm, including ones that initially seemed impossible or overly complex to parallelize. Please refer to the MSc thesis and the paper for a performance report.

GemPBA allows you to perform parallelism in a multithreading or multiprocessing environment. It also contains a robust Dynamic Load Balancing (DLB) mechanism that significantly decreases CPU idle time, which in turn increases performance by reducing the number of parallel calls in branching algorithms.

GemPBA has four main modules that help you easily understand its applications.

  • Branch Handler:

    This module is in charge of handling tasks among the processors, whether multithreading or multiprocessing is being used. It manages a thread pool, for which the user can allocate the number of threads he needs, yet it is recommended to allocate them according to the number of physical cores in your machine.

    It essentially finds the first available processor in the thread pool, or in a remote participating machine.

  • Result Holder:

    This module keeps track of the function arguments so that they can be properly managed by the library. To support this, the function signature must be slightly modified to include two additional parameters, one as the first and the other as the last. This modification is explained later.

  • Dynamic Load Balancing Handler:

    Since recursive functions create a stack, the DLB instance has access to all the tasks generated at each level of it by means of the ResultHolder instances. Thus, when creating an instance of the ResultHolder, the Dynamic Load Balancer must be passed to the constructor.

  • MPI Scheduler:

    This additional module is in charge of establishing inter-process communication using a lightweight semi-centralized topology. If multiprocessing parallelism is of the user's interest, then this module must be used and passed to the Branch Handler just after its initialization.



Concepts

The following concepts are essential to understand how to use the GemPBA library.

  • gempba::goal: This is an enumeration that defines the goal of the algorithm, whether it is a minimization or maximization problem. The default value is maximization.
  • gempba::score: This is a class that wraps a primitive type, which is used to compare the quality of the solutions found so far. It can be an integer, float, double, etc. The user must define the type of score he wants to use by passing it to the branch_handler using the method set_goal(gempba::goal, gempba::score_type).
  • gempba::score_type: This is an enumeration that defines the primitive type that will be used as a score. The available types are: I32, I64, F32, F64, F128. The default value is I32.

Most branching algorithms aim to find the best solution according to some criterion, which is usually a minimization or maximization objective. The GemPBA library is designed to handle both types of problems, and it is up to the user to define the goal of the algorithm. Whether we are minimizing or maximizing, the library handles the comparison of the numerical values found so far and stores the best solution according to the user's criteria.

The score (formerly known as reference_value) is anything that can be used to represent the quality of the solution found so far. It can be the size of the solution, the cost of it, or any other metric that the user wants to use. As per the examples in this project, the size of the solution is used as the score to compare the solutions found so far, which is an integer value. If the user wants to use a different metric, he can do so by defining the type of score he wants to use.

There is no need to initialize the score value nor the goal if this is a maximization problem and the score wraps an integer. However, if the user wants to use a different type of score or if the problem is a minimization one, then he must set these values using the method branch_handler::set_goal(gempba::goal, gempba::score_type). It is the user's responsibility to set the score value if it is not the default one and be consistent when updating it, otherwise unexpected behaviour might occur.

Multithreading

This is the easiest environment to set up and, in turn, the fastest to implement. In general, the user must modify only the functions to be parallelized. If the algorithm uses more than a single recursive function, these can also be parallelized.

Consider the following function with three branches per recursion. Essentially, when finding a solution, every algorithm can be reduced to a minimization or maximization problem. For the sake of the following example, let's assume that it is a minimization problem.

```cpp
void foo(MyClass instance, float f, double d) {
    /* local solution might be the size of any input argument */
    if (localSolution < bestValueSoFar) {
        bestValueSoFar = localSolution;
        return;
    }

    /* instance, f and d are used to create sub-instances for the
       recursions, for the left, middle and right branches. */

    foo(instance_l, f_l, d_l);
    foo(instance_m, f_m, d_m);
    foo(instance_r, f_r, d_r);
    return;
}
```

In order to parallelize the previous code, the function signature should change like this.

```cpp
void foo(int id, MyClass instance, float f, double d, void *parent = nullptr);
```


Where id stands for thread ID and parent is designed to use the Novel Dynamic Load Balancing.

These additional arguments are meant to be used by the library only, yet the user can also use them, for example to track thread utilisation or in other scenarios where they prove applicable.

Thus, the parallelized version of the code will be as follows.

```cpp
std::mutex mtx;
auto &dlb = gempba::DLBHandler::getInstance();
auto &branchHandler = gempba::branch_handler::get_instance();
using HolderType = gempba::ResultHolder;

void foo(int id, MyClass instance, float f, double d, void *parent = nullptr) {

    // local solution might be the size of any input argument
    if (localSolution < branchHandler.get_score().get<int>()) {
        /* the following mutex ensures that only one best solution so far is
           stored at a time */
        std::scoped_lock<std::mutex> lock(mtx);

        /* if the condition is met, then the solution can be stored in
           the library and retrieved at the end of the execution; if the
           solution is not required to be stored, then there is no need
           to invoke this method nor use the mutex */

        // We wrap the size of the solution in a gempba::score
        gempba::score score = gempba::score::make(localSolution);

        /* Attempt to hold the solution in the branch_handler; the update might
           fail because another thread might have found a better solution in
           the meantime. */
        branchHandler.try_update_result(localSolution, score);

        return;
    }

    /* instance, f and d are used to create sub-instances for the
       recursions, for the left, middle and right branches. */

    HolderType *dummyParent = nullptr; // in case parent is nullptr

    /* The dynamic load balancer tracks the search tree using these
       temporary argument holders. */
    HolderType result_holder_left(dlb, id, parent);
    HolderType result_holder_middle(dlb, id, parent);
    HolderType result_holder_right(dlb, id, parent);

    /* if parent is nullptr, then a virtual root should be created
       such that branches within this scope can be accessed from below */
    if (branchHandler.get_load_balancing_strategy() == gempba::QUASI_HORIZONTAL) {
        dummyParent = new HolderType(dlb, id);
        dlb.linkVirtualRoot(id, dummyParent, result_holder_left, result_holder_middle, result_holder_right);
    }

    /* arguments for each branch should be constructed before making any
       branch call, since there is no guarantee of parallelizing each branch
       at this recursion level */
    result_holder_left.holdArgs(instance_l, f_l, d_l);
    result_holder_middle.holdArgs(instance_m, f_m, d_m);
    result_holder_right.holdArgs(instance_r, f_r, d_r);

    /* The try_push_mt<>() method is asynchronous as long as an available
       processor is found; otherwise, it explores the branch in a sequential
       fashion */
    branchHandler.try_push_mt<void>(foo, id, result_holder_left);
    branchHandler.try_push_mt<void>(foo, id, result_holder_middle);

    /* it makes no sense to call the last branch asynchronously, since it can
       safely be executed sequentially; yet, if down the search tree the owner
       thread of this search domain finds an available processor, then this
       branch can be sent to another processor */
    branchHandler.forward<void>(foo, id, result_holder_right);

    // if a virtual root was allocated, its memory should be freed
    if (dummyParent)
        delete dummyParent;

    return;
}
```



As seen above, the parallelization of this algorithm is straightforward once the library is well understood. It is worth highlighting that input arguments must be ready before any branch call, since down this branch the DLB might try to call a sibling branch at this level, and it would otherwise receive only empty data. This might introduce some unnecessary memory utilisation. Also, instantiating input parameters early can lead to passing outdated arguments to functions that may discard them just after being invoked. This can be mitigated by other techniques discussed later.

Let's imagine that there are three functions, foo1, foo2 and foo3, which call each other recursively, as in the following snippet.

```cpp
void foo1(MyClass instance, float f, double d) {
    /* local solution might be the size of any input argument */
    if (localSolution < bestValueSoFar) {
        bestValueSoFar = localSolution;
        return;
    }

    /* instance, f and d are used to create sub-instances for the
       recursions, for the left, middle and right branches. */

    foo1(instance_l, f_l, d_l);
    foo2(instance_m, f_m, d_m);
    foo3(instance_r, f_r, d_r);
    return;
}
```

Parallelizing this program is no different from the version presented above; we only need to match the proper arguments to the corresponding function. Modifying the parallel version accordingly (excluding the comments), it results as follows.

```cpp
std::mutex mtx;
auto &dlb = gempba::DLBHandler::getInstance();
auto &branchHandler = gempba::branch_handler::get_instance();
using HolderType = gempba::ResultHolder;

void foo1(int id, MyClass instance, float f, double d, void *parent = nullptr) {

    if (localSolution < branchHandler.get_score().get<int>()) {
        std::scoped_lock<std::mutex> lock(mtx);

        auto score = gempba::score::make(localSolution);
        branchHandler.try_update_result(localSolution, score);
        return;
    }

    HolderType *dummyParent = nullptr;
    HolderType result_holder_left(dlb, id, parent);
    HolderType result_holder_middle(dlb, id, parent);
    HolderType result_holder_right(dlb, id, parent);

    if (branchHandler.get_load_balancing_strategy() == gempba::QUASI_HORIZONTAL) {
        dummyParent = new HolderType(dlb, id);
        dlb.linkVirtualRoot(id, dummyParent, result_holder_left, result_holder_middle, result_holder_right);
    }

    result_holder_left.holdArgs(instance_l, f_l, d_l);
    result_holder_middle.holdArgs(instance_m, f_m, d_m);
    result_holder_right.holdArgs(instance_r, f_r, d_r);

    branchHandler.try_push_mt<void>(foo1, id, result_holder_left);
    branchHandler.try_push_mt<void>(foo2, id, result_holder_middle);
    branchHandler.forward<void>(foo3, id, result_holder_right);

    if (dummyParent)
        delete dummyParent;

    return;
}
```

If there is no interest in parallelizing a branch, it can simply be invoked in its sequential fashion; however, the two new arguments must be included. For instance, for the last branch:

```cpp
foo(id, instance_r, f_r, d_r, nullptr);
```

If this branch is to be run sequentially, then no instance of gempba::ResultHolder should be created for it.

Most of the time, the code of a branching algorithm is optimized to check whether a branch is worth exploring. What usually happens is that the instances to be passed are somehow compared against the best solution so far, making it possible to conclude whether a branch leads to a better or worse solution.

Then, an optimized version of our sequential algorithm would be as follows.

```cpp
void foo(MyClass instance, float f, double d) {
    /* local solution might be the size of any input argument */
    if (localSolution < bestValueSoFar) {
        bestValueSoFar = localSolution;
        return;
    }

    /* instance, f and d are used to create sub-instances for the
       recursions, for the left, middle and right branches. */
    if (/* left branch leads to a better solution */)
        foo(instance_l, f_l, d_l);
    if (/* middle branch leads to a better solution */)
        foo(instance_m, f_m, d_m);
    if (/* right branch leads to a better solution */)
        foo(instance_r, f_r, d_r);

    return;
}
```

The branch checking in the above code does not change in its parallel version.

GemPBA provides a method to avoid instantiating input parameters for each branch: bind_branch_checkIn. It guarantees that the input arguments are instantiated just before they are used, so the most up-to-date values are read. This avoids sending useless data to processors only for it to be discarded by the algorithm in its first few lines.

Let's optimize our reference parallel code.

```cpp
std::mutex mtx;
auto &dlb = gempba::DLBHandler::getInstance();
auto &branchHandler = gempba::branch_handler::get_instance();
using HolderType = gempba::ResultHolder;

void foo(int id, MyClass instance, float f, double d, void *parent = nullptr) {

    if (localSolution < branchHandler.get_score().get<int>()) {
        std::scoped_lock<std::mutex> lock(mtx);
        auto score = gempba::score::make(localSolution);
        branchHandler.try_update_result(localSolution, score);

        return;
    }

    HolderType *dummyParent = nullptr;
    HolderType result_holder_left(dlb, id, parent);
    HolderType result_holder_middle(dlb, id, parent);
    HolderType result_holder_right(dlb, id, parent);

    if (branchHandler.get_load_balancing_strategy() == gempba::QUASI_HORIZONTAL) {
        dummyParent = new HolderType(dlb, id);
        dlb.linkVirtualRoot(id, dummyParent, result_holder_left, result_holder_middle, result_holder_right);
    }

    result_holder_left.bind_branch_checkIn([&] {
        /* argument instantiation: instance_l, f_l, d_l */
        if (/* left branch leads to a better solution */) {
            result_holder_left.holdArgs(instance_l, f_l, d_l);
            return true;
        }
        return false;
    });

    result_holder_middle.bind_branch_checkIn([&] {
        /* argument instantiation: instance_m, f_m, d_m */
        if (/* middle branch leads to a better solution */) {
            result_holder_middle.holdArgs(instance_m, f_m, d_m);
            return true;
        }
        return false;
    });

    result_holder_right.bind_branch_checkIn([&] {
        /* argument instantiation: instance_r, f_r, d_r */
        if (/* right branch leads to a better solution */) {
            result_holder_right.holdArgs(instance_r, f_r, d_r);
            return true;
        }
        return false;
    });

    if (result_holder_left.evaluate_branch_checkIn()) {
        branchHandler.try_push_mt<void>(foo, id, result_holder_left);
    }
    if (result_holder_middle.evaluate_branch_checkIn()) {
        branchHandler.try_push_mt<void>(foo, id, result_holder_middle);
    }
    if (result_holder_right.evaluate_branch_checkIn()) {
        branchHandler.forward<void>(foo, id, result_holder_right);
    }

    if (dummyParent)
        delete dummyParent;

    return;
}
```

As seen above, the HolderType instance wraps a lambda function to which the argument instantiation is delegated, and which must return a boolean. The purpose of this lambda is to tell the library whether it is worth invoking a branch. After the instantiation within the lambda's scope, the condition is verified; since it reads the most up-to-date values, the custom verification can now decide whether to skip the branch. If the branch is worth it, the HolderType instance holds the arguments as usual and the lambda returns true. If it is not, there is no need to hold arguments, and the lambda returns false. Note that the lambda captures by reference; this matters when using this mechanism as a memory optimization.

Since this lambda function wraps the branch verification condition, there is no need to write it again in the main scope; it can simply be invoked by calling the method evaluate_branch_checkIn() as shown above.

The evaluate_branch_checkIn() method is also invoked internally by GemPBA so the DLB automatically discards useless tasks and skips to the next branch.


Important: In your main function (in which you initially call foo), you need to instantiate an instance of BranchHandler and call branchHandler.initThreadPool(numOfThreads) with your desired number of threads.



Multiprocessing

This is aimed at advanced users who want to massively parallelize their applications. However, a user without parallelization experience will still be able to set it up easily. Before continuing, we encourage the reader to learn and understand the difference between a process and a thread.

Multiprocessing can also be used within a single machine depending on the user's interests.

GemPBA uses OpenMPI to establish inter-process communication. To achieve better performance, a semi-centralized topology is at the core of the communication. When launching your program with OpenMPI, a command similar to the following is used in the shell.

mpirun -n 10 --oversubscribe a.out args...

The -n flag tells OpenMPI the number of processes to spawn. The --oversubscribe flag is usually included if the launching machine does not have at least as many physical cores as processes. a.out is the executable, and args... stands for any other arguments the application receives before running.

When executing the aforementioned command, it launches N copies of the program, all doing exactly the same thing. The trick is to make them communicate; for this, the semi-centralized topology is implemented.

If the environment has been properly set up for multiprocessing, the center process (rank 0) will perform the following steps:

  • initialize:
    • branch_handler();
    • mpi_scheduler();
  • read input data
  • build arguments that will be passed to the function that we want to parallelize.
  • serialize these arguments to create a task packet data buffer (gempba::task_packet)
  • invoke mpiScheduler.run_center(seed_packet) to pass the raw data buffer (gempba::task_packet)


All other processes will do the following steps:

  • initialize:
    • branch_handler();
    • mpi_scheduler();
  • read input data
    • This is only necessary if all processes need a global copy of the initial data set. Otherwise, it can be avoided.
  • Initialize a buffer decoder. This instance will know the data types that the received buffer is going to be converted to.
  • Initialize the result_fetcher. This instance will fetch the result from the branch handler once the process has finished all its tasks, and it will send it back to the corresponding process. The result fetcher is usually invoked when the center has notified termination to all processes. However, it is also meant to be used for non-void functions, when the result must be returned to a process other than the center.
  • invoke mpiScheduler.run_node(branch_handler, buffer_decoder, result_fetcher, serializer)


These functions are synchronized such that no process ends until all of them have properly finished their duties.

After doing this, if the user wants to fetch the solution, the following should be invoked from the center process:

```cpp
gempba::task_packet buffer = mpiScheduler.fetch_solution();
```

This is the best solution according to the user's criteria, stored in serialized form. The buffer must be deserialized in order to obtain the actual solution.

Thus a way to set up the main.cpp would go like this.

```cpp
#include "mpi_scheduler.hpp"
#include "branch_handler.hpp"

auto deserializer = [](std::stringstream &ss, auto &...args) {
    // deserialize the stream ss into the arguments;
    // the user can implement their favourite serialization technique
};

auto serializer = [](auto &...args) {
    // serialize the arguments into a gempba::task_packet
    return gempba::task_packet{/* ... */};
};

int main() {
// parallel library local reference (branch_handler is a singleton)
auto &branchHandler = gempba::branch_handler::get_instance();
// inter-process communication handler local reference (mpi_scheduler is a singleton)
auto &mpiScheduler = gempba::mpi_scheduler::get_instance();
// gets the rank of the current process, so we know which process is the center
int rank = mpiScheduler.rank_me();
/* mpiScheduler is passed manually so the Branch Handler can invoke it
   to pass tasks to other processes */
branchHandler.pass_mpi_scheduler(&mpiScheduler);

// arguments initialization so each process knows their data types.
MyClass instance;
float f;
double d;

/*
*   initial data can be read in here by all processes, yet only the rank 0
*   would be able to use it
*/

/* set whether the algorithm should be seen as a maximization or minimization problem.
The two available keywords are minimize or maximize, not case sensitive.
If maximization, the score (best value so far) starts at INT_MIN (for I32) or the minimum of the chosen type.
If minimization, the score (best value so far) starts at INT_MAX (for I32) or the maximum of the chosen type.

By default, GemPBA maximizes, thus the line below is optional for maximization */
branchHandler.set_goal(gempba::goal::MAXIMISE, gempba::score_type::I32);

// Here we set the best value so far if known somehow, optional
branchHandler.set_score(gempba::score::make(/* some primitive*/)); 



/* this condition statement guarantees that each process does its corresponding part
    of the execution; if this distinction is not made, all processes will invoke the same
    methods, which leads to undefined behaviour, and probably a simple deadlock in the
    most optimistic scenario */
if (rank == 0) {
    //   or initial data can be read in here by only the rank 0

    // input arguments are serialized and converted to a gempba::task_packet
    gempba::task_packet  seed_packet = serializer(instance, f, d);
    // center process receive only the raw buffer and its size
    mpiScheduler.run_center(seed_packet);
}
else
{
    /*  worker process
        main thread will take care of Inter-process communication (IPC), dedicated core
        numThreads could be the number of physical cores managed by this process - 1
    */
    // only workers need to initialize a thread pool
    branchHandler.init_thread_pool(numThreads);

    /*  Workers need to know the type of the incoming buffers; for this we invoke the method
        construct_buffer_decoder<returnType, Type1, Type2>(foo, deserializer).
        Note that the user must pass only the original argument types of the function, ignoring
        the thread ID and parent arguments. As seen above, this method receives the target function
        and an instance of the deserializer such that the library can decode the buffer,
        construct the arguments from it, and then forward them to the function.
    */
    auto bufferDecoder = branchHandler.construct_buffer_decoder<void, MyClass, float, double>(foo, deserializer);
    /* Workers are constantly searching for a solution; some processes might find one
        and some others might not. This solution is also constantly replaced by a more
        promising one, but it is sent to the center process only when the execution
        has terminated.

        This method is in charge of directly fetching the result from the Branch Handler as a
        gempba::result */
    auto resultFetcher = branchHandler.construct_result_fetcher();
    // Finally these instances are passed through the run_node(...) method
    mpiScheduler.run_node(branchHandler, bufferDecoder, resultFetcher, serializer);
}
/* this is a barrier, in case the user wants to ensure that all processes reach this point
    before continuing */
mpiScheduler.barrier();

/*
    The Branch Handler and the MPI Scheduler offer certain information that can be fetched
    and might be relevant for the user's analysis of the parallelization.

    BranchHandler:
        -   Idle time measured by the thread pool
        -   Total number of tasks received by the thread pool

    MpiScheduler:
        -   Number of tasks sent by the process
        -   Number of tasks received by the process

    Since most of this information is private to each process, it can be gathered by invoking
    either the gather or allgather method in MpiScheduler, which follow the same rules as the
    corresponding MPI methods:

    gather if only one process will receive the data
    allgather if all processes need the data
*/

// fetch the total number of participating processes
int world_size = mpiScheduler.get_world_size();
/* creates a vector to fetch the idle time of each process;
    it could also be a plain array:
    double idleTime[world_size];
*/
std::vector<double> idleTime(world_size);
// actual idle time private to each process
double idl_tm = 0;  

// rank 0 does not instantiate a thread pool
if (rank != 0) { 
    idl_tm = branchHandler.get_pool_idle_time();
}

/* a pointer to the receive buffer is passed,
    and idl_tm is copied into the buffer at its corresponding rank; i.e.
    if idl_tm belongs to the process with rank 3, then it is copied into the
    idleTime vector at position 3 in all processes.
*/
mpiScheduler.allgather(idleTime.data(), &idl_tm, MPI_DOUBLE);

/* The user might want to print statistics or fetch the final solution, which is now stored
    in the center process; another if statement is required. */

if (rank == 0) { 
    mpiScheduler.print_stats();
    gempba::task_packet buffer = mpiScheduler.fetch_solution(); // returns a gempba::task_packet

    /* The solution data type must coincide with the data type passed through
       the method branch_handler::try_update_result(...) */
    SolType solution;
    std::stringstream ss;
    ss.write(reinterpret_cast<const char *>(buffer.data()), static_cast<int>(buffer.size())); // buffer passed to a stringstream
    deserializer(ss, solution);

    // do something else with the solution
}

} ```

As seen above, it is fairly simple to set up the multiprocessing environment, since the difficult synchronization part is delegated to GemPBA.

If we compare the aforementioned code with a sequential version, we notice that only a few modifications are needed, as shown below (excluding the explanatory comments and statistics).

Sequential main.cpp

```cpp
int main() {
    MyClass instance;
    float f;
    double d;

    /*
     *   initial data reading
     */

    foo(instance, f, d);
    // do something else with the solution
}
```



Multiprocessing main.cpp

```cpp
#include "branch_handler.hpp"
#include "mpi_scheduler.hpp"

auto deserializer = [](std::stringstream &ss, auto &...args) { /* procedure */ };

auto serializer = [](auto &...args) { /* procedure */ return gempba::task_packet{/* ... */}; };

int main() {
    auto &branchHandler = gempba::branch_handler::get_instance();
    auto &mpiScheduler = gempba::mpi_scheduler::get_instance();
    int rank = mpiScheduler.rank_me();
    branchHandler.pass_mpi_scheduler(&mpiScheduler);

    MyClass instance;
    float f;
    double d;

    /*
     *   initial data reading
     */

    branchHandler.set_goal(gempba::goal::MINIMISE, gempba::score_type::I32); // or any other type
    branchHandler.set_score(gempba::score::make(/* some primitive */));

    if (rank == 0) {
        gempba::task_packet seed_packet = serializer(instance, f, d);
        mpiScheduler.run_center(seed_packet);
    }
    else {
        branchHandler.init_thread_pool(numThreads);
        auto bufferDecoder = branchHandler.construct_buffer_decoder<void, MyClass, float, double>(foo, deserializer);
        auto resultFetcher = branchHandler.construct_result_fetcher();
        mpiScheduler.run_node(branchHandler, bufferDecoder, resultFetcher, serializer);
    }
    mpiScheduler.barrier();

    if (rank == 0) {
        gempba::task_packet buffer = mpiScheduler.fetch_solution();
        SolType solution;
        std::stringstream ss;
        ss.write(reinterpret_cast<const char *>(buffer.data()), static_cast<int>(buffer.size()));
        deserializer(ss, solution);

        // do something with "solution"
    }
}
```



Hence, the modifications needed to convert the multithreaded function to multiprocessing are minor, as presented below.


```cpp
std::mutex mtx;
auto &dlb = gempba::DLBHandler::getInstance();
auto &branchHandler = gempba::branch_handler::get_instance();
using HolderType = gempba::ResultHolder;

void foo(int id, MyClass instance, float f, double d, void *parent = nullptr) {

if (localSolution < branchHandler.get_score().get<int>()){
    std::scoped_lock<std::mutex> lock(mtx);
    gempba::score score = gempba::score::make(localSolution);
    branchHandler.try_update_result(localSolution, score, serializer);

    return;
}

HolderType *dummyParent = nullptr;
HolderType result_holder_left(dlb, id, parent);
HolderType result_holder_middle(dlb, id, parent);
HolderType result_holder_right(dlb, id, parent);

if (branchHandler.get_load_balancing_strategy() == gempba::QUASI_HORIZONTAL){
    dummyParent = new HolderType(dlb, id);
    dlb.linkVirtualRoot(id, dummyParent, result_holder_left, result_holder_middle, result_holder_right);
}


result_holder_left.bind_branch_checkIn([&] {
    /* argument instantiation: instance_l, f_l, d_l */
    if (/* left branch leads to a better solution */) {
        result_holder_left.holdArgs(instance_l, f_l, d_l);
        return true;
    }
    return false;
});

result_holder_middle.bind_branch_checkIn([&] {
    /* argument instantiation: instance_m, f_m, d_m */
    if (/* middle branch leads to a better solution */) {
        result_holder_middle.holdArgs(instance_m, f_m, d_m);
        return true;
    }
    return false;
});

result_holder_right.bind_branch_checkIn([&] {
    /* argument instantiation: instance_r, f_r, d_r */
    if (/* right branch leads to a better solution */) {
        result_holder_right.holdArgs(instance_r, f_r, d_r);
        return true;
    }
    return false;
});

if (result_holder_left.evaluate_branch_checkIn()){
    branchHandler.try_push_mp<void>(foo, id, result_holder_left, serializer);
}
if (result_holder_middle.evaluate_branch_checkIn()){
    branchHandler.try_push_mp<void>(foo, id, result_holder_middle, serializer);
}
if (result_holder_right.evaluate_branch_checkIn()){
    branchHandler.forward<void>(foo, id, result_holder_right);
}

if (dummyParent) {
    delete dummyParent;
}

return;
}
```

As you may have noticed, within the termination condition the solution is captured in serial form. It is stored in a gempba::result, which pairs the serialized solution with its score. The mpi_scheduler uses this score when retrieving the global solution, comparing the best score found so far against the solutions attained by each process.

As for the parallel calls, the method branch_handler::try_push_mt(...) is changed to branch_handler::try_push_mp(..., serializer).

The mt suffix stands for multithreading, whereas the mp suffix stands for multiprocessing.

Internally, try_push_mp invokes the mpi_scheduler to check for an available process; if none is available, it falls back to try_push_mt to spawn the task on a local thread.

try_push_mt and try_push_mp return true if the task was pushed asynchronously; otherwise, the task is executed sequentially in the calling thread and the call returns false.

Multiprocessing with Centralized Scheduler (Optional)

The GemPBA library includes an optional centralized scheduler for comparison purposes only. It can be used by calling gempba::mpi_centralized_scheduler::get_instance(), see mp_bitvect_opt_enc_central.cpp or mp_bitvect_basic_enc_central.cpp for an example of how to use it.

Note: The centralized scheduler is not part of the project's scope, but it is mentioned here for completeness. Depending on your project structure, you might need to add additional imports due to hidden dependencies within the GemPBA library.

Starring the project

If you find this project helpful, I'd greatly appreciate it if you could give it a star on GitHub! This helps me gauge how many people are benefiting from the code and encourages me to continue enhancing and maintaining it.

Copyright and citing

Copyright 2021-2025 Andrés Pastrana. Licensed under the MIT license.

If you use this library in software of any kind, please provide a link to the GitHub repository in the source code and documentation.

If you use this library in published research, please cite it as follows:

For BibTeX users, please use REFERENCES.bib for citation.

```bibtex
@article{PASTRANACRUZ2023103024,
    archiveprefix = {arXiv},
    title         = {A lightweight semi-centralized strategy for the massive parallelization of branching algorithms},
    journal       = {Parallel Computing},
    volume        = {116},
    pages         = {103024},
    year          = {2023},
    issn          = {0167-8191},
    doi           = {https://doi.org/10.1016/j.parco.2023.103024},
    url           = {https://www.sciencedirect.com/science/article/pii/S0167819123000303},
    author        = {Andres Pastrana-Cruz and Manuel Lafond},
    keywords      = {Load balancing, Vertex cover, Parallel algorithms, Scalable parallelism, Branching algorithms}
}
```

The publications available on Parallel Computing and arXiv do not reflect the most recent updates to the library. These papers are primarily aimed at helping researchers discover the library and providing a reference for citing it in academic work. For the latest documentation, please refer to the README.md file in the GitHub repository.

Owner

  • Name: Andres P
  • Login: rapastranac
  • Kind: user
  • Location: Canada

M.Sc., Computer Science

